Assignment 8: The Wright-Fisher model

Assignment Date: Friday, Oct. 29, 2021
Due Date: Friday, Nov. 12, 2021 @ 1pm ET

Lecture

Lecture Slides

The Wright-Fisher model

Recall, the Wright-Fisher model of DNA sequence evolution:

  • Two alleles
  • Fixed population size (N), and thus (2N) genomes
  • Non-overlapping generations
  • Random mating

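The pool of alleles at each generation can be described by a stochastic process with a binomial transition probability. Let (i) be the number of A alleles at generation (n - 1), and let (p_i = i / (2N)). Assuming no selection, the number of A alleles at generation (n), written here as (X_n), follows a binomial distribution with (2N) trials and success probability (p_i):

\Large P(X_n = j \mid X_{n-1} = i) = \binom{2N}{j} p_i^{j} (1 - p_i)^{2N - j}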
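
As a rough illustration of the neutral transition described above, a single generation can be drawn with NumPy's binomial sampler. This is only a sketch: the function name `next_generation_count` and its arguments are hypothetical choices made for this example and are not part of the assignment.

```python
import numpy as np

def next_generation_count(i, N, rng):
    """Draw the number of A alleles in the next generation (no selection).

    i   -- number of A alleles in the current generation (0 <= i <= 2N)
    N   -- population size, so there are 2N gene copies
    rng -- a numpy.random.Generator
    """
    p_i = i / (2 * N)                # current frequency of the A allele
    return rng.binomial(2 * N, p_i)  # 2N independent draws, each A with probability p_i

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(seed=0)
count = next_generation_count(i=100, N=100, rng=rng)
print(count / (2 * 100))  # allele frequency after one generation
```

Once (i) reaches 0 or (2N), every later draw returns the same value, which is what it means for an allele to be lost or fixed.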

We can introduce selection by changing the probability of sampling different alleles. For example:

\Large p_A = \frac{ i ( 1 + s ) }{ 2N - i + i ( 1 + s )}
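
The sampling probability under selection can be written as a small helper, sketched below. The name `selection_probability` is hypothetical, and the sketch assumes (s > -1) so that the denominator stays positive.

```python
def selection_probability(i, N, s):
    """Probability of sampling an A allele when A has relative fitness 1 + s.

    i -- current number of A alleles
    N -- population size (2N gene copies)
    s -- selection coefficient; assumed here to be greater than -1
    """
    weighted = i * (1 + s)                    # A alleles, weighted by fitness
    return weighted / (2 * N - i + weighted)  # normalize over both alleles

# With s = 0 the expression reduces to the neutral frequency i / (2N).
neutral = selection_probability(i=50, N=100, s=0.0)
favored = selection_probability(i=50, N=100, s=0.1)
print(neutral, favored)
```

A positive (s) raises the sampling probability above the current allele frequency, and a negative (s) lowers it.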

Assignment

  1. Implement a Wright-Fisher simulation of allele frequencies for an arbitrary starting allele frequency and population size. The simulation should run until one of the alleles becomes fixed. Implement this simulation as a Python function that takes two input arguments: the starting allele frequency and the population size. As output, the function should return a list that contains the allele frequency at each generation.

    HINT: You definitely want to do this in Python. Remember that there are a variety of distributions in numpy.random, scipy.stats and so on that can help you out.

  2. Write a function that plots the allele frequency versus generation for the entirety of your simulation. Produce such a plot for one of your simulations. Make sure you label your axes.

  3. For a starting allele frequency of 0.5, and a population size of 100, produce a histogram with density showing time to fixation over (at least) 1000 trials.

  4. For a starting allele frequency of 0.5, vary the population size and produce a plot that shows fixation time vs (N). A reasonable range of population sizes might be 100 to 10 million.

  5. Simulate the time to fixation under a range of different starting allele frequencies. Produce a plot showing starting allele frequency vs. number of generations to fix. Do (at least) 100 simulations for each and include the variability in your plot.

  6. Introduce selection to your function from Part 1 (as an additional parameter that can be specified) and plot the allele frequency trajectory for some chosen parameters. On your plot, make sure you note what your selection coefficient for the simulation was. Additionally, plot selection coefficient vs time to fixation for a fixed population size of your choice. On your plot, make sure you note what your population size was.
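
One possible shape for the simulation loop in Parts 1 and 6 is sketched below. It is a minimal illustration rather than a reference solution: the function name `wright_fisher`, the optional `s` and `rng` parameters, and the rounding of the starting frequency to a whole number of alleles are all assumptions made for this example, and it assumes (s > -1).

```python
import numpy as np

def wright_fisher(start_freq, N, s=0.0, rng=None):
    """Simulate A-allele frequencies until the allele is fixed or lost.

    Returns a list of frequencies, one per generation, starting
    with the initial frequency at generation 0.
    """
    if rng is None:
        rng = np.random.default_rng()
    total = 2 * N                        # number of gene copies
    i = int(round(start_freq * total))   # starting count of A alleles
    trajectory = [i / total]
    while 0 < i < total:                 # stop once A is lost (0) or fixed (2N)
        weighted = i * (1 + s)           # fitness-weighted count of A alleles
        p_a = weighted / (total - i + weighted)
        i = int(rng.binomial(total, p_a))
        trajectory.append(i / total)
    return trajectory

# Small, seeded example run (the seed and N are arbitrary choices).
rng = np.random.default_rng(seed=1)
trajectory = wright_fisher(start_freq=0.5, N=50, rng=rng)
generations_to_fixation = len(trajectory) - 1
print(generations_to_fixation, trajectory[-1])
```

With `s=0.0` the loop is the neutral model, so the same function can serve both parts. The time to fixation for a run is the length of the returned list minus one, which is the quantity that Parts 3 to 5 ask you to collect over many trials.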

Submitting your assignment

If you wrote a Python script, submit your script and plots. If you wrote a Jupyter notebook, submit that.