Biological Learning Objectives
Computational Learning Objectives
Document your answers in ~/qbXX-answers/miniproject-codon-usage
Run git push after each exercise; do not wait until the end of the session

Prepare your new project directory
Create the miniproject-codon-usage directory
Use wget to download fasta.py, codons.py, cytoplasm.fa, and membrane.fa

Parse CDS sequence into three nucleotide codons
Write a Python script that expects a .fasta file as an argument (e.g. ./codon-usage.py sequences.fa)
Create a FASTAReader based on the command line argument at index 1 (sys.argv[1])
Loop through each ident, sequence in the .fasta file
For each sequence, print the ident, then split the sequence into codons and print each codon
To split the sequence, use either a for loop and position counter or a while loop where you repeatedly remove a codon from the sequence until it is empty

Test your Python script using a small subset of sequences
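When testing, it can help to first check the codon-splitting step on a hard-coded sequence before reading any file. The sketch below shows both strategies mentioned in the exercise: a for loop with a position counter, and a while loop that removes one codon at a time. The function names and the toy sequence are assumptions made for illustration; they are not part of fasta.py or codons.py.

```python
def split_codons_for(sequence):
    """Split a sequence into codons using a for loop and a position counter."""
    codons = []
    for position in range(0, len(sequence), 3):
        codon = sequence[position:position + 3]
        codons.append(codon)
    return codons


def split_codons_while(sequence):
    """Split a sequence by repeatedly removing the first codon until it is empty."""
    codons = []
    while sequence:
        codons.append(sequence[:3])   # take the first three nucleotides
        sequence = sequence[3:]       # drop them from the remaining sequence
    return codons


# Toy CDS (an assumption for illustration, not taken from cytoplasm.fa)
toy_sequence = "ATGGCCTTGTAA"
print(split_codons_for(toy_sequence))    # ['ATG', 'GCC', 'TTG', 'TAA']
print(split_codons_while(toy_sequence))  # ['ATG', 'GCC', 'TTG', 'TAA']
```

Both versions return a shorter final element if the sequence length is not a multiple of three, so a real script may want to decide how to handle such a trailing fragment.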
Create a small test file by running head on cytoplasm.fa and saving the output to subset.fa
Run your script on the subset (e.g. ./codon-usage.py subset.fa)

Once you’ve completed this exercise
Check the ident and codon output for subset.fa
Push your script, subset.fa, and README.md to GitHub

Extend your code to translate codons into amino acids and count abundance
Add two dictionaries, one to translate codon to amino acid, one to count amino acid abundance
Import the codons module which defines a codons.forward dictionary
Inspect the dictionary with print( codons.forward )
Create an empty dictionary aas to count amino acid abundance

Update your loop to count amino acid abundance
Use the codons module to translate each codon (key) into amino acid (value)
Use the aas dictionary to track the occurrence (value) of each amino acid (key)
Check whether the amino acid has already been seen (e.g. if key in dictionary) and if not initialize (e.g. value = 1)

Test your Python script using different input sequences
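Before running on real data, the counting logic can be checked on a toy input whose answer is easy to verify by hand. The following is a minimal sketch, assuming a four-entry stand-in for codons.forward (the real dictionary comes from codons.py and may represent stop codons differently) and a hypothetical helper named count_amino_acids. Skipping codons that are missing from the table is a choice made here, not something the exercise prescribes.

```python
# Stand-in for codons.forward (assumption: the real module maps each codon
# to a one-letter amino acid; only four entries are reproduced here)
forward = {"ATG": "M", "GCC": "A", "GCT": "A", "TAA": "*"}


def count_amino_acids(codon_list, table):
    """Translate each codon and tally how often each amino acid occurs."""
    aas = {}
    for codon in codon_list:
        if codon not in table:
            continue              # skip incomplete or unrecognized codons
        aa = table[codon]         # codon (key) -> amino acid (value)
        if aa in aas:
            aas[aa] += 1          # amino acid seen before: increment
        else:
            aas[aa] = 1           # first occurrence: initialize
    return aas


# Toy input (assumption); the trailing "GC" mimics an incomplete codon
toy_codons = ["ATG", "GCC", "GCT", "TAA", "GC"]
aas = count_amino_acids(toy_codons, forward)
print(aas)  # {'M': 1, 'A': 2, '*': 1}
```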
Print the final counts with print( aas )
Run your script on subset.fa, commenting on why you think the output is correct
Run your script on cytoplasm.fa and membrane.fa, commenting on why you think this difference occurs

Improve your code to more easily compare two sets of sequences
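Update your script to accept two .fasta files as arguments (e.g. ./codon-usage.py seqs1.fa seqs2.fa)
Count amino acid abundance separately for each file (e.g. aas1 and aas2)
Print one tab-separated line per amino acid (e.g. f"{aa}\t{count1}\t{count2}") with rows sorted alphabetically by amino acid.
One way to get the amino acids in order is taking codons.reverse.keys(), converting to a list with list(), and sorting with sorted()
Save the output for cytoplasm.fa and membrane.fa to a file e.g. > cyto-v-mem.tsv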
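The comparison table described above can be sketched as follows. This is an illustration under stated assumptions: the counts in aas1 and aas2 are made up, reverse is a small stand-in for codons.reverse, and comparison_rows is a hypothetical helper name. Reporting 0 for an amino acid that is absent from one file is a choice made here so that every row has three columns.

```python
# Made-up counts (assumption), as if produced from two different .fasta files
aas1 = {"A": 12, "L": 30, "W": 2}
aas2 = {"A": 9, "L": 41, "K": 5}

# Stand-in for codons.reverse (assumption: keys are one-letter amino acids)
reverse = {
    "W": ["TGG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}


def comparison_rows(counts1, counts2, amino_acids):
    """Build one tab-separated row per amino acid, sorted alphabetically."""
    rows = []
    for aa in sorted(list(amino_acids)):
        count1 = counts1.get(aa, 0)   # 0 when the amino acid was never seen
        count2 = counts2.get(aa, 0)
        rows.append(f"{aa}\t{count1}\t{count2}")
    return rows


for row in comparison_rows(aas1, aas2, reverse.keys()):
    print(row)
```

Because the rows go to standard output, redirecting with > on the command line is enough to capture them in a .tsv file.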

Examine codon bias
Write a new script codon-bias.py that expects two arguments, a single .fasta file and an amino acid (single letter, uppercase)
Read the sequences with FASTAReader and tally codon occurrences in a counts dictionary
Use codons.reverse to retrieve the codons (value) for a given amino acid (key)
For example, the value for key W is length 1, value for key L is length 6
Save the output for cytoplasm.fa examining codons for L to a file named bias-L.tsv
Save the output for A to bias-A.tsv
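The codon bias tally can be sketched as below. This is a minimal illustration, not the expected solution: reverse is a two-entry stand-in for codons.reverse, the toy sequences replace what FASTAReader would yield, codon_bias is a hypothetical function name, and the two-column output layout is an assumption since the exercise does not specify one.

```python
# Stand-in for codons.reverse (assumption: amino acid -> list of its codons)
reverse = {
    "W": ["TGG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}


def codon_bias(sequences, amino_acid, reverse_table):
    """Count how often each codon of one amino acid occurs, reading in frame."""
    counts = {codon: 0 for codon in reverse_table[amino_acid]}
    for sequence in sequences:
        # Stop before any trailing fragment shorter than three nucleotides
        for position in range(0, len(sequence) - 2, 3):
            codon = sequence[position:position + 3]
            if codon in counts:
                counts[codon] += 1
    return counts


# Toy coding sequences (assumption), standing in for records from a .fasta file
toy_sequences = ["ATGCTGTTATGGCTGTAA", "ATGCTCCTGTAA"]

counts = codon_bias(toy_sequences, "L", reverse)
for codon, count in counts.items():
    print(f"{codon}\t{count}")
```

In the actual script the .fasta file and the amino acid would come from the two command line arguments rather than being hard-coded.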