Biological Learning Objectives
Computational Learning Objectives
Document your answers in ~/qbXX-answers/miniproject-assembly-metrics
Run git push after each exercise rather than waiting until the end of the session, e.g.
git add README.md
git commit -m "Start documentation for this assignment"
git push
# Confirm at github.com
Start documenting your work with a README.md file
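Go to your answers directory with cd ~/qb25-answers
Create miniproject-assembly-metrics and an empty README.md inside it using mkdir, cd, touch
Add a header (e.g. # My Project) and a short description using your own words
List available genome assemblies for C. remanei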
Genomic Sequence (FASTA) - https://ftp.ebi.ac.uk/genome.fa.gz
Create a Bash script to download the genome assemblies
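Add wget URL lines to getGenomes.sh
Make the script executable with chmod +x
Run it with ./getGenomes.sh
Check the downloaded files with ls -lh
Decompress them with gunzip *.gz
Create a Python script that reads in a single .fasta file and reports basic metrics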
Fetch the fasta module, which defines the FASTAReader class
wget
Write a Python script that expects a .fasta file as an argument (e.g. ./assembly-metrics.py sequences.fa)
Create assembly-metrics.py
Add a shebang (#!) line to declare this a python3 script
import fasta
Use open() to open the file specified as the “1-th” command line argument e.g. [1]
Pass the open file to fasta.FASTAReader( ___ )
Loop over the contigs with for ident, sequence in ____
Count the contigs, measure each one with len() and sum up the total length
Analyze the four .fa files using your script
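The reading and counting steps above can be sketched roughly as follows. The course's fasta module is not reproduced here, so this sketch uses a small stand-in parser, read_fasta, which is hypothetical and is not the FASTAReader class; basic_metrics and main are likewise illustrative names. With the real module, the loop would presumably iterate over fasta.FASTAReader( ___ ) instead of read_fasta.

```python
#!/usr/bin/env python3
import sys


def read_fasta(handle):
    """Yield (ident, sequence) pairs from an open FASTA file.

    Hypothetical stand-in for the course's fasta.FASTAReader.
    """
    ident, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if ident is not None:
                yield ident, "".join(chunks)
            words = line[1:].split()
            ident, chunks = (words[0] if words else ""), []
        elif ident is not None:
            chunks.append(line)
    # Emit the final record
    if ident is not None:
        yield ident, "".join(chunks)


def basic_metrics(handle):
    """Return (number of contigs, total length) for an open FASTA file."""
    num_contigs = 0
    total_length = 0
    for ident, sequence in read_fasta(handle):
        num_contigs += 1
        total_length += len(sequence)
    return num_contigs, total_length


def main(argv):
    # argv[1] is the "1-th" command line argument: the .fasta file path
    with open(argv[1]) as fasta_file:
        num_contigs, total_length = basic_metrics(fasta_file)
    print(f"Number of contigs: {num_contigs}")
    print(f"Total length: {total_length}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} sequences.fa", file=sys.stderr)
    else:
        main(sys.argv)
```

Keeping the counting in a function that takes an open file, rather than a path, makes it easy to check against a tiny hand-written example before running it on a full assembly.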
Extend your Python script to calculate the N50 statistic
Build a list of contig lengths using ___.append( ___ )
Sort the list from longest to shortest using ___.sort( reverse=True )
getGenomes.sh
assembly-metrics.py
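For the N50 step, one possible sketch is shown below. N50 is the length of the contig at which the running total, taken over contigs sorted from longest to shortest, first reaches half of the total assembly length. The function name n50 and the toy sequences are illustrative assumptions, and treating "reaches" as greater than or equal to half is one common convention rather than something this assignment specifies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    # Longest contigs first
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to half is the N50 contig
        if running >= half_total:
            return length
    return 0


# Toy example: collect contig lengths with append(), then compute N50
contig_lengths = []
for sequence in ["ACGTACGTAC", "ACGTAC", "ACGT", "AC"]:
    contig_lengths.append(len(sequence))

print(n50(contig_lengths))  # lengths 10, 6, 4, 2 (total 22) -> prints 6
```

In the toy example the longest contig alone covers 10 of 22 bases, which is less than half, so the second contig (length 6) is the one that crosses the halfway point.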