Biological Learning Objectives
Computational Learning Objectives
Document your answers in ~/qbXX-answers/miniproject-assembly-metrics. Run git push after each exercise rather than waiting until the end of the session, e.g.
git add README.md
git commit -m "Start documentation for this assignment"
git push
# Confirm at github.com
Start documenting your work with a README.md file
Run cd ~/qb25-answers, then create the miniproject-assembly-metrics directory and an empty README.md inside it using mkdir, cd, and touch
Add a header (# My Project) and a short description using your own words
List available genome assemblies for C. remanei
Genomic Sequence (FASTA)
- https://ftp.ebi.ac.uk/genome.fa.gz
Create a Bash script to download the genome assemblies
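Add one wget URL line per assembly
Make the script executable with chmod +x
Run it with ./getGenomes.sh
Check the downloaded files with ls -lh
Decompress them with gunzip *.gz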
Create a Python script that reads in a single .fasta file and reports basic metrics
Fetch the fasta module, which defines the FASTAReader class, using wget
Write a Python script that expects a .fasta file as an argument (e.g. ./assembly-metrics.py sequences.fa)
Name the script assembly-metrics.py
Add a shebang (#!) line to declare this a python3 script
Load the module with import fasta
Use open() to open the file specified as the “1-th” command line argument e.g. [1]
Pass the open file to fasta.FASTAReader( ___ )
Loop over the records with for ident, sequence in ____
Use len() to get the length of each sequence and sum up the total length
Analyze the four .fa files using your script
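The script described above (open the file named on the command line, loop over (ident, sequence) pairs, and sum len() of each sequence) might look roughly like the sketch below. The course's fasta module is not reproduced here, so the sketch uses a small stand-in generator, iter_fasta, in place of fasta.FASTAReader; that name, the summarize helper, and the choice to report the number of sequences alongside the total length are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Sketch: report basic metrics for a single FASTA file."""
import sys


def iter_fasta(handle):
    """Yield (ident, sequence) pairs from an open FASTA file handle.

    Hypothetical stand-in for the course's fasta.FASTAReader.
    """
    ident, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if ident is not None:
                yield ident, "".join(chunks)
            fields = line[1:].split()
            ident, chunks = (fields[0] if fields else ""), []
        elif ident is not None:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def summarize(handle):
    """Return (number of sequences, total length) for a FASTA handle."""
    num_sequences = 0
    total_length = 0
    for ident, sequence in iter_fasta(handle):
        num_sequences += 1
        total_length += len(sequence)
    return num_sequences, total_length


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} sequences.fa")
    else:
        # sys.argv[1] is the "1-th" command line argument
        with open(sys.argv[1]) as fasta_file:
            num_sequences, total_length = summarize(fasta_file)
        print(f"Number of sequences: {num_sequences}")
        print(f"Total length: {total_length}")
```

With the real module, the loop would iterate over fasta.FASTAReader( ___ ) instead of iter_fasta; the counting logic stays the same.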
Extend your Python script to calculate the N50 statistic
Collect the length of each contig in a list with ___.append( ___ )
Sort the lengths from longest to shortest with ___.sort( reverse=True )
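N50 is the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches at least half of the total assembly length. The sketch below shows one way to compute it from a list of lengths; the function name n50 and the toy lengths are hypothetical and are not taken from the assignment data.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = list(lengths)      # copy so the caller's list is not reordered
    lengths.sort(reverse=True)   # longest contig first
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    return 0


# Toy example: total length is 54, so half is 27.
# Longest first: 10 + 9 + 8 = 27, which reaches the halfway point at 8.
contig_lengths = []
for length in [2, 3, 4, 5, 6, 7, 8, 9, 10]:
    contig_lengths.append(length)

print(n50(contig_lengths))  # prints 8
```

In the assignment script, the list would be filled inside the loop over sequences by appending len(sequence) for each record.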
getGenomes.sh
assembly-metrics.py