Learning Objectives
Work in ~/qbXX-answers/weekX, keep a README.md there, and git push after each exercise
Strive to organize your directory, e.g.
/Users/cmdb/qb25-answers/week2
├── README.md
├── genomes
│   └── sacCer3.fa
├── rawdata
│   └── ERR8562476.fastq
└── variants
    ├── A01_01.bam
    ├── ...
    └── A01_06.sam
Download BYxRM dataset and unpack using tar
cd ~/Data
wget -O BYxRM.tar.gz "https://www.dropbox.com/scl/fi/ggmmx6ceqni3e2wnljn0t/BYxRM.tar.gz?rlkey=rfkthvs49pa56y9eo3ixyq9ep&dl=0"
tar xf BYxRM.tar.gz
Confirm that the output of ls -l BYxRM matches
-rw-r--r-- 1 cmdb staff 23729508 May 6 2024 BYxRM_GenoData.txt
-rw-r--r-- 1 cmdb staff 797659 May 6 2024 BYxRM_PhenoData.txt
drwxr-xr-x 98 cmdb staff 3136 Jul 30 15:13 fastq
Skim genotype calls in BYxRM_GenoData.txt using less
marker A01_01 A01_02
27915_chr01_27915_T_C R B
28323_chr01_28323_G_A R B
28652_chr01_28652_G_T R B
29667_chr01_29667_C_A R B
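Beyond skimming with less, the genotype table can be tallied programmatically. The following is a minimal sketch, assuming the file is whitespace-delimited with a header row whose first column is the marker name and whose remaining columns are sample names, as in the excerpt above. The function name count_genotypes is hypothetical, and the demo uses only the four rows shown above rather than the real file.

```python
import io
from collections import Counter


def count_genotypes(handle):
    """Tally genotype calls per sample from a whitespace-delimited table."""
    header = handle.readline().split()
    samples = header[1:]  # assumption: first column holds the marker name
    counts = {sample: Counter() for sample in samples}
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        for sample, call in zip(samples, fields[1:]):
            counts[sample][call] += 1
    return counts


# Demo on the four rows excerpted above (not the full file).
example = io.StringIO(
    "marker A01_01 A01_02\n"
    "27915_chr01_27915_T_C R B\n"
    "28323_chr01_28323_G_A R B\n"
    "28652_chr01_28652_G_T R B\n"
    "29667_chr01_29667_C_A R B\n"
)
counts = count_genotypes(example)
for sample, tally in counts.items():
    print(sample, dict(tally))
```

To run it on the real data, pass an open file handle, e.g. open("BYxRM_GenoData.txt"), in place of the in-memory example.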
Align short sequencing reads using Bowtie2
Build reference genome index (only one time per reference)
cd genomes
cp ~/Data/References/sacCer3/sacCer3.fa.gz .
gunzip sacCer3.fa.gz
bowtie2-build sacCer3.fa sacCer3
Map reads to genome and sort/index with samtools (as many times as samples)
cd variants
bowtie2 -p 4 -x ___/sacCer3 -U ___/A01_01.fq.gz > A01_01.___
samtools sort -o ___ ___
samtools index ___
samtools idxstats ___ > A01_01.idxstats
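The idxstats file written above is tab-delimited with four columns: reference name, reference length, mapped reads, and unmapped reads. As a minimal sketch, it can be loaded into a dictionary for later comparison with your own counts. The function name parse_idxstats is hypothetical, and the read counts in the demo lines are invented for illustration, not real results.

```python
def parse_idxstats(lines):
    """Return {reference name: mapped read count} from idxstats lines."""
    mapped = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:  # skip anything that is not a data row
            continue
        name, _length, n_mapped, _n_unmapped = fields
        mapped[name] = int(n_mapped)
    return mapped


# Invented example lines; the "*" row collects reads with no reference.
example = [
    "chrI\t230218\t120\t0\n",
    "chrII\t813184\t340\t2\n",
    "*\t0\t0\t15\n",
]
mapped = parse_idxstats(example)
for name, n in mapped.items():
    print(name, n)
```

For the real file, pass open("A01_01.idxstats") instead of the example list.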
Visualize reads using IGV.app
Submit the following
bowtie2 and three samtools commands
samtools idxstats output
Build a workflow to process the first six samples using a Bash script
Create map-reads.sh by completing this template
#!/bin/bash
for sample in # list prefixes e.g. A01_01 A01_02
do
echo "***" $sample
# mapping command e.g. bowtie2 path/to/$sample.fq.gz
# sort command
# index command
done
Visualize haplotype using IGV.app
Submit the following
Summarize sequence alignments using Python
Create summarize-sam.py to count alignments to each chromosome
Skip header lines, which begin with @
Use the RNAME field (third column)
Compare your counts with samtools idxstats
samtools sort -o reads.sorted.sam reads.sam
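One possible shape for the per-chromosome count is sketched below. It relies on two facts from the SAM format: header lines start with @, and RNAME is the third tab-separated column (an unmapped read has * there). The function name count_by_chromosome and the four demo alignments are made up for illustration.

```python
from collections import Counter


def count_by_chromosome(lines):
    """Count alignments per RNAME, skipping SAM header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # header line, not an alignment
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[2]] += 1  # RNAME is column 3 (index 2)
    return counts


# Made-up SAM records: one header line, three mapped reads, one unmapped.
example = [
    "@SQ\tSN:chrI\tLN:230218\n",
    "r1\t0\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
    "r2\t16\tchrI\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\n",
    "r3\t0\tchrII\t50\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
counts = count_by_chromosome(example)
for chrom, n in counts.items():
    print(chrom, n)
```

In a script, the list would be replaced by an open SAM file, for instance one named on the command line through sys.argv.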
Extend summarize-sam.py to examine mismatches per alignment
Count how many times each NM:i:count SAM tag occurs
NM is not always in the same column, so use a for loop to go through the fields after splitting a line
Remove the NM:i: prefix by slicing and convert the remainder with int()
Print the counts in order using the sorted() function
Submit the following
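For the mismatch step of summarize-sam.py described above, here is a minimal sketch of one way to combine the for loop, the slice, int(), and sorted(). It assumes optional tags begin after the 11 mandatory SAM columns. The function name count_mismatches and the demo records are invented for illustration.

```python
from collections import Counter


def count_mismatches(lines):
    """Tally how many alignments carry each NM:i: value."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        for field in fields[11:]:  # optional tags follow 11 fixed columns
            if field.startswith("NM:i:"):
                counts[int(field[5:])] += 1  # slice off the 5-char prefix
                break
    return counts


# Invented records with NM in different columns; r4 is unmapped, no NM tag.
example = [
    "r1\t0\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tNM:i:0\n",
    "r2\t16\tchrI\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\tXS:i:-5\n",
    "r3\t0\tchrII\t50\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXN:i:0\tNM:i:0\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n",
]
mismatch_counts = count_mismatches(example)
for n_mismatches in sorted(mismatch_counts):
    print(n_mismatches, mismatch_counts[n_mismatches])
```

Iterating over sorted(mismatch_counts) visits the keys in ascending order, so the output runs from the fewest mismatches to the most.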
Align long sequencing reads using minimap2
Find yeast Nanopore dataset at SRA
Fetch reads using sratoolkit
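In your week2/rawdata directory, run fasterq-dump -p ERR8562476
If fasterq-dump is missing, conda install sra-tools
Confirm that ERR8562476.fastq has 137436 lines
Map reads to genome
Work in week2/longreads
Run minimap2 with the following arguments
If minimap2 is missing, conda install minimap2
Save the output as longreads.sam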
Visualize using IGV.app
Submit the following
minimap2 command
samtools idxstats output
Align RNA-seq reads using HISAT2
hisat2-build sacCer3.fa sacCer3
Work in week2/rna
Run hisat2 (command similar to bowtie2)
Submit the following
hisat2 command and your description