Learning Objectives
~/qbXX-answers/weekX
README.md
git push after each exercise
Strive to organize your directory e.g.
/Users/cmdb/qb25-answers/week2
├── README.md
├── genomes
│ └── sacCer3.fa
├── rawdata
│ └── ERR8562476.fastq
└── variants
├── A01_01.bam
├── ...
└── A01_06.sam
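The layout above can be created in one step. The following is a minimal sketch, assuming Python 3; the function name make_week_dirs is hypothetical, and the subdirectory names are taken from the tree above.

```python
from pathlib import Path

def make_week_dirs(base):
    """Create a week directory with the subdirectories shown in the tree."""
    base = Path(base)
    for sub in ("genomes", "rawdata", "variants"):
        # parents=True also creates the week directory itself;
        # exist_ok=True makes the call safe to repeat
        (base / sub).mkdir(parents=True, exist_ok=True)
    # README.md holds the written answers; touch() keeps existing content
    (base / "README.md").touch()
    return base
```

Calling make_week_dirs(Path.home() / "qb25-answers" / "week2") would set up the example directory; adjust the path to your own repository.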
Download BYxRM dataset and unpack using tar
cd ~/Data
wget -O BYxRM.tar.gz "https://www.dropbox.com/scl/fi/ggmmx6ceqni3e2wnljn0t/BYxRM.tar.gz?rlkey=rfkthvs49pa56y9eo3ixyq9ep&dl=0"
tar xf BYxRM.tar.gz
Confirm that ls -l BYxRM matches
-rw-r--r-- 1 cmdb staff 23729508 May 6 2024 BYxRM_GenoData.txt
-rw-r--r-- 1 cmdb staff 797659 May 6 2024 BYxRM_PhenoData.txt
drwxr-xr-x 98 cmdb staff 3136 Jul 30 15:13 fastq
Skim genotype calls in BYxRM_GenoData.txt using less
marker A01_01 A01_02
27915_chr01_27915_T_C R B
28323_chr01_28323_G_A R B
28652_chr01_28652_G_T R B
29667_chr01_29667_C_A R B
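Beyond skimming with less, the calls can be tallied per sample. The following is a rough sketch, assuming the columns are separated by whitespace (tabs or spaces) and that the first column is the marker name; the function name count_genotypes is hypothetical.

```python
from collections import Counter

def count_genotypes(lines):
    """Return {sample: Counter of genotype calls} from genotype-table lines."""
    lines = iter(lines)
    # First line: 'marker' followed by one column name per sample
    samples = next(lines).split()[1:]
    counts = {sample: Counter() for sample in samples}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # fields[0] is the marker name; the rest are calls in sample order
        for sample, call in zip(samples, fields[1:]):
            counts[sample][call] += 1
    return counts

# Rows copied from the excerpt above
example = [
    "marker A01_01 A01_02",
    "27915_chr01_27915_T_C R B",
    "28323_chr01_28323_G_A R B",
]
genotype_counts = count_genotypes(example)
for sample in sorted(genotype_counts):
    print(sample, dict(genotype_counts[sample]))
```

To run it on the full table, pass an open file handle for BYxRM_GenoData.txt instead of the example list.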
Align short sequencing reads using Bowtie2
Build reference genome index (only one time per reference)
cd genomes
cp ~/Data/References/sacCer3/sacCer3.fa.gz .
gunzip sacCer3.fa.gz
bowtie2-build sacCer3.fa sacCer3
Map reads to genome and sort/index with samtools (as many times as samples)
cd variants
bowtie2 -p 4 -x ___/sacCer3 -U ___/A01_01.fq.gz > A01_01.___
samtools sort -o ___ ___
samtools index ___
samtools idxstats ___ > A01_01.idxstats
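The idxstats output is tab-delimited with four columns: reference name, sequence length, mapped count, and unmapped count. The following is a small sketch for reading it in Python; the function name parse_idxstats is hypothetical, and the counts in the example are made-up placeholders, not real results.

```python
def parse_idxstats(lines):
    """Return {reference name: mapped read count} from samtools idxstats lines."""
    mapped = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, length, n_mapped, n_unmapped = fields[:4]
        mapped[name] = int(n_mapped)
    return mapped

# Hypothetical counts for illustration only
example_idxstats = [
    "chrI\t230218\t120\t0\n",
    "chrII\t813184\t450\t0\n",
    "*\t0\t0\t30\n",  # the '*' row reports reads with no assigned reference
]
idx_counts = parse_idxstats(example_idxstats)
print(idx_counts)
```

With a real file, pass the open handle for A01_01.idxstats instead of the example list.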
Visualize reads using IGV.app
Submit the following
bowtie2 and three samtools commands
samtools idxstats
Build a workflow to process the first six samples using a Bash script
Create map-reads.sh by completing this template
#!/bin/bash
for sample in # list prefixes e.g. A01_01 A01_02
do
echo "***" $sample
# mapping command e.g. bowtie2 path/to/$sample.fq.gz
# sort command
# index command
done
Visualize haplotype using IGV.app
Submit the following
Summarize sequence alignments using Python
Create summarize-sam.py to count alignments to each chromosome
Skip header lines that start with @
Count alignments using the RNAME field
Compare with samtools idxstats
samtools sort -o reads.sorted.sam reads.sam
Extend summarize-sam.py to examine mismatches per alignment
Count how many times each NM:i:count SAM tag occurs
NM is not always in the same column so use a for loop to go through the fields after splitting a line
Remove NM:i: by slicing and convert to int()
Order the output using the sorted() function
Submit the following
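The two counting steps for summarize-sam.py can be sketched as follows. This is one possible approach rather than the expected solution: the function name summarize_sam and the example records are hypothetical, and it assumes tab-delimited SAM lines with RNAME in the third column and optional tags from the twelfth column onward.

```python
from collections import Counter

def summarize_sam(lines):
    """Count alignments per RNAME and per NM (edit distance) value."""
    by_chrom = Counter()
    by_nm = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        by_chrom[fields[2]] += 1  # RNAME; '*' means the read is unmapped
        # NM is not in a fixed column, so scan the optional tags
        for field in fields[11:]:
            if field.startswith("NM:i:"):
                by_nm[int(field[5:])] += 1  # slice off the 'NM:i:' prefix
                break
    return by_chrom, by_nm

# Made-up records for illustration only
example_sam = [
    "@SQ\tSN:chrI\tLN:230218",
    "r1\t0\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tNM:i:0",
    "r2\t16\tchrII\t200\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1\tAS:i:-6",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU",
]
chrom_counts, nm_counts = summarize_sam(example_sam)
for chrom in sorted(chrom_counts):
    print(chrom, chrom_counts[chrom])
for nm in sorted(nm_counts):
    print(nm, nm_counts[nm])
```

Converting the NM value to int before sorting keeps the order numeric, so 10 sorts after 9 rather than after 1.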
Align long sequencing reads using minimap2
Find yeast Nanopore dataset at SRA
Fetch reads using sratoolkit
week2/rawdata directory
fasterq-dump -p ERR8562476
conda install sra-tools
ERR8562476.fastq has 137436 lines
Map reads to genome
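Before mapping, the stated line count can be checked and converted to a read count. The following is a minimal sketch, assuming each FASTQ record spans exactly four lines (sequences are not wrapped); the function name count_fastq_reads is hypothetical.

```python
def count_fastq_reads(lines):
    """Return the number of reads, assuming four lines per FASTQ record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4

# One made-up record for illustration
example_fastq = ["@read1", "ACGT", "+", "IIII"]
print(count_fastq_reads(example_fastq))

# Under the same assumption, 137436 lines correspond to this many reads
expected_reads = 137436 // 4
print(expected_reads)
```

With the real file, pass the open handle for ERR8562476.fastq; the result should equal 137436 divided by 4 if the download completed.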
week2/longreads
minimap2 with the following arguments
conda install minimap2
longreads.sam
Visualize using IGV.app
Submit the following
minimap2 command
samtools idxstats
Align RNA-seq reads using HISAT2
hisat2-build sacCer3.fa sacCer3
week2/rna
hisat2 (command similar to bowtie2)
Submit the following
hisat2 command and your description