Learning Objectives
Document your answers in ~/qbXX-answers/weekX
git push after each exercise and do not wait until the end of the lab
Plot gene density across each chromosome
Create hg19-main.chrom.sizes with information about the 25 main chromosomes
Use wget to obtain hg19.chrom.sizes from https://hgdownload.soe.ucsc.edu/downloads.html under “Genome sequence files”
Use less to see primary and secondary contigs
grep -v _ hg19.chrom.sizes | sed 's/M/MT/' > hg19-main.chrom.sizes
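To see what the grep and sed filter does before running it on the real file, here is a minimal sketch on a toy file. The file names toy.chrom.sizes and toy-main.chrom.sizes are hypothetical, and the four rows are only an illustrative subset, not the full hg19 list.

```shell
# Work in a scratch directory so real lab files are untouched
cd "$(mktemp -d)"

# Toy stand-in for hg19.chrom.sizes (tab-separated: name, size); illustrative rows only
printf 'chr1\t249250621\nchrM\t16571\nchr6_apd_hap1\t4622290\nchrUn_gl000220\t161802\n' > toy.chrom.sizes

# Drop every contig whose name contains "_", then rename chrM to chrMT
grep -v _ toy.chrom.sizes | sed 's/M/MT/' > toy-main.chrom.sizes

cat toy-main.chrom.sizes
```

Only chr1 and chrMT remain. Note that sed 's/M/MT/' replaces the first M on each line, which is safe here because chrM is the only main chromosome name that contains an M.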
Create hg19-1mb.bed with 1 mb intervals across the hg19 assembly
Run bedtools makewindows to view syntax and documentation
Usage: bedtools makewindows [OPTIONS] [-g <genome> OR -b <bed>]
[ -w <window_size> OR -n <number of windows> ]
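To build intuition for what fixed-size windowing produces, the following awk sketch tiles a toy genome file by hand. It is not a replacement for bedtools makewindows; the chromosome names chrA and chrB, their lengths, and the file names are hypothetical.

```shell
# Work in a scratch directory so real lab files are untouched
cd "$(mktemp -d)"

# Hypothetical toy genome file: chromosome name and length
printf 'chrA\t2500000\nchrB\t800000\n' > toy.chrom.sizes

# Emit consecutive windows of at most w bases; the last window on each
# chromosome is truncated at the chromosome end
awk -v w=1000000 'BEGIN { OFS = "\t" }
{
  for (start = 0; start < $2; start += w) {
    end = start + w
    if (end > $2) end = $2
    print $1, start, end
  }
}' toy.chrom.sizes > toy-1mb.bed

cat toy-1mb.bed
```

chrA yields three windows (the last one 500,000 bases long) and chrB yields a single window covering the whole chromosome, so short chromosomes still get one interval each.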
Run bedtools makewindows and save the output using >
Create hg19-kc.bed with one transcript per gene (aka knownCanonical)
Use mv to move the file from ~/Desktop to ~/qbXX-answers/weekX
hg19-kc.tsv has 80,270 lines
cut -f1-3,5 hg19-kc.tsv > hg19-kc.bed
hg19-kc.bed looks like
#chrom chromStart chromEnd transcript
chr1 169818771 169863037 ENST00000367771.11_11
chr1 169764180 169823221 ENST00000359326.9_7
chr1 27938574 27961696 ENST00000374005.8_7
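The cut step can be checked on a one-record toy file. This is a sketch under an assumption: that hg19-kc.tsv has six tab-separated columns with the transcript in column 5. The clusterId and protein column names and their values below are placeholders, and toy-kc.tsv and toy-kc.bed are hypothetical file names.

```shell
# Work in a scratch directory so real lab files are untouched
cd "$(mktemp -d)"

# Toy stand-in for hg19-kc.tsv; the six-column layout is an assumption and
# the values in columns 4 and 6 are placeholders
printf '#chrom\tchromStart\tchromEnd\tclusterId\ttranscript\tprotein\n' > toy-kc.tsv
printf 'chr1\t169818771\t169863037\t1\tENST00000367771.11_11\tP1\n' >> toy-kc.tsv

# Keep columns 1-3 and 5: chromosome, start, end, transcript
cut -f1-3,5 toy-kc.tsv > toy-kc.bed

cat toy-kc.bed
```

If your output does not have the four columns shown in the example above, check the column order of your download with head before adjusting the -f list.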
Count how many genes are in each 1 mb interval using bedtools intersect
Replace -x, which is just a placeholder, with the appropriate option, e.g.
bedtools intersect -x -a fileA.bed -b fileB.bed
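When reasoning about which genes fall in which interval, it helps to recall that BED coordinates are 0-based and half-open. The sketch below spells out the resulting overlap rule as a small shell function; the name overlaps is hypothetical and the function is for illustration only, not part of bedtools.

```shell
# Return success (exit status 0) when [startA, endA) and [startB, endB) share
# at least one base; BED coordinates are 0-based and half-open
# usage: overlaps startA endA startB endB
overlaps() {
  [ "$1" -lt "$4" ] && [ "$3" -lt "$2" ]
}

# A gene that straddles a window boundary overlaps the first window
overlaps 0 1000000 999000 1002000 && echo "gene spans the window boundary"

# A gene starting exactly where the window ends does not overlap it
overlaps 0 1000000 1000000 1005000 || echo "book-ended intervals do not overlap"
```

A gene that crosses a boundary overlaps two adjacent intervals, which is worth keeping in mind when interpreting per-interval counts.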
Save the output as hg19-kc-count.bed
Choose which file to pass to -a and which to -b, and think about why order matters
Use R to plot gene density across all 25 main chromosomes
Load tidyverse
Read hg19-kc-count.bed using read_tsv(), specifying the header, e.g.
header <- c( "chr", "start", "end", "count" )
df_kc <- read_tsv( ________, col_names=header )
Plot using ggplot(), geom_line(), and facet_wrap(), using the scales argument to let the x- and y-axes adjust for each chromosome
Start with filter( chr == "chr1" ) so that you don’t have to use facet_wrap(), then proceed to plotting all the chromosomes
Save the plot using ggsave( "exercise1.png" )
Submit the following
Do not submit the hg19-kc files as they are large
Compare hg19 gene annotations with hg16 [fn. 1]
Prepare hg16 files as in Exercise 1 with the following modifications
Obtain hg16.chrom.sizes
grep -v _ hg16.chrom.sizes > hg16-main.chrom.sizes
Visualize both hg19 and hg16 gene distributions on the same line plots
Combine the two data frames using bind_rows( hg19=dfA, hg16=dfB, .id="assembly" )
Set aes( color=___ )
Calculate how many genes are unique to each assembly
Use bedtools intersect with a 1-letter option to find genes with no overlaps
Submit the following
Explore chromatin states between conditions
Visualize ChromHMM chromatin state segmentation
Create four .bed files corresponding to 1_Active and 12_Repressed for NHEK and NHLF
grep 1_Active nhek.bed > nhek-active.bed
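The same grep pattern extends to all four files with a short loop. This is a sketch on toy data: the four-column layout, the coordinates, and the -repressed output file names are assumptions made for illustration.

```shell
# Work in a scratch directory so real lab files are untouched
cd "$(mktemp -d)"

# Toy segmentation files; the four-column layout and coordinates are assumed
for cell in nhek nhlf; do
  printf 'chr1\t0\t200\t1_Active\nchr1\t200\t600\t12_Repressed\nchr1\t600\t800\t1_Active\n' > "$cell.bed"
done

# One output file per cell type and state
for cell in nhek nhlf; do
  grep 1_Active "$cell.bed" > "$cell-active.bed"
  grep 12_Repressed "$cell.bed" > "$cell-repressed.bed"   # "-repressed" is a hypothetical file name
done

wc -l ./*-active.bed ./*-repressed.bed
```

Because grep matches the pattern anywhere on a line, it is worth confirming with cut and sort -u that no other state label in your files contains the same text.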
Construct a bedtools command to test whether there is any overlap between 1_Active and 12_Repressed in a given condition (i.e. whether the two states are mutually exclusive)
Construct two bedtools intersect [OPTIONS] -a nhek-active.bed -b nhlf-active.bed commands, one to find regions that are active in NHEK and NHLF, and one to find regions that are active in NHEK but not active in NHLF
Construct three bedtools intersect commands to see the effect of using the arguments -f 1, -F 1, and -f 1 -F 1 when comparing -a nhek-active.bed -b nhlf-active.bed
chr1 25558413 25559413
Construct three bedtools intersect commands to identify the following types of regions. Use UCSC Genome Browser to save one PDF image for each of the three types of regions. Describe the chromatin state across all nine conditions.
Submit the following
bedtools commands; place output and answers as comments
Identify where variation occurs
Obtain snps-chr1.bed with only the chromosome 1 Common SNPs
chr1, Output format BED, Output filename snps-chr1.bed
Use bedtools intersect and hg19-kc.bed to determine which gene has the most SNPs
Determine which SNPs lie within vs outside of a gene
bedtools sample -n 20 -seed 42
Use bedtools sort to sort the subset of SNPs
Use bedtools sort to sort hg19-kc.bed
Run bedtools closest -d on the two sorted files, with -t first to break ties
Submit the following
bedtools commands; place output and answers as comments
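The sorting step before bedtools closest matters because that tool expects both inputs ordered by chromosome and then by start position. As a sketch of what that ordering looks like, the example below uses the standard sort utility on a toy file; the SNP names rsA, rsB, and rsC and the file names are hypothetical.

```shell
# Work in a scratch directory so real lab files are untouched
cd "$(mktemp -d)"

# Toy SNP file in deliberately shuffled order
printf 'chr1\t500\t501\trsB\nchr1\t20\t21\trsA\nchr1\t3000\t3001\trsC\n' > toy-snps.bed

# Sort by chromosome (column 1), then numerically by start (column 2)
LC_ALL=C sort -k1,1 -k2,2n toy-snps.bed > toy-snps.sorted.bed

cat toy-snps.sorted.bed
```

The numeric flag on the second key is what keeps position 500 ahead of 3000; a plain text sort would compare them character by character.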