Learning Objectives
Document your answers in ~/qbXX-answers/weekX
git push after each exercise; do not wait until the end of the lab
Plot gene density across each chromosome
Create hg19-main.chrom.sizes with information about the 25 main chromosomes
Use wget to obtain hg19.chrom.sizes from https://hgdownload.soe.ucsc.edu/downloads.html under “Genome sequence files”
Use less to see primary and secondary contigs
grep -v _ hg19.chrom.sizes | sed 's/M/MT/' > hg19-main.chrom.sizes
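As a minimal sketch of what the filter above does, the following runs the same grep and sed pipeline on a toy file. The file names toy.chrom.sizes and toy-main.chrom.sizes and all sizes are made up for illustration; only the pipeline itself comes from the step above.

```shell
# Toy chrom.sizes file (tab-separated; contig names and sizes are made up)
printf 'chr1\t1000\nchr1_gl000191_random\t30\nchrX\t800\nchrM\t50\n' > toy.chrom.sizes

# Drop every contig whose name contains an underscore, then rename chrM to chrMT
grep -v _ toy.chrom.sizes | sed 's/M/MT/' > toy-main.chrom.sizes

# Inspect the result: only chr1, chrX, and chrMT should remain
cat toy-main.chrom.sizes
```

Note that sed 's/M/MT/' rewrites only the first M on each line, which is sufficient here because chrM is the only main chromosome name that contains an M.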
Create hg19-1mb.bed with 1 Mb intervals across the hg19 assembly
Run bedtools makewindows to view syntax and documentation
Usage: bedtools makewindows [OPTIONS] [-g <genome> OR -b <bed>]
[ -w <window_size> OR -n <number of windows> ]
Run bedtools makewindows and save the output using >
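To get a feel for what fixed-width windows look like before working on the full assembly, here is a sketch that uses awk as a stand-in, not bedtools itself, on a made-up two-chromosome genome with a window size of 1000. The file names toy-genome.sizes and toy-windows.bed and all sizes are hypothetical.

```shell
# Toy genome file: chromosome name and length, tab-separated (values are made up)
printf 'chrA\t2500\nchrB\t1000\n' > toy-genome.sizes

# Emit consecutive 0-based, half-open windows of width w per chromosome,
# clipping the final window at the chromosome end
awk -v w=1000 'BEGIN { OFS = "\t" }
{
  for (s = 0; s < $2; s += w) {
    e = s + w
    if (e > $2) e = $2
    print $1, s, e
  }
}' toy-genome.sizes > toy-windows.bed

cat toy-windows.bed
```

The output has one line per window in BED coordinates (chrom, start, end), which is the shape hg19-1mb.bed should have.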
Create hg19-kc.bed with one transcript per gene (aka knownCanonical)
Use mv to move the file from ~/Desktop to ~/qbXX-answers/weekX
Check that hg19-kc.tsv has 80,270 lines
cut -f1-3,5 hg19-kc.tsv > hg19-kc.bed
Check that hg19-kc.bed looks like
#chrom chromStart chromEnd transcript
chr1 169818771 169863037 ENST00000367771.11_11
chr1 169764180 169823221 ENST00000359326.9_7
chr1 27938574 27961696 ENST00000374005.8_7
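The cut step above can be tried on a small file first. This sketch assumes a six-column, tab-separated layout (chrom, chromStart, chromEnd, clusterId, transcript, protein) for the downloaded table; that layout, the file names, and the transcript names are assumptions for illustration, so check the real header of hg19-kc.tsv before relying on the column numbers.

```shell
# Toy table with an assumed six-column layout (all values are made up)
printf '#chrom\tchromStart\tchromEnd\tclusterId\ttranscript\tprotein\n' > toy-kc.tsv
printf 'chr1\t100\t500\t1\tTX_A\tP_A\n' >> toy-kc.tsv
printf 'chr1\t600\t900\t2\tTX_B\tP_B\n' >> toy-kc.tsv

# Count lines, including the header
wc -l < toy-kc.tsv

# Keep columns 1-3 and 5: chrom, start, end, transcript
cut -f1-3,5 toy-kc.tsv > toy-kc.bed

cat toy-kc.bed
```

Because cut passes every line through, the header line (starting with #) is kept and trimmed to the same four columns.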
Count how many genes are in each 1 Mb interval using bedtools intersect
Replace -x (which is just a placeholder) with the appropriate option, e.g.
bedtools intersect -x -a fileA.bed -b fileB.bed
Save the output as hg19-kc-count.bed
Decide which file to pass as -a and which as -b, and think about why order matters
Use R to plot gene density across all 25 main chromosomes
Load tidyverse
Load hg19-kc-count.bed using read_tsv(), specifying the header e.g.
header <- c( "chr", "start", "end", "count" )
df_kc <- read_tsv( ________, col_names=header )
Plot using ggplot(), geom_line(), and facet_wrap(), using the scales argument to let the x- and y-axes adjust for each chromosome
Start with filter( chr == "chr1" ) so that you don’t have to use facet_wrap(), then proceed to plotting all the chromosomes
Save the plot using ggsave( "exercise1.png" )
Submit the following
Do not submit the hg19-kc files as they are large
Compare hg19 gene annotations with hg16 [fn. 1]
Prepare hg16 files as in Exercise 1 with the following modifications
hg16.chrom.sizes
grep -v _ hg16.chrom.sizes > hg16-main.chrom.sizes
Visualize both hg19 and hg16 gene distributions on the same line plots
Combine the two data frames using bind_rows( hg19=dfA, hg16=dfB, .id="assembly" )
Distinguish the assemblies using aes( color=___ )
Calculate how many genes are unique to each assembly
Use bedtools intersect with a 1-letter option to find genes with no overlaps
Submit the following
Explore chromatin states between conditions
Visualize ChromHMM chromatin state segmentation
Create four .bed files corresponding to 1_Active and 12_Repressed for NHEK and NHLF
grep 1_Active nhek.bed > nhek-active.bed
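The same grep pattern extends to all four files. A minimal sketch on toy data, assuming each segmentation file carries the state name in its fourth column; the toy- file names and intervals are made up, and the real inputs would be nhek.bed and nhlf.bed.

```shell
# Toy segmentation files (intervals and layout are made up for illustration)
printf 'chr1\t0\t100\t1_Active\nchr1\t100\t300\t12_Repressed\nchr1\t300\t400\t1_Active\n' > toy-nhek.bed
printf 'chr1\t0\t150\t12_Repressed\nchr1\t150\t400\t1_Active\n' > toy-nhlf.bed

# One active and one repressed file per cell type, four files in total
for cell in nhek nhlf; do
  grep 1_Active "toy-$cell.bed" > "toy-$cell-active.bed"
  grep 12_Repressed "toy-$cell.bed" > "toy-$cell-repressed.bed"
done

# Line counts per output file
wc -l toy-nhek-active.bed toy-nhek-repressed.bed toy-nhlf-active.bed toy-nhlf-repressed.bed
```

grep matches the pattern anywhere on the line, so it is worth confirming with less that no other state name in the real files contains 1_Active or 12_Repressed as a substring.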
Construct a bedtools command to test whether there is any overlap between 1_Active and 12_Repressed in a given condition (i.e. whether the two states are mutually exclusive)
Construct two bedtools intersect [OPTIONS] -a nhek-active.bed -b nhlf-active.bed commands, one to find regions that are active in both NHEK and NHLF, and one to find regions that are active in NHEK but not active in NHLF
Construct three bedtools intersect commands to see the effect of using the arguments -f 1, -F 1, and -f 1 -F 1 when comparing -a nhek-active.bed -b nhlf-active.bed
chr1 25558413 25559413
Construct three bedtools intersect commands to identify the following types of regions. Use UCSC Genome Browser to save one PDF image for each of the three types of regions. Describe the chromatin state across all nine conditions.
Submit the following
bedtools commands; place output and answers as comments
Identify where variation occurs
Obtain snps-chr1.bed with only the chromosome 1 Common SNPs
chr1, Output format BED, Output filename snps-chr1.bed
Use bedtools intersect and hg19-kc.bed to determine which gene has the most SNPs
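Once a per-gene SNP count exists, finding the gene with the most SNPs is a sort on the count column. This sketch assumes, hypothetically, that the counting step produced a five-column file with the count in the last column; the file names, gene names, and counts are made up.

```shell
# Toy per-gene counts: chrom, start, end, gene, SNP count (all values are made up)
printf 'chr1\t100\t500\tgeneA\t3\n' > toy-snp-count.bed
printf 'chr1\t600\t900\tgeneB\t7\n' >> toy-snp-count.bed
printf 'chr1\t1000\t1200\tgeneC\t0\n' >> toy-snp-count.bed

# Sort numerically on column 5 in descending order and keep the top line
sort -k5,5nr toy-snp-count.bed | head -n 1 > toy-top-gene.txt

cat toy-top-gene.txt
```

If the real output has a different number of columns, adjust the -k field accordingly.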
Determine which SNPs lie within vs outside of a gene
Use bedtools sample -n 20 -seed 42 to create a subset of SNPs
Use bedtools sort to sort the subset of SNPs
Use bedtools sort to sort hg19-kc.bed
Use bedtools closest -d on the two sorted files, with -t first to break ties
Submit the following
bedtools commands; place output and answers as comments