Biological Learning Objectives
Computational Learning Objectives
Document your answers in ~/qbXX-answers/unix-python-scripts
Upload your scripts to https://github.com after each exercise; do not wait until the end of the session
Place all of your Unix commands in a single file named unix-commands.sh, along with the output as a comment, e.g.
#!/bin/bash
tail ce11_genes.bed | head -n 2
# chrIII 13768540 13771741 NM_067444.8 515 -
# chrIII 13769876 13769953 NR_003432.1 9 -
Explore ce11_genes.bed using Unix
Calculate each of the following statistics by constructing a single command using one or more of the following commands (linked together with a |): cut, grep, sort, uniq, wc
chrI, chrII
+, -
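A result from a cut | sort | uniq -c pipeline can be cross-checked in Python. The following is a minimal sketch, not part of the exercise itself: the helper name count_column is hypothetical, and it assumes the BED file is tab-separated, with the chromosome in column 1 and the strand in column 6.

```python
from collections import Counter


def count_column(lines, column):
    """Count how often each value appears in a 0-based, tab-separated column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        counts[fields[column]] += 1
    return counts


# Two records copied from the example output above (assumed tab-separated).
example = [
    "chrIII\t13768540\t13771741\tNM_067444.8\t515\t-\n",
    "chrIII\t13769876\t13769953\tNR_003432.1\t9\t-\n",
]

print(count_column(example, 0))  # features per chromosome
print(count_column(example, 5))  # features per strand
```

To run it on the real data, pass an open file handle instead of the example list, e.g. count_column(open("ce11_genes.bed"), 0), and compare the counts with the output of the Unix command.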
Recalculate ce11_genes.bed scores using Python
Write a script that for each feature (line) recalculates the score (column 5) such that
Print out all six columns in BED format
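The overall shape of such a script can be sketched as below. The scoring rule itself is not reproduced here, so the sketch takes it as a function argument; keep_score is a hypothetical placeholder that returns the score unchanged and should be swapped for the rule given in the exercise. The file is assumed to be tab-separated.

```python
def rescore_bed(lines, rescore):
    """Yield six-column BED lines with column 5 replaced by rescore(fields)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        fields[4] = str(rescore(fields))
        yield "\t".join(fields[:6])


def keep_score(fields):
    """Hypothetical placeholder rule: return the original score unchanged."""
    return int(fields[4])


# Two records copied from the earlier example output (assumed tab-separated).
example = [
    "chrIII\t13768540\t13771741\tNM_067444.8\t515\t-\n",
    "chrIII\t13769876\t13769953\tNR_003432.1\t9\t-\n",
]

for bed_line in rescore_bed(example, keep_score):
    print(bed_line)
```

Passing the whole list of fields to the rule, rather than only the score, lets the rule also depend on other columns such as the strand or the feature length.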
Explore GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt using Unix
Transform GTEx data using Python
Write a script that extracts expression values for the first gene (DDX11L1), which are stored on a single line spread across more than 17,000 columns, and transposes the data so that the expression in each sample is stored on a separate line.
Open the expression data file GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct
.readline()
header[i] as the key to store data[i] as the value
Open the metadata file GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
GTEX-ZZPU-2426-SM-5E44I 0 Artery - Tibial
GTEX-ZZPU-2626-SM-5E45Y 0.01965 Muscle - Skeletal
GTEX-ZZPU-2726-SM-5NQ8O 0.02522 Adipose - Subcutaneous
What are the first three tissues that have >0 expression?
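The steps above can be sketched roughly as follows. This is one possible layout, not the reference solution. It assumes the GCT file has two preamble lines before the header row, that the first two columns hold the gene ID and gene name, and that the metadata file has columns named SAMPID and SMTSD for the sample ID and tissue. The function names are hypothetical, and next() on a file handle is used in place of .readline().

```python
def read_first_gene(gct_lines):
    """Return {sample_id: expression} for the first gene in GCT-formatted lines."""
    lines = iter(gct_lines)
    next(lines)  # skip the format version line (assumed GCT preamble)
    next(lines)  # skip the dimensions line (assumed GCT preamble)
    header = next(lines).rstrip("\n").split("\t")
    data = next(lines).rstrip("\n").split("\t")
    # Columns 0 and 1 are assumed to be gene ID and gene name; samples start at 2.
    return {header[i]: data[i] for i in range(2, len(header))}


def read_tissues(metadata_lines, id_column="SAMPID", tissue_column="SMTSD"):
    """Return {sample_id: tissue}, locating both columns by their header names."""
    lines = iter(metadata_lines)
    header = next(lines).rstrip("\n").split("\t")
    id_idx = header.index(id_column)
    tissue_idx = header.index(tissue_column)
    tissues = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_idx, tissue_idx):
            continue  # skip short or blank lines
        tissues[fields[id_idx]] = fields[tissue_idx]
    return tissues


def transpose(expression, tissues):
    """Yield one 'sample <tab> expression <tab> tissue' line per sample."""
    for sample, value in expression.items():
        yield f"{sample}\t{value}\t{tissues.get(sample, 'NA')}"


# Example use with the files named above (locations are assumptions):
# with open("GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct") as gct, \
#         open("GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt") as meta:
#     for row in transpose(read_first_gene(gct), read_tissues(meta)):
#         print(row)
```

To answer the question about tissues with >0 expression, the rows could be filtered on float(value) > 0 before printing, or the printed output could be filtered afterwards with Unix tools.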
Explore ~/Data/References/hg38/gencode.v46.basic.annotation.gtf using Unix
#
Export gene features to BED format using Python
Write a script that takes gencode.v46.basic.annotation.gtf and
#
gene features (column 3)
attribute (column 9)
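A minimal sketch of such a conversion is shown below. It assumes that lines starting with # are header comments to be skipped, that the value wanted from the attribute column is gene_name, and that the output is chromosome, start, end, and name. The function name gtf_genes_to_bed is hypothetical. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the start is shifted by one and the end is kept.

```python
import re

# Matches the gene_name entry inside the attribute column, e.g. gene_name "X";
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gtf_genes_to_bed(lines):
    """Yield BED lines (chrom, start, end, name) for GTF features of type gene."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene features (column 3)
        match = GENE_NAME.search(fields[8])  # attribute column (column 9)
        name = match.group(1) if match else "."
        start = int(fields[3]) - 1  # 1-based inclusive to 0-based half-open
        yield f"{fields[0]}\t{start}\t{fields[4]}\t{name}"


# Example use (the path is the one named above):
# with open("gencode.v46.basic.annotation.gtf") as gtf:
#     for bed_line in gtf_genes_to_bed(gtf):
#         print(bed_line)
```

Using a regular expression for the attribute avoids depending on the order of the key-value pairs in column 9; splitting on "; " and looking for the key would work as well.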