During the lab session we practiced analyzing 10x Genomics data starting with output files from the 10x Genomics Cell Ranger pipeline. While this demonstrates how you might analyze new data that you generate, there are many public datasets that you may wish to explore, starting with the analysis already conducted by the original authors. In this assignment, you will obtain data from the Fly Cell Atlas, which characterized 15 tissue types alongside whole head and body. Through a coordinated effort among >40 Drosophila labs around the world, more than 250 single-cell clusters were identified and annotated. After loading the Gut dataset and exploring its quality, you will identify marker genes to gain insights into biological functions.
Refer to the Bioconductor OSCA book for more information on
Use help() to display the built-in R Documentation for a given function (e.g. help(sort)) after loading the appropriate package
There are three exercises in this assignment:
Before you do anything else, create an R script for this assignment. Everything you’ll need to do for this assignment will be done in this script and submitted via GitHub. Use comments to separate your code blocks into Exercise 1, Step 1.1; Exercise 1, Step 1.2; etc., and include written answers to questions as comments. You must show your work to receive full credit for each question by providing the code you used.
Load the following packages using library()
Load Gut data from flycellatlas.org
Create an object named gut by loading v2_fca_biohub_gut_10x_raw.h5ad using readH5AD() from zellkonverter
Change the assay name from X to counts using assayNames(gut) <- "counts"
Normalize the counts using gut <- logNormCounts(gut)
Question 1: Inspect the gut SingleCellExperiment object (0.5 pt)
Question 2: Inspect the available cell metadata (0.5 pt)
Inspect the cell metadata in colData(gut)
Which of the columns reported by colnames() seem most interesting? Briefly explain why.
Plot cells according to X_umap using plotReducedDim() and colouring by broad_annotation
Sum the expression of each gene across all cells
Create a vector named genecounts by using rowSums() on the counts matrix returned by assay(gut)
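As a minimal sketch of the per-gene summation, the base R example below uses a small made-up matrix in place of assay(gut). The gene names, cell names, and counts are hypothetical and are not taken from the Gut dataset. Rows are genes and columns are cells, matching the layout of a SingleCellExperiment assay.

```r
# Toy counts matrix: 4 hypothetical genes (rows) x 3 hypothetical cells (columns)
toy_counts <- matrix(
  c( 0, 2,  1,
     5, 0,  3,
     0, 0,  0,
    10, 8, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("cell1", "cell2", "cell3"))
)

# Total expression of each gene across all cells (one value per row)
toy_genecounts <- rowSums(toy_counts)

# Minimum, quartiles, median, mean, and maximum of the per-gene totals
print(summary(toy_genecounts))

# Genes ordered from highest to lowest total expression
print(sort(toy_genecounts, decreasing = TRUE))
```

Comparing the mean and median reported by summary() is a quick way to see whether a few genes dominate the totals, as geneD does in this toy matrix.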
Question 3: Explore the genecounts distribution (1 pt)
What does summary() report for genecounts? What might you conclude from these numbers?
Which genes have the highest total expression after using sort()? What do they share in common?
Question 4a: Explore the total expression in each cell across all genes (0.5 pt)
Create a vector named cellcounts using colSums()
Create a histogram of cellcounts using hist()
Question 4b: Explore the number of genes detected in each cell (0.5 pt)
Create a vector named celldetected using colSums() but this time on assay(gut)>0
Create a histogram of celldetected using hist()
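The difference between total counts per cell and genes detected per cell can be sketched in base R on a made-up matrix standing in for assay(gut). All names and counts are hypothetical. Comparing the matrix to zero gives a logical matrix, and colSums() counts each TRUE as 1.

```r
# Toy counts matrix: 4 hypothetical genes (rows) x 3 hypothetical cells (columns)
toy_counts <- matrix(
  c( 0, 2,  1,
     5, 0,  3,
     0, 0,  0,
    10, 8, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("cell1", "cell2", "cell3"))
)

# Total counts in each cell, summed over all genes (one value per column)
toy_cellcounts <- colSums(toy_counts)

# Number of genes with at least one count in each cell:
# toy_counts > 0 is a logical matrix, and TRUE is summed as 1
toy_celldetected <- colSums(toy_counts > 0)

print(toy_cellcounts)
print(toy_celldetected)
```

Either vector can then be passed to hist() to look at its distribution across cells.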
Sum the expression of all mitochondrial genes across each cell
Create a vector named mito of mitochondrial gene names using grep() to search rownames(gut) for the pattern ^mt: and setting value to TRUE
Create a DataFrame named df using perCellQCMetrics() specifying that subsets=list(Mito=mito)
Convert df to a data.frame using as.data.frame() and then run summary()
Add the metrics to the cell metadata using colData(gut) <- cbind( colData(gut), df )
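To illustrate the pattern match, the sketch below runs grep() on a few gene names and then computes a mitochondrial percentage by hand in base R. It only illustrates the idea on toy data and is not how perCellQCMetrics() is implemented. The counts and cell names are hypothetical.

```r
# A few gene names; only those starting with "mt:" should match
toy_genes <- c("mt:CoI", "Act5C", "mt:ND1", "smt3")

# value = TRUE returns the matching names instead of their positions;
# ^ anchors the pattern to the start of the name
toy_mito <- grep("^mt:", toy_genes, value = TRUE)

# Toy counts matrix: genes in rows, 2 hypothetical cells in columns
toy_counts <- matrix(
  c(10,  5,
    70, 80,
    10,  0,
    10, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(toy_genes, c("cell1", "cell2"))
)

# Percent of each cell's counts that come from the mitochondrial genes
toy_mito_sum <- colSums(toy_counts[toy_mito, , drop = FALSE])
toy_mito_percent <- 100 * toy_mito_sum / colSums(toy_counts)

print(toy_mito)
print(toy_mito_percent)
```

Using drop = FALSE keeps the subset a matrix even if only one gene matches, so colSums() still works.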
Question 5: Visualize percent of reads from mitochondria (1 pt)
Use plotColData() to plot the subsets_Mito_percent on the y-axis against the broad_annotation on the x-axis, rotating the x-axis labels using theme( axis.text.x=element_text( angle=90 ) ), and submit this plot
Question 6a: Subset cells annotated as “epithelial cell” (1 pt)
Create a vector named coi that indicates cells of interest where TRUE and FALSE are determined by colData(gut)$broad_annotation == "epithelial cell"
Create a new SingleCellExperiment object named epi by subsetting gut with [,coi]
Plot epi according to X_umap and colour by annotation and submit this plot
Identify marker genes in the anterior midgut
Create a list named marker.info that contains the pairwise comparisons between all annotation categories using scoreMarkers( epi, colData(epi)$annotation )
Identify the top marker genes in the anterior midgut according to mean.AUC using the following code
chosen <- marker.info[["enterocyte of anterior adult midgut epithelium"]]
ordered <- chosen[order(chosen$mean.AUC, decreasing=TRUE),]
head(ordered[,1:4])
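The ordering step can be tried on a small stand-in table. The sketch below builds a base R data.frame with a mean.AUC column and sorts it the same way. The gene names and values are invented for illustration, and the real scoreMarkers() output has many more columns.

```r
# Stand-in for one element of marker.info (hypothetical genes and values)
toy_chosen <- data.frame(
  self.average = c(1.2, 0.4, 2.5),
  mean.AUC     = c(0.71, 0.55, 0.93),
  row.names    = c("geneA", "geneB", "geneC")
)

# order() returns row positions sorted by mean.AUC, largest first;
# indexing with those positions reorders the rows
toy_ordered <- toy_chosen[order(toy_chosen$mean.AUC, decreasing = TRUE), ]
print(head(toy_ordered))

# Names of the top-ranked genes are the row names of the ordered table
toy_top <- rownames(toy_ordered)[1:2]
print(toy_top)
```

A mean.AUC near 1 means the gene is consistently higher in the chosen group than in the other groups, while values near 0.5 indicate little separation.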
Question 6b: Evaluate top marker genes (2 pt)
Plot the expression of a top marker gene using plotExpression(), specifying the gene name as the feature and annotation as the x-axis, and submit this plot
Repeat the analysis for somatic precursor cells
Subset cells with the broad_annotation somatic precursor cell
Identify marker genes for intestinal stem cell
Question 7: Evaluate top marker genes (3 pt)
Create a vector named goi that contains the names of the top six genes of interest by rownames(ordered)[1:6]
Use plotExpression(), specifying the goi vector as the features, and submit this plot
Submit an R script with your code and answers to questions for
Be sure to include the following plots