Biological Learning Objectives
Computational Learning Objectives
Document your answers in a single file stored in ~/qbb2024-answers/day1-lunch.
Please git push after each exercise rather than waiting until the end of the session, e.g.
git add explore-samples.Rmd
git commit -m "Add answer for exercise 1"
git push
# Confirm at github.com
Browse data dictionary
Metadata
GTEx_Analysis_v8_Annotations_SampleAttributesDD.xlsx
Prepare your working environment
Create a new R Script (e.g. explore-samples.R) or a new R Notebook (e.g. explore-samples.Rmd) using the provided template
Load the tidyverse package
Wrangle the sample metadata
Load the GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt file and assign it to variable df
Create a SUBJECT column using the following code
df <- df %>%
mutate( SUBJECT=str_extract( SAMPID, "[^-]+-[^-]+" ), .before=1 )
Confirm that the first column in df is SUBJECT
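To see what the regular expression in the mutate() call does, here is a minimal sketch on a toy table. The sample IDs and tissue labels below are made up for illustration (they only imitate the hyphen-separated layout of SAMPID) and are not taken from the GTEx file.

```r
library(dplyr)
library(stringr)

# Hypothetical sample IDs, invented for this example
toy <- tibble(
  SAMPID = c("GTEX-AAAA-0001-SM-XXXXX", "GTEX-BBBB-0002-R1a-SM-YYYYY"),
  SMTSD  = c("Whole Blood", "Brain - Cortex")
)

# "[^-]+" matches a run of non-hyphen characters, so "[^-]+-[^-]+"
# captures the first two hyphen-separated fields of each ID.
# .before = 1 places the new column first.
toy <- toy %>%
  mutate(SUBJECT = str_extract(SAMPID, "[^-]+-[^-]+"), .before = 1)

toy$SUBJECT       # "GTEX-AAAA" "GTEX-BBBB"
colnames(toy)[1]  # "SUBJECT"
```

Checking colnames(df)[1] on the real data frame is one quick way to confirm the column landed in the first position.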
Which two SUBJECTs have the most samples? The least?
Use group_by(), summarize(), and arrange()
Record your answer as a comment (# in R Script) or a bullet point (- in R Notebook)
Which two SMTSDs (tissue types) have the most samples? The least? Why?
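The group_by(), summarize(), and arrange() pattern can be sketched on a small invented table. The subject labels S1 to S3 and the column name n_samples are hypothetical; with the real data you would group df by SUBJECT or by SMTSD instead.

```r
library(dplyr)

# Toy data: six samples spread over three made-up subjects
toy <- tibble(
  SUBJECT = c("S1", "S1", "S1", "S2", "S2", "S3"),
  SMTSD   = c("Liver", "Lung", "Thyroid", "Liver", "Lung", "Liver")
)

# Count samples per subject, then sort from most to fewest
counts <- toy %>%
  group_by(SUBJECT) %>%
  summarize(n_samples = n()) %>%
  arrange(desc(n_samples))

head(counts, 2)  # groups with the most samples
tail(counts, 2)  # groups with the fewest samples
```

Sorting with arrange(n_samples) instead puts the smallest groups at the top, which may be easier to read when the table is long.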
For subject GTEX-NPJ8
Filter df to this subject's samples and save the result as a new object (e.g. df_npj8)
Explore SMATSSCR (autolysis score)
Filter out NA values in this column to avoid mean() returning NA
df %>%
filter( !is.na(SMATSSCR) )
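As a rough illustration of why the NA filter matters, the sketch below uses an invented table. The subjects, scores, and the column name mean_score are assumptions for the example only.

```r
library(dplyr)

# Toy data with one missing autolysis score
toy <- tibble(
  SUBJECT  = c("S1", "S1", "S2", "S2", "S3"),
  SMATSSCR = c(0, 0, 1, NA, 2)
)

# A single NA makes the overall mean NA
mean(toy$SMATSSCR)

# Drop NA rows first, then compute the mean score per subject
means <- toy %>%
  filter(!is.na(SMATSSCR)) %>%
  group_by(SUBJECT) %>%
  summarize(mean_score = mean(SMATSSCR))

# Keep only subjects whose mean score is exactly 0
zero <- means %>%
  filter(mean_score == 0)
nrow(zero)
```

An alternative is to keep the rows and call mean(SMATSSCR, na.rm = TRUE) inside summarize(). Note that a subject whose scores are all NA then gets NaN rather than being dropped.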
How many SUBJECTs have a mean SMATSSCR score of 0?
A. Identify another handful of columns using the data dictionary, explore the data, and describe your results