Motivation (9:00-9:05; PowerPoint)

  • State learning objectives

Conditionals and comparisons (live-coding; 9:05-9:20)

Software Carpentry: Analyzing Patient Data (Numpy; live-coding; 9:20-10:10)

  • https://swcarpentry.github.io/python-novice-inflammation/instructor/02-numpy.html

  • introduce concept of libraries, which allow access to functions that are not built in to python (9:20-9:25)
  • use numpy.loadtxt() to load the inflammation dataset, then view it, store it in a variable and print it (9:25-9:30)
  • discuss the concept of parameters to a function and order-based versus explicit specification (9:30-9:35)
  • examine and discuss the type() (an n-dimensional array) (9:20-9:22)
  • examine and discuss the .shape and use this as an example of an “attribute” of a python object (9:35-9:38)
  • view all the attributes using the built-in dir() function (9:38-9:40)
  • index various elements from the numpy array data[0,0] and discuss that these start from the top left (9:40-9:43)

  • discuss and demonstrate data slices - start at the first index and go up to but not including the second index (9:43-9:48)
  • show what happens if you dont include bone of the bounds on the slice data[:3, 36:]

  • discuss and demonstrate built-in functions such as numpy.mean(data) (9:48-9:52)
  • demonstrate some other numpy functions and the concept of multiple assignment
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)
  • what if we want to compute a statistic on only a subset of the data (e.g., a specific patient?) (9:52-9:58)
patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
print('maximum inflammation for patient 0:', numpy.amax(patient_0))
  • this can be done in one line instead
print('maximum inflammation for patient 2:', numpy.amax(data[2, :]))
  • numpy allows you to use the axis parameter to apply a function to each row or column (9:58-10:00)
print(numpy.mean(data, axis=0))
print(numpy.mean(data, axis=1))

Software Carpentry: Visualizing Patient Data (Matplotlib; 10:10-11:00)

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(xdata, ydata)
plt.show()

Exercise (11:00-12:00)

Intro (live-coding; 11:00-11:15)

  • Describe goal: recreate Fig 3B from Lott et al 2011 PLoS Biology https://pubmed.gov/21346796
  • Discuss plot-sxl.py (i.e., “Who can tell me what this line is doing?”)
  • Have students independently run code and confirm initial plot created
#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as plt

# Get dataset to recreate Fig 3B from Lott et al 2011 PLoS Biology https://pubmed.gov/21346796
# wget https://github.com/bxlab/cmdb-quantbio/raw/main/assignments/lab/bulk_RNA-seq/extra_data/all_annotated.csv

transcripts = np.loadtxt( "all_annotated.csv", delimiter=",", usecols=0, dtype="<U30", skiprows=1 )
print( "transcripts: ", transcripts[0:5] )

samples = np.loadtxt( "all_annotated.csv", delimiter=",", max_rows=1, dtype="<U30" )[2:]
print( "samples: ", samples[0:5] )

data = np.loadtxt( "all_annotated.csv", delimiter=",", dtype=np.float32, skiprows=1, usecols=range(2, len(samples) + 2) )
print( "data: ", data[0:5, 0:5] )

# Find row with transcript of interest
for i in range(len(transcripts)):
    if transcripts[i] == 'FBtr0331261':
        row = i

# Find columns with samples of interest
cols = []
for i in range(len(samples)):
    if "female" in samples[i]:
        cols.append(i)

# Subset data of interest
expression = data[row, cols]

# Prepare data
x = samples[cols]
y = expression

# Plot data
fig, ax = plt.subplots()
ax.set_title( "FBtr0331261" )
ax.plot( x, y )
fig.savefig( "FBtr0331261.png" )
plt.close( fig )

Extensions for student exercise (11:15-12:00)

  • Annotate Plot
  • Rotate x-axis labels
  • Rename title to: “Sxl (FBtr0331261)”
  • Add x- and y-axis labaels
  • Create plot-msl-2.py for a different gene (FBtr0077571)

Additional exercises (optional / advanced)

  • Obtain male data (HINT: .startswith( “male” ))
  • Plot female as red, male as blue
  • Add legend