AS.020.607     CMDB Quantitative Biology Bootcamp     Fall 2021

Table of Contents

Instructors
Teaching Assistants
Class Meetings
Course Website
Course Description
Specific Objectives
Study Materials
What To Bring
Course Format
    Attendance
    Types of Sessions
        Interactive Lecture
        Assignments
        Presentations
        Review
    Daily Schedule
    Daily Reflections
    Food
Asking for Help
    Do It
    Generally How
    Googling
Grading
Academic Integrity & Ethics
Code of Conduct
The Pandemic
Technology
    Platforms
    Laptops
Full Bootcamp Schedule

Instructor Email
Rajiv McCoy rajiv.mccoy@jhu.edu
Michael Sauria msauria1@jhu.edu
Frederick Tan tan@carnegiescience.edu

Teaching Assistant Email
Kate Weaver kweave23@jhu.edu
Steph Yan syan@jhu.edu
Dylan Taylor dtaylo95@jhu.edu
Andrew Bortvin abortvi2@jhu.edu

Class Meetings
Mon, Aug 30, 2021 - Fri, Sept 3, 2021
8:45 A.M. ET - 7 P.M. ET, In Person, UTL398, or through Zoom
Public Zoom link. See Slack for the passcode.

Course Website
http://bxlab.github.io/cmdb-bootcamp/

Course Description

Quantitative and computational methods are increasingly essential to all sub-disciplines of modern biological research. The goal of this intensive week-long “boot camp” is to empower students with the fundamental skills to apply these methods and to connect them to resources for further developing their knowledge and abilities. Class starts at 9 am each day, and formal instruction ends at 4 pm. The course demonstrates the importance of version control, documentation, testing, and other methods for enhancing the reproducibility, reliability, and usability of software. This is achieved through live-coding sessions and learning exercises; for the majority of class time, students perform data analysis to address biological questions and reinforce core bioinformatic concepts. Upon completing the course, students should be comfortable using and writing software to work with large-scale biological data. The motivation for this goal is to develop computational and statistical competence in preparation for courses, rotations, thesis research, and careers. Rather than blindly outsourcing the bioinformatic components of their work, students will be empowered to understand methodological details and their associated advantages and limitations. This will in turn advance the broader goal of rigor in experimental design, promoting robust and unbiased results.

Specific Objectives

  • Develop comfort working within a UNIX environment and at the command line to build and run programs
  • Develop confidence using Python and its extended ecosystem of tools for bioinformatic data analysis and visualization
  • Develop knowledge of standard bioinformatic file formats and where they fit within the context of a project
  • Develop appreciation for and practice of concise and transparent presentation of data
  • Develop fluency in basic statistical tests, associated assumptions, and interpretation of their output
  • Develop good habits for ensuring reproducible research

Study Materials

This course does not have a required text. Any lecture notes, slides, and interactive coding scripts/notebooks will be made available on the course website.

What to Bring

  • Headphones with inline mic
  • Face mask

Course Format

Attendance

Due to the interactive nature of this course, there is a policy that students should not participate in any other meetings, courses, or lab work throughout the week. As a core CMDB course, bootcamp should take priority. If a student foresees attendance conflicts, they should contact a TA immediately so that this policy can be communicated to the necessary parties.

This course will be held in a hybrid format. Students will be able to attend and participate in all activities either in person or through Zoom.

Please be on time for course activities. Your attendance is expected at all times. Please let us know about any emergencies, family responsibilities, illness, etc. that may prevent attendance, and we will work to accommodate reasonable requests.

Types of Sessions

The course is broken into four main types of sessions:

  1. Interactive Lecture & Live-Coding Sessions
    • These sessions will consist of interactive lectures and instructor-guided live-coding examples, which introduce and expand on material for the exercise sessions that occur directly afterward. Instructors will give lectures synchronously through Zoom and TAs will be available in person and online to provide support.
    • The instructor will pause periodically to invite questions, let students catch up, and solicit sticky-note check-ins. See the Asking for Help section of this syllabus for more specific information about questions and debugging in these sessions.
    • All scripts or notebooks used by instructors from these sessions will be made available on the course website through GitHub.
  2. Exercise Sessions (Lunch and Homework)
    • These sessions will be hosted by TAs.
    • Students are encouraged to work in groups with their table partners. While groups are expected to have very similar answers, students should put in individualized effort and strive to understand every line of code and analysis step they take. See the Academic Integrity & Ethics syllabus section for more information.
    • Course-wide announcements and updates during these sessions will be posted to Slack.
    • See the Asking for Help section of this syllabus for more specific information about questions and debugging in these sessions.
  3. Student Presentation Sessions
    • Pairs of students will informally present their code and results from the most recent exercise. Presentations will be given through Zoom.
    • Instructors will provide feedback, and invite questions and discussion among the whole class on alternative approaches and solutions.
    • Presentations are ungraded.
    • There are typically two of these sessions a day: one in the morning to present homework exercises, and one in the afternoon to present lunch exercises.
  4. Review Sessions
    • These sessions will be hosted by TAs or instructors.
    • For each review session, there will be three breakout groups, and students assign themselves to the group that matches their comfort level: (1) students who feel confident in their skills and want to learn new material; (2) students who feel comfortable with the basic skills but would like to thoroughly review a previous assignment, building a solution through interactive live (pseudo and formal) coding; and (3) students who do not feel comfortable with the basics of Unix or Python.
    • These will take place on Wednesday and Friday afternoons in place of the afternoon interactive sessions.

Daily Schedule

Most sessions will be separated by breaks, although sometimes a break will instead be given in the middle of an interactive lecture; we will let you know in advance when this happens. The daily schedule will generally be:

Morning | Afternoon
9:00 Presentations | 2:00 Presentations
9:30 Interactive (with break) | 2:30 Interactive (with break)
11:00 Lunch |
11:45 Exercises | 4:00 Exercises
1:45 Break | 7:00 Wrap Up

On Wednesday and Friday, the afternoon Interactive sessions will be replaced with the Review sessions described above.

Daily Reflections

We ask students to write daily reflections and turn them in through Google Forms by 7pm ET. Reflections should focus on the current day. The purpose of reflections is threefold.

  1. First, these reflections allow for students to offer constructive critiques of the course.
  2. Second, we believe this exercise has the potential to grow a student’s understanding and confidence as they purposefully reflect and see their improvement.
  3. Third, this practice directly cultivates good habits in reproducible analysis and keeping a lab notebook.

We expect these reflections to be short, but respond to each of the following prompts:

  • What purpose did today’s lessons, my analysis, and my coding serve?
  • How might I use what I did and learned today in the future?
  • What was my greatest struggle?
  • What was my greatest achievement?
  • What one lingering question, musing, or idea (on theory, syntax, etc.) do I find most perplexing or exciting?

If you have specific material you know you would like covered in a review session, please also respond to the non-required prompt:

  • What, if anything, would you like reviewed in the next review session?

Notes on reflections:

  • As long as reflections attempt to respond to the above prompts, the content of the reflection will not affect a student’s grade in any way. See the Grading section for further clarification.
  • We aim to keep reflections confidential. As such, only instructors, TAs, and the student will have access to a student’s reflections.

Food

  • Lunch and an afternoon snack will be provided every day.
  • Students must eat and drink while socially distanced, primarily outside (or in the Mudd Atrium in case of bad weather).
  • Food cannot be brought into or stored in the building.
  • Water is allowed in class.

Asking for Help

We both expect and encourage you to ask questions and request help throughout this course. Everyone is learning and there is no shame in having questions, not understanding what an instructor has said or typed, or needing to debug your code. Therefore, please ask questions about whatever and whenever you need to. We’ve written up guidelines below on how you can ask for help during each type of session.

  • For most live-coding sessions, the Lead Instructor will solicit Check Ins, asking students to use sticky notes to indicate whether they are ready to move on to the next section.
    • Use a green sticky note if you’re following along and are ready to move on.
    • Use a yellow sticky note if you’re still working and need more time before moving on, but do not need assistance yet.
    • Use a pink sticky note if you need assistance.
    • Students attending live-coding sessions remotely will be able to use sticky note reactions through Slack in place of physical notes.
  • If the Lead Instructor prompts the class to ask or answer questions verbally, please use Zoom’s raise hand feature to be courteous.
  • If you experience a problem you cannot fix during a live-coding session, outside of a solicited Check In, still display a pink sticky note and a TA will assist you. Depending on the problem and what is required to fix it, the TA may suggest that you continue to pay attention and follow along as best you can, and then help you fix it at a later stopping point.
  • You can also send questions through Slack to TAs, to instructors, or in channels.
  • During Presentations, please wait to ask questions until the presenting student(s) have finished. You may message any TA on Slack during the presentation, but it is likely in your best interest to wait until the Lead Instructor solicits questions or comments.

Googling

Googling is always an acceptable way to find answers or help, and we encourage you to utilize it extensively. If you adopt a solution following a Google search, make sure you understand what you incorporate, rather than just copy/pasting without comprehension of the logic or code. Please see the Academic Integrity & Ethics syllabus section for more on this.

Grading

The grading for this course is based on reasonable completion. For each lunch and homework exercise session, students will be advised which exercises are required and which are advanced or optional. Additionally, students will be asked to upload certain Jupyter notebooks, scripts, and results (outputs, plots, etc.) to their personal GitHub repositories (qbb2021-answers). For lunch exercises, these materials should be pushed to the repository by the end of the lunch exercise session. For homework exercises, these materials should be pushed to the repository by the beginning of the next morning session.

TAs will verify that the student’s submitted work shows a reasonable level of individual effort. Finally, students will have one week post-bootcamp to submit any leftover materials. TAs and instructors will push individualized feedback to the student repositories and letter grades will be assigned in line with the level of completion. While student presentations and the content of daily reflections will not affect grades in any way, failure to turn in daily reflections has the potential to lower the final grade.

General guidelines for letter grade assignments:

  • A (excellent): 90-100% completion
  • B (passing): 80-90% completion
  • C or lower (not passing): <80% completion

Academic Integrity & Ethics

Academic and scientific institutions and research depend on honesty and integrity. You should complete your lunch and homework exercises yourself, and your work should not plagiarize others, including group partners, presenters, and strangers posting to online forums or blogs. You and your partner should work together, but each of you should write and turn in unique, individualized code. Your scripts and analyses may follow the same logical steps and even share tidbits of the same code, but no one person should write a solution that the whole group uses character for character. Additionally, you should understand every line of code you write, are given and use, or find online and incorporate. If asked, you should be able to explain exactly what your code is doing. You should also properly acknowledge the source of any borrowed code. Writing both inline and multiline comments (a terrific practice in general) is a good way to cultivate understanding and to record acknowledgements. Relatedly, don’t give someone code to copy and paste; make sure any recipient can explain back to you any code you share. See the university’s guidelines on plagiarism for details, and contact a TA with any questions.
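
As a small illustration of the commenting practice mentioned above, the following Python sketch shows one possible way to combine a multiline comment that acknowledges a borrowed idea with inline comments that explain each step. The function name gc_content, the example sequence, and the wording of the acknowledgement are hypothetical and are not taken from course materials.

```python
# Acknowledgement (hypothetical example): the idea of counting bases with
# str.count() came from a discussion with my table partner. I wrote this
# version myself and can explain every line of it.

def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()  # treat lowercase and uppercase bases alike
    if len(sequence) == 0:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc = sequence.count("G") + sequence.count("C")  # number of G or C bases
    return gc / len(sequence)  # fraction between 0 and 1


print(gc_content("ATGC"))  # prints 0.5
```

The acknowledgement sits at the top so a reader sees where an idea came from before reading the code, while the inline comments record the reasoning behind each line.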

Code of Conduct

We are committed to creating a welcoming, inclusive, and harassment-free environment for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion (or lack thereof), political beliefs/leanings, or technology choices. We do not tolerate harassment in any form. This code of conduct applies to all course participants, including instructors and TAs, and to all modes of interaction and communication.

All class participants agree to:

  • Be considerate and polite. Bootcamp is a stressful and sometimes frustrating week for everyone. In addition, the collaborative nature of this course means it is essential that students are respectful of each other’s questions and points of confusion. Remember that you all are coming into this program with a wide range of coding backgrounds. A community where people feel uncomfortable is not conducive to learning.
  • Work together cooperatively. Be kind when giving feedback on your classmates’ code, and make an effort to help if they are stuck. You will gain much more from explaining concepts to someone else than you will from rushing through the assignments on your own.
  • Refrain from demeaning, discriminatory, or harassing behavior and speech. Harassment includes, but is not limited to: deliberate intimidation; stalking; unwanted photography or recording; sustained or willful disruption of talks or other events; inappropriate physical contact; use of sexual or discriminatory imagery, comments, or jokes; and unwelcome sexual attention. If you feel that someone has harassed you or otherwise treated you inappropriately, please alert one of the instructors or TAs.
  • Refrain from advocating for, or encouraging, negative behavior. And, if someone asks you to stop, then stop. Alert an instructor or TA if you notice a dangerous situation, someone in distress, or violations of this code of conduct, no matter how inconsequential it seems.

The Pandemic

We know that this is one of the least ideal situations in which to be starting your first year of grad school and taking an intensive week-long coding course. Our plan is to be as understanding as possible about grading, attendance, and any issues that come up. Please don’t hesitate to reach out to an instructor or TA if you are having issues, and we will do our best to accommodate you however we can.

Because of the institution’s COVID-19 policy, please wear a face mask except when drinking water. Work areas will be disinfected every evening. We also provide bottles of hand sanitizer at every group work area.

Technology

Platforms

Several platforms will be used for communication and distribution of information. These include GitHub, Slack, and Google Forms, among others. Instructions and walkthroughs were previously provided for students to sign up for GitHub and Slack accounts. If for any reason you do not have access to one of these accounts, please contact a TA.

Laptops

We provide laptops with pre-configured software and data. Students must sign a MacBook Pro Agreement Form before the laptop is shipped or upon pickup. These laptops must be returned to the Biology office by the end of the spring semester (the date is specified on the agreement form). Report software issues to TAs promptly; report hardware issues to the Biology office.

Full Bootcamp Schedule

All times are ET.

Time | Monday | Tuesday | Wednesday | Thursday | Friday
9:00 - 9:30 | Overview | Homework Exercise Presentations | Homework Exercise Presentations | Homework Exercise Presentations | Homework Exercise Presentations
9:30 - 11:00 | Unix & Git | Functions & Data Structures | Algorithms | Exploratory Data Analysis | Principal Component Analysis
11:00 - 11:45 | Lunch | Lunch | Lunch | Lunch | Lunch
11:45 - 1:45 | Genome Arithmetic | ID Mapping | Binary Search | Data Visualization | Human Genetic Variation
1:45 - 2:00 | Break | Break | Break | Break | Break
2:00 - 2:30 | Lunch Exercise Presentations | Lunch Exercise Presentations | Lunch Exercise Presentations | Lunch Exercise Presentations | Lunch Exercise Presentations
2:30 - 4:00 | Basic Python & Pseudocode | Writing & Running Python Scripts | Review by Comfort Level | Linear models | Review by Comfort Level
Homework | Parsing a SAM file | Heuristic Sequence Alignment | Homework Catch-up | Linear Regression |

Although the course formally starts at 9 am ET, please arrive by 8:45 am.