Computational Techniques in Biological Sequence Analysis
Time: Mon/Wed 4:00 - 5:20 pm (Except the Feb 15 – 23 week)
Location: MC 1056
Instructor: Dr. Pablo Millan Arias
Email: pmillana[at]uwaterloo[dot]ca
Office: DC 3620
Office hours: Tue/Fri 4:00 pm – 5:00 pm or by appointment
Teaching assistants:
- Michael Wrana
  Email: mmwrana@uwaterloo.ca
  Office hours: Mon 9:00 am – 10:00 am (tentative), DC 3333
- Jessica Zhu
  Email: j364zhu@uwaterloo.ca
  Office hours: Wed 11:00 am – 12:00 pm, Station A in MC 4065 (CS consulting centre)
Piazza: link
Course Description: Due to the large volume of data generated by genome sequencing and cellular measurements of gene expression changes, computer science and mathematics have profoundly changed the science of modern biology. Computational methods are now critical to developing experimental and analytical tools in biological sciences. This course will provide students from all disciplines with an introduction to basic computational skills used to understand and analyze biological data. It combines theories and applications, with a focus on the latter.
Main Objectives:
The main aims of the class are for each student to:
- Become familiar with different data types used in molecular biology research or biomedicine.
- Understand the classic algorithms behind the tools well enough to judge which tool is appropriate and which choices should be made when using it.
- Get a first introduction to modern techniques that are used in place of classical algorithms to gain more insight into various biological mechanisms.
- Become familiar with some of the online tools that are available for analyzing genomic data.
- Gain practical experience in obtaining data from online resources and analyzing the data within an appropriate biological context.
- For graduate students, relate the course content to their research.
Format: The course will be taught in the classroom as lectures. It (tentatively) consists of 12 topics in the area of computational biology, each of which will be covered by one or two classes each week.
Prerequisites: Students are not required to have any prior knowledge of molecular biology, but some understanding of basic biological concepts is beneficial. Because this is an upper-level course, we prefer that students have taken courses on algorithms and data structures before this one and have reasonably solid programming skills. Some basic math and statistics will be covered in the class.
Course materials: Slides (and some videos) presented during the class will be provided to the students through Learn after the first week of class. Other materials will be made available to the students where applicable. Redistribution of course materials by students without the instructor’s consent is not permitted.
Textbook: No textbook is required for this class. Computational molecular biology is a rapidly evolving field, so learning materials are widely available and frequently updated on the Internet in the form of forum discussions, program documentation, and journal papers, in addition to electronic course materials. Students who want to extend their understanding of computational molecular biology, in breadth or depth, on topics covered in class as well as those that are not, may find the following optional textbooks beneficial.
- An Introduction to Applied Bioinformatics (IAB) by J Gregory Caporaso. This is a free, interactive ebook, available at: link
- Nature Computational Biology compendium, a collection of tutorials on classic computational biology topics, available at: link
- Bioinformatics Algorithms: An Active Learning Approach (3rd Edition) (2018) by Phillip Compeau and Pavel Pevzner
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison
Grading Scheme: Programming Assignments: 50% (12.5% x 4), Midterm Evaluation: 20%, Final exam: 30%.
Grading Scheme (Graduate version): Programming Assignments: 30% (7.5% x 4), Midterm Evaluation: 10%, Final exam: 30%, Project: 25%. See project guidelines.
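As an illustration only, the sketch below shows how the undergraduate weights listed above combine into a course grade. The function name `weighted_grade`, the component keys, and the sample marks are hypothetical and are not part of any official course tooling. Scribing credit and late penalties are not modelled.

```python
# Undergraduate weights (in percent) as listed in the grading scheme.
# The dictionary keys are illustrative names, not official identifiers.
UNDERGRAD_WEIGHTS = {
    "assignment_1": 12.5,
    "assignment_2": 12.5,
    "assignment_3": 12.5,
    "assignment_4": 12.5,
    "midterm": 20.0,
    "final_exam": 30.0,
}


def weighted_grade(marks, weights):
    """Combine per-component marks (each 0-100) into a course grade (0-100)."""
    missing = set(weights) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    # Each component contributes mark * weight / 100 percentage points.
    return sum(marks[name] * weight for name, weight in weights.items()) / 100.0


# Hypothetical marks for one student.
example_marks = {
    "assignment_1": 80,
    "assignment_2": 90,
    "assignment_3": 70,
    "assignment_4": 100,
    "midterm": 75,
    "final_exam": 85,
}

print(weighted_grade(example_marks, UNDERGRAD_WEIGHTS))  # prints 83.0
```

Passing a different weight dictionary to the same function covers other weightings.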
Programming Assignments: The students are expected to complete the assignments on their own. For each assignment, the source code and the assignment report will be reviewed and graded by the TA. The source code should be readable, runnable, and pass the tests. The assignment report should be detailed enough for the TA to run the code and is expected to have the following sections: introduction, implementation details, results, and discussion. The students are expected to program in Python for programming assignments.
Scribing assignment: As part of the course, students will have the opportunity to contribute to accessibility services by scribing lecture notes. Students may sign up for scribing assignments using the provided link.
Each completed scribing assignment will contribute towards the final exam grade as follows:
- Single-lecture topics: Up to 2.5% of the final exam grade
- Two-lecture topics: Up to 5% of the final exam grade
The exact percentage awarded will depend on the overall quality of the submitted lecture notes.
- For topics that are still to be covered, the notes must be submitted no later than one week before the final lecture on that topic.
- For topics that have already been covered, the notes must be submitted at least one week before the midterm examination.
Late Policy: There is some flexibility in the submission of assignments. Each student is allotted a maximum of 72 extra hours in total throughout the term to submit the assignments. No justification is required for a late submission. However, no submission will be accepted more than 72 hours late. Once a student's total of 72 hours is used up, the penalty will be 5% per hour late.
Exam: Students will take a midterm exam and a final exam. The midterm exam will be based on the material covered in the class, with knowledge questions as well as programming/design questions. The final exam will be open book, with practical problems and design questions.
Students must take the exams independently, without using reference materials or electronic devices. Make-up exams require permission in advance, with a doctor’s note in the case of illness.
COVID-19 policy: Please refer to the UWaterloo Coronavirus website for updated announcements: https://uwaterloo.ca/coronavirus/. “Starting January 9, a mask requirement will not continue but we strongly encourage you to think of the people around you and help limit the spread of COVID by wearing a mask in indoor settings. As always, we will not hesitate to bring back a requirement to wear masks if the situation requires it at any point during the term.”
Class Schedule
This is a tentative class schedule and may be adjusted as the term progresses.
| Lecture | Date | Topic | Slides | Scribe |
|---|---|---|---|---|
| L01 | Jan 6 | | | |
| L02 | Jan 8 | | | f282wang |
| L03 | Jan 13 | | pdf (Jan 20) | yrohatgi |
| L04 | Jan 15 | | pdf (Jan 22) | dxzhang |
| L05 | Jan 20 | | | |
| L06 | Jan 22 | | pdf (Jan 24) | ky3xu |
| *Assignment 1 due Jan 24 | | | | |
| L07 | Jan 27 | | | Yuqi |
| L08 | Jan 29 | | | |
| L09 | Feb 3 | | | k327lee |
| L10 | Feb 5 | | | |
| L11 | Feb 10 | | | |
| | Feb 12 | Midterm Exam (No Lecture) | | |
| Reading Week | Feb 17 | | | |
| Reading Week | Feb 19 | | | |
| L12 | Feb 24 | Genome Mapping part 2 | | |
| L13 | Feb 26 | Foundations in Machine Learning: Neural Networks | | ogamaras |
| *Assignment 2 due Feb 28 | | | | |
| L14 | Mar 3 | Machine Learning in Genomics | | a7nagpal (Arnav Nagpal) |
| L15 | Mar 5 | Foundations in Deep Learning: Transformer Models | | mswitt |
| L16 | Mar 10 | DNA Language Models | | j3hoque (Juthika Hoque) |
| L17 | Mar 12 | Genome Annotation part 1 | | dgoping |
| *Assignment 3 due Mar 15 | | | | |
| L18 | Mar 17 | Genome Annotation part 2 | | dgoping |
| L19 | Mar 19 | Clustering | | d4hao |
| L20 | Mar 24 | RNA-Sequencing & Gene Expression | | w27xing |
| L21 | Mar 26 | Proteomics and Mass Spectrometry | | c234tang |
| L22 | Mar 31 | RNA Analysis | | Anna Zhelizniak |
| L23 | Apr 2 | Protein Structure Prediction: AlphaFold | | a29asgha |
| *Assignment 4 due Apr 4 | | | | |