Computational Techniques in Biological Sequence Analysis

Time: Mon/Wed 4:00 – 5:20 pm (except the week of Feb 15 – 23)
Location: MC 1056
Instructor: Dr. Pablo Millan Arias
Email: pmillana[at]uwaterloo[dot]ca
Office: DC 3620
Office hours: Tue/Fri 4:00 pm – 5:00 pm, or by appointment

Teaching assistants:

  • Michael Wrana
    Email: mmwrana@uwaterloo.ca
    Office hours: Mon 9:00 am – 10:00 am (tentative), DC 3333

  • Jessica Zhu
    Email: j364zhu@uwaterloo.ca
    Office hours: Wed 11:00 am – 12:00 pm, Station A in MC 4065 (CS consulting centre)

Piazza: link

Course Description: Due to the large volume of data generated by genome sequencing and cellular measurements of gene expression changes, computer science and mathematics have profoundly changed the science of modern biology. Computational methods are now critical to developing experimental and analytical tools in biological sciences. This course will provide students from all disciplines with an introduction to basic computational skills used to understand and analyze biological data. It combines theories and applications, with a focus on the latter.

Main Objectives:
The main aims of the class are for each student to:

  1. Become familiar with different data types used in molecular biology research or biomedicine.
  2. Understand the classic algorithms behind the tools, in order to judge which tool is appropriate and which choices to make when using it.
  3. Get a first look at the modern techniques that are replacing classical algorithms to gain more insight into various biological mechanisms.
  4. Become familiar with some of the online tools that are available for analyzing genomic data.
  5. Gain practical experience in obtaining data from online resources and analyzing the data within an appropriate biological context.
  6. For graduate students, relate the course content to their research.

Format: The course will be taught in the classroom as lectures. It (tentatively) consists of 12 topics in the area of computational biology, each covered in one or two classes.

Prerequisites: Students are not required to have any prior knowledge of molecular biology, but some understanding of basic biological concepts is beneficial. Because this is an upper-level course, we prefer that students have taken algorithm/data structure-related courses prior to this course and have reasonably solid programming skills. Some basic math and statistics will be covered in the class.

Course materials: Slides (and some videos) presented during the class will be provided to the students through Learn after the first week of class. Other materials will be made available to the students where applicable. Redistribution of course materials by students without the instructor’s consent is not permitted.

Textbook: No textbook is required for this class. Computational molecular biology is a rapidly evolving field, so learning materials are widely available and frequently updated on the Internet, in the form of forum discussions, program documentation, and journal papers, in addition to electronic course materials. Students who want to extend their understanding of computational molecular biology, in both breadth and depth, on topics covered or not covered in class may find the following optional textbooks helpful.

  • An Introduction to Applied Bioinformatics (IAB) by J Gregory Caporaso. This is a free, interactive ebook, available at: link
  • The Nature Computational Biology compendium, a collection of tutorials on classic computational biology topics, available at: link
  • Bioinformatics Algorithms: An Active Learning Approach (3rd Edition) (2018) by Phillip Compeau and Pavel Pevzner
  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin (Sanger Centre, Cambridge), Sean R. Eddy (Washington University, Missouri), Anders Krogh (Technical University of Denmark, Lyngby), and Graeme Mitchison

Grading Scheme: Programming Assignments: 50% (12.5% x 4), Midterm Evaluation: 20%, Final exam: 30%.

Grading Scheme (Graduate version): Programming Assignments: 30% (7.5% x 4), Midterm Evaluation: 10%, Final exam: 30%, Project: 25%. See project guidelines.
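
As a rough illustration of how the undergraduate weights above combine, here is a minimal Python sketch. The function name `course_grade` and the `UNDERGRAD_WEIGHTS` dictionary are hypothetical names chosen for this example and are not part of any official course tooling. It assumes every score is a percentage between 0 and 100 and that the four assignments are weighted equally. It ignores the scribing bonus and the late policy.

```python
# Hypothetical helper for illustration only; not official course tooling.
UNDERGRAD_WEIGHTS = {
    "assignments": 0.50,  # four assignments at 12.5% each
    "midterm": 0.20,
    "final": 0.30,
}


def course_grade(assignment_scores, midterm, final, weights=UNDERGRAD_WEIGHTS):
    """Return the weighted course grade on a 0-100 scale.

    All inputs are percentages (0-100). The assignments share their
    total weight equally, which matches the 12.5% x 4 split.
    """
    if len(assignment_scores) != 4:
        raise ValueError("expected exactly four assignment scores")
    assignment_avg = sum(assignment_scores) / len(assignment_scores)
    return (
        weights["assignments"] * assignment_avg
        + weights["midterm"] * midterm
        + weights["final"] * final
    )


# Example: assignment average of 85, midterm 75, final 85.
print(course_grade([80, 90, 100, 70], midterm=75, final=85))
```

The weights are passed as a dictionary so that a different scheme could be tried without changing the function.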

Programming Assignments: The students are expected to complete the assignments on their own. For each assignment, the source code and the assignment report will be reviewed and graded by the TA. The source code should be readable, runnable, and pass the tests. The report should be detailed enough for the TA to run the code and is expected to have the following sections: introduction, implementation details, results, and discussion. The students are expected to program in Python for programming assignments.

Scribing assignment: As part of the course, students will have the opportunity to contribute to accessibility services by scribing lecture notes. Students may sign up for scribing assignments using the provided link.

Each completed scribing assignment will contribute towards the final exam grade as follows:

  • Single-lecture topics: Up to 2.5% of the final exam grade
  • Two-lecture topics: Up to 5% of the final exam grade

    The exact percentage awarded will depend on the overall quality of the submitted lecture notes.

  • For topics that are still to be covered, the notes must be submitted no later than one week before the final lecture on that topic.
  • For topics that have already been covered, the notes must be submitted at least one week before the midterm examination.

Late Policy: There is some flexibility in the submission of assignments. Each student is allotted a maximum of 72 extra hours throughout the term to submit the assignments. No justification is required for a late submission. However, no submissions will be accepted after 72 hours. Once a student's total of 72 hours is used up, the penalty will be 5% per hour late.
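
The late policy can be read in more than one way, so the Python sketch below encodes only one possible interpretation and is not an official rule. It assumes that the 72 hours form a single bank shared across all assignments, that a submission more than 72 hours past its deadline is not accepted, and that any late hours not covered by the remaining bank cost 5 percentage points each. The function name `apply_late_policy` and the constants are hypothetical.

```python
# One possible reading of the late policy; all names here are hypothetical.
GRACE_BANK_HOURS = 72   # total extra hours per student per term
MAX_LATE_HOURS = 72     # assumption: per-submission cutoff after the deadline
PENALTY_PER_HOUR = 5    # percentage points per uncovered late hour


def apply_late_policy(hours_late, bank_remaining):
    """Return (accepted, penalty_percent, new_bank_remaining).

    hours_late: hours past the deadline for this submission.
    bank_remaining: grace hours the student still has this term.
    """
    if hours_late > MAX_LATE_HOURS:
        # Assumed cutoff: the submission is rejected and the bank is untouched.
        return False, 100, bank_remaining
    covered = min(hours_late, bank_remaining)
    uncovered = hours_late - covered
    penalty = min(100, PENALTY_PER_HOUR * uncovered)
    return True, penalty, bank_remaining - covered


# A student submits one assignment 50 hours late, then another 30 hours late.
bank = GRACE_BANK_HOURS
accepted, penalty, bank = apply_late_policy(50, bank)
print(accepted, penalty, bank)
accepted, penalty, bank = apply_late_policy(30, bank)
print(accepted, penalty, bank)
```

Under this reading, the first submission uses 50 of the 72 hours with no penalty, and the second is covered for only the remaining 22 hours, so the other 8 hours are penalized.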

Exam: Students will take a midterm exam and a final exam. The midterm exam will be based on the material covered in class, with knowledge questions as well as programming/design questions. The final exam will be open book, with practical problems and design questions.

Students must take the exams independently, without using reference materials or electronic devices. Make-up exams require permission in advance with a doctor’s excuse in the case of illness.

COVID-19 policy: Please refer to the UWaterloo Coronavirus website for updated announcements https://uwaterloo.ca/coronavirus/. “Starting January 9, a mask requirement will not continue but we strongly encourage you to think of the people around you and help limit the spread of COVID by wearing a mask in indoor settings. As always, we will not hesitate to bring back a requirement to wear masks if the situation requires it at any point during the term.”

Class Schedule

This is a tentative class schedule and may be adjusted as the term progresses.

| Lecture | Date | Topic | Slides | Scribe |
|---|---|---|---|---|
| L01 | Jan 6 | Course Overview and Logistics | | |
| L02 | Jan 8 | A primer in Computational Biology | pdf | f282wang |
| L03 | Jan 13 | Sequence Alignment | pdf (Jan 20) | yrohatgi |
| L04 | Jan 15 | Sequence Database search | pdf (Jan 22) | dxzhang |
| L05 | Jan 20 | Probabilistic interpretation of scores / Introduction to AF methods | | |
| L06 | Jan 22 | Multiple Sequence alignment / Phylogeny | pdf (Jan 24) | ky3xu |
| | | *Assignment 1 due Jan 24 | | |
| L07 | Jan 27 | Phylogeny (Distance-based methods) | pdf | Yuqi |
| L08 | Jan 29 | Phylogeny (Character-based methods) | | |
| L09 | Feb 3 | Genome assembly part 1 | pdf | k327lee |
| L10 | Feb 5 | Genome assembly part 2 | | |
| L11 | Feb 10 | Genome mapping part 1 | pdf | |
| | Feb 12 | Midterm Exam (No Lecture) | | |
| Reading Week | Feb 17 | | | |
| | Feb 19 | | | |
| L12 | Feb 24 | Genome mapping part 2 | | |
| L13 | Feb 26 | Foundations in Machine Learning: Neural Networks | | ogamaras |
| | | *Assignment 2 due Feb 28 | | |
| L14 | Mar 3 | Machine Learning in genomics | | a7nagpal (Arnav Nagpal) |
| L15 | Mar 5 | Foundations in Deep Learning: Transformer Models | | mswitt |
| L16 | Mar 10 | DNA Language Models | | j3hoque (Juthika Hoque) |
| L17 | Mar 12 | Genome Annotation part 1 | | dgoping |
| | | *Assignment 3 due Mar 15 | | |
| L18 | Mar 17 | Genome Annotation part 2 | | dgoping |
| L19 | Mar 19 | Clustering | | d4hao |
| L20 | Mar 24 | RNA-Sequencing & Gene Expression | | w27xing |
| L21 | Mar 26 | Proteomics and Mass Spectrometry | | c234tang |
| L22 | Mar 31 | RNA analysis | | Anna Zhelizniak |
| L23 | Apr 2 | Protein Structure Prediction: AlphaFold | | a29asgha |
| | | *Assignment 4 due Apr 4 | | |