2025 (Current Year) Faculty Courses School of Computing Undergraduate major in Computer Science
Biological Data Analysis
- Academic unit or major
- Undergraduate major in Computer Science
- Instructor(s)
- Yutaka Akiyama / Masayuki Yamamura / Masahiro Takinoue
- Class Format
- Lecture (Face-to-face)
- Media-enhanced courses
- -
- Day of week/Period (Classrooms)
- 7-8 Tue / 7-8 Fri
- Class
- -
- Course Code
- CSC.T353
- Number of credits
- 200
- Course offered
- 2025
- Offered quarter
- 2Q
- Syllabus updated
- Apr 2, 2025
- Language
- Japanese
Syllabus
Course overview and goals
This course covers data representation methods and comparison and knowledge-extraction algorithms for massive biological data, as well as numerical simulation and nonlinear system analysis for dynamic biological systems. Topics include pairwise sequence alignment, dynamic programming, multiple sequence alignment, phylogenetic tree estimation, approximate methods, sequence motif representation, rapid homology search techniques against large-scale databases, protein structure modeling and structure prediction, modeling of biological systems, and numerical simulation of nonlinear differential equations. No biological or biochemical knowledge is a prerequisite. Basic biological concepts are introduced within the course, and students are required to consider the topics from the viewpoint of computational algorithms and their complexity.
Biological information analysis is highly important for society in the 21st century as a means of improving quality of life, the environment, and safety. This course therefore aims to provide a fundamental understanding of the nature of biological data and of typical algorithms used repeatedly in this area, such as dynamic programming and system simulation based on differential equations. At the same time, most of the methods explained in the course are also applicable to a wide range of engineering subjects. We aim to offer this course to students as an illustrative example of how computer science techniques are applied to a specific real-world problem.
Course description and aims
Upon successful completion of this course, students will be able to:
1) Explain several data representations for sequence analysis (e.g. regular expression, profile matrix, HMM),
2) Explain the notion and implementation of dynamic programming, as well as several of its applications in bioinformatics,
3) Explain the important role of approximate methods in multiple sequence alignment and phylogenetic tree estimation, in terms of computational complexity,
4) Explain the notions of e-value and p-value in homology search against a large database, and compute the values,
5) Explain several algorithmic techniques that speed up homology search against a large database, and
6) Explain analog and digital simulation approaches for behaviors of living cells.
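As a rough illustration of outcome 4, the sketch below computes an e-value and a p-value for a single hit using the standard Karlin-Altschul relations E = K·m·n·exp(-λS) and p = 1 - exp(-E). The function names are hypothetical, and the values of λ and K in the usage note are illustrative placeholders rather than parameters taken from the course material; in practice they depend on the scoring matrix and gap penalties.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score: E = K*m*n*exp(-lam*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e):
    """Probability of at least one chance hit: p = 1 - exp(-E)."""
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is very small.
    return -math.expm1(-e)
```

For example, `e_value(50, 100, 1_000_000, lam=0.318, k=0.134)` gives the e-value of a raw score of 50 for a query of length 100 against a database of one million residues under these assumed parameters, and `p_value` converts it to a probability. For small e-values the two quantities are nearly equal.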
Keywords
biological information, sequence analysis, dynamic programming, hidden Markov model, analog simulation, digital simulation
Competencies
- Specialist skills
- Intercultural skills
- Communication skills
- Critical thinking skills
- Practical and/or problem-solving skills
Class flow
Each class begins with an explanation of a new topic (concepts, examples, systems, practical importance, etc.). At the end of each class, students are given exercise problems related to that day's lecture.
Course schedule/Objectives
Class | Course schedule | Objectives |
---|---|---|
Class 1 | Global sequence alignment - Optimal path search, dynamic programming, global sequence alignment, local sequence alignment | Calculate global/local sequence alignment based on dynamic programming |
Class 2 | Basics of simulation of living cells | Brief introduction to molecular biology and mathematical biology |
Class 3 | Multiple sequence alignment - Complexity of multiple alignment, and heuristic methods | Calculate multiple sequence alignment based on star method or tree-based method |
Class 4 | Analog simulation of living cells (1) | Nonlinear differential equations and MATLAB |
Class 5 | Phylogenetic tree estimation - Distance matrix method, character state method, and bootstrap evaluation | Calculate a phylogenetic tree based on the UPGMA method or the NJ method |
Class 6 | Analog simulation of living cells (2) | Nonlinear differential equations and nonlinear system analysis |
Class 7 | Homology search against database - Amino acid mutation matrix, hit significance, e-value, bit score, and p-value | Calculate e-value and p-value for a hit in homology search |
Class 8 | Analog simulation of living cells (3) | Typical examples of nonlinear systems |
Class 9 | Faster methods for sequence homology search - FASTA, BLAST, and PSI-BLAST | Build a k-mer index table for faster similarity search |
Class 10 | Digital simulation of living cells (1) | Stochastic simulation 1 |
Class 11 | Motif representation and extraction - Regular expression, profile matrix, and hidden Markov model | Understand several mathematical models for representing sequence motifs |
Class 12 | Digital simulation of living cells (2) | Stochastic simulation 2 |
Class 13 | Protein structure analysis - Secondary structure, tertiary structure, and molecular simulation | Understand protein secondary/tertiary structures and analysis methods |
Class 14 | Advanced topics on simulations of living systems | Typical examples of simulation of living systems (Gene network, molecular computing) |
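
To give a flavor of the Class 1 objective, the following is a minimal sketch of global sequence alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not the scheme used in the lectures.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap inserted in b
                score[i][j - 1] + gap,               # gap inserted in a
            )
    return score[-1][-1]
```

With these assumed parameters, `global_alignment_score("GATTACA", "GATTACA")` returns 7 (seven matches) and `global_alignment_score("GATTACA", "GATACA")` returns 4 (six matches and one gap). The table has (len(a)+1) × (len(b)+1) cells, so time and memory both grow with the product of the sequence lengths.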
Study advice (preparation and review)
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Textbook(s)
Original slides are provided by Akiyama, Yamamura, and Takinoue.
Reference books, course materials, etc.
Japanese Society of Bioinformatics (Ed.). Introduction to Bioinformatics. Tokyo: Keio University Press; ISBN: 978-4-7664-2251-1. (Japanese)
Evaluation methods and criteria
Students' knowledge of data representations, algorithms, and applications in biological information analysis, and their ability to apply them to problems will be assessed.
Final exams: 100%
Related courses
- CSC.T362 : Numerical Analysis
Prerequisites
none