Biological Data Analysis
- Academic unit or major: Undergraduate major in Computer Science
- Instructor(s): Masahito Ohue / Shogo Hamada / Masahiro Takinoue
- Class Format: Lecture (Face-to-face)
- Media-enhanced courses: -
- Day of week/Period (Classrooms): 7-8 Tue (W3-301(W331)) / 7-8 Fri (W3-301(W331))
- Class: -
- Course Code: CSC.T353
- Number of credits: 200
- Course offered: 2026
- Offered quarter: 2Q
- Syllabus updated: Mar 5, 2026
- Language: Japanese
Syllabus
Course overview and goals
This course focuses on data representation methods as well as comparative analysis and knowledge extraction algorithms for massive biological data. It also covers numerical simulations and nonlinear system analysis for dynamic biological systems. Topics include pairwise sequence alignment, dynamic programming, multiple sequence alignment, phylogenetic tree estimation, approximation methods, sequence motif representation, rapid homology search techniques for large-scale databases, protein structure modeling and prediction, biological system modeling, and numerical simulations of nonlinear differential equations.
No prior knowledge of biology or biochemistry is required. Basic biological concepts are introduced within the course, and students are expected to consider the topics from the perspectives of computational algorithms and their complexity.
Biological information analysis has become increasingly important in the 21st century for improving quality of life, the environment, and safety. This course therefore aims to provide students with a fundamental understanding of the nature of biological data and commonly used algorithms in this field, such as dynamic programming and differential equation-based system simulations. Many of the methods introduced in the course are also applicable to a wide range of engineering problems. The course is designed to serve as an illustrative example of how computer science techniques can be applied to real-world problems.
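As a rough illustration of the dynamic programming approach named above, the following is a minimal sketch of global pairwise alignment scoring in the Needleman-Wunsch style. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration and are not taken from the course materials.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b.

    Uses a linear gap penalty; the scoring values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]
```

The table has (len(a)+1) x (len(b)+1) cells and each cell is filled in constant time, so the running time grows with the product of the two sequence lengths. This is the kind of complexity consideration the overview refers to.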
Course description and aims
Upon successful completion of this course, students will be able to:
1) Explain several data representations for sequence analysis (e.g., regular expressions, profile matrices, HMMs),
2) Explain the concept and implementation of dynamic programming, as well as several of its applications in bioinformatics,
3) Explain the important role of approximation methods in multiple sequence alignment and phylogenetic tree estimation in terms of computational complexity,
4) Explain the concepts of E-value and P-value in homology search against a large database, and compute these values,
5) Explain several algorithmic techniques for speeding up homology search against a large database, and
6) Explain analog and digital simulation approaches for modeling the behavior of living cells.
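To make the fourth aim above more concrete, here is a small sketch of one common way to compute these values, assuming the Karlin-Altschul form E = K·m·n·exp(-λS) and the relation P = 1 - exp(-E). The function names are hypothetical, and `lam` and `k` are statistical parameters that depend on the scoring system; the formulation actually used in the lectures may differ (for example, bit scores or effective lengths).

```python
import math


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least raw_score.

    Assumes E = K * m * n * exp(-lambda * S); lam and k must be supplied
    by the caller because they depend on the scoring scheme.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E).

    expm1 keeps precision when E is very small, where P is close to E.
    """
    return -math.expm1(-e)
```

For small E the two quantities are nearly equal, which is why search tools often report only the E-value.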
Keywords
biological information, sequence analysis, dynamic programming, hidden Markov model, analog simulation, digital simulation
Competencies
- Specialist skills
- Intercultural skills
- Communication skills
- Critical thinking skills
- Practical and/or problem-solving skills
Class flow
Each class begins with an explanation of a new topic (its concept, examples, related systems, practical importance, etc.). At the end of class, students are given exercise problems on that day's lecture to solve.
Course schedule/Objectives
| | Course schedule | Objectives |
|---|---|---|
| Class 1 | Basics of simulation of living cells | Brief introduction to molecular biology and mathematical biology |
| Class 2 | Sequence alignment | Calculate global/local sequence alignment based on dynamic programming and multiple sequence alignment |
| Class 3 | Digital simulation of living cells (1) | Stochastic simulation 1 |
| Class 4 | Phylogenetic tree estimation | Calculate phylogenetic tree based on UPGMA method or NJ method, distance matrix method, character state method, and bootstrap evaluation |
| Class 5 | Digital simulation of living cells (2) | Stochastic simulation 2 |
| Class 6 | Homology search against database | Calculate E-value and P-value for a hit in homology search |
| Class 7 | Analog simulation of living cells (1) | Nonlinear differential equations and MATLAB |
| Class 8 | Faster methods for sequence homology search | Build a k-mer index table for faster similarity search (FASTA, BLAST, PSI-BLAST) |
| Class 9 | Analog simulation of living cells (2) | Nonlinear differential equations and nonlinear system analysis |
| Class 10 | Protein structure analysis | Understand protein secondary/tertiary structures and analysis methods |
| Class 11 | Analog simulation of living cells (3) | Typical examples of nonlinear systems |
| Class 12 | Biomolecular design | Understand protein design and drug design |
| Class 13 | Advanced topics in bioinformatics | Exploration of representative bioinformatics technologies in recent years |
| Class 14 | Examination | Comprehensive topics in this class |
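The k-mer index table mentioned for Class 8 can be sketched as follows. This is a simplified illustration of the seed-lookup idea only, not a description of how FASTA, BLAST, or PSI-BLAST are implemented; the function names `build_kmer_index` and `find_seed_hits` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of sequence to its start positions."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    index: defaultdict[str, list[int]] = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return dict(index)


def find_seed_hits(query: str, index: dict[str, list[int]],
                   k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a query k-mer occurs exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # A dictionary lookup replaces a scan over the whole database sequence.
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits
```

The index is built once per database, after which each query k-mer is located by a table lookup. Candidate hits found this way would then typically be extended and scored by a more expensive alignment step.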
Study advice (preparation and review)
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Textbook(s)
Original slides are provided
Reference books, course materials, etc.
Japanese Society of Bioinformatics (ed.). Introduction to Bioinformatics, 2nd edition. Keio University Press. ISBN: 978-4-7664-2791-2. (In Japanese)
Evaluation methods and criteria
Students' knowledge of data representations, algorithms, and applications in biological information analysis, and their ability to apply them to problems will be assessed.
Final exams: 100%
Related courses
- CSC.T362 : Numerical Analysis
- ART.T543 : Bioinformatics
- ART.T546 : Design Theory in Biological Systems
- ART.T545 : Molecular Simulation
- ART.T553 : Medical and Health Informatics
- CSC.T242 : Probability Theory and Statistics
- CSC.T254 : Machine Learning
- CSC.T352 : Pattern Recognition
- ART.T458 : Advanced Machine Learning
Prerequisites
none