Biological Data Analysis

Academic unit or major: Undergraduate major in Computer Science
Instructor(s): Masahito Ohue / Shogo Hamada / Masahiro Takinoue
Class Format: Lecture (Face-to-face)
Media-enhanced courses: -
Day of week/Period (Classrooms): 7-8 Tue (W3-301(W331)) / 7-8 Fri (W3-301(W331))
Class: -
Course Code: CSC.T353
Number of credits: 2-0-0
Course offered: 2026
Offered quarter: 2Q
Syllabus updated: Mar 5, 2026
Language: Japanese

Syllabus

Course overview and goals

This course focuses on data representation methods as well as comparative analysis and knowledge extraction algorithms for massive biological data. It also covers numerical simulations and nonlinear system analysis for dynamic biological systems. Topics include pairwise sequence alignment, dynamic programming, multiple sequence alignment, phylogenetic tree estimation, approximation methods, sequence motif representation, rapid homology search techniques for large-scale databases, protein structure modeling and prediction, biological system modeling, and numerical simulations of nonlinear differential equations.
No prior knowledge of biology or biochemistry is required. Basic biological concepts are introduced within the course, and students are expected to consider the topics from the perspectives of computational algorithms and their complexity.
Biological information analysis has become increasingly important in the 21st century for improving quality of life, the environment, and safety. This course therefore aims to provide students with a fundamental understanding of the nature of biological data and commonly used algorithms in this field, such as dynamic programming and differential equation-based system simulations. Many of the methods introduced in the course are also applicable to a wide range of engineering problems. The course is designed to serve as an illustrative example of how computer science techniques can be applied to real-world problems.

Course description and aims

Upon successful completion of this course, students will be able to:
1) Explain several data representations for sequence analysis (e.g., regular expressions, profile matrices, HMMs),
2) Explain the concept and implementation of dynamic programming, as well as several of its applications in bioinformatics,
3) Explain, in terms of computational complexity, the important role of approximation methods in multiple sequence alignment and phylogenetic tree estimation,
4) Explain the notions of E-value and P-value in homology search against a large database, and compute these values,
5) Explain several algorithmic techniques that speed up homology search against a large database, and
6) Explain analog and digital simulation approaches for the behavior of living cells.
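
As an illustration of the dynamic programming approach named in objective 2, the following is a minimal sketch of a global pairwise alignment score in the style of Needleman-Wunsch. It is not taken from the course materials: the function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example, and the methods taught in class may use different scoring schemes.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of a and b.

    Illustrative scoring only (assumed values): +1 match, -1 mismatch,
    and a linear gap penalty of -2 per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells and each cell is filled in constant time, so the running time and memory are both proportional to the product of the two sequence lengths.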

Keywords

biological information, sequence analysis, dynamic programming, hidden Markov model, analog simulation, digital simulation

Competencies

Specialist skills
Intercultural skills
Communication skills
Critical thinking skills
Practical and/or problem-solving skills

Class flow

Each class begins with an explanation of a new topic (its concepts, examples, systems, practical importance, etc.). At the end of each class, students are given exercise problems on that day's lecture.

Course schedule/Objectives

	Course schedule	Objectives
Class 1	Basics of simulation of living cells	Brief introduction to molecular biology and mathematical biology
Class 2	Sequence alignment	Compute global and local sequence alignments by dynamic programming; compute multiple sequence alignments
Class 3	Digital simulation of living cells (1)	Stochastic simulation 1
Class 4	Phylogenetic tree estimation	Construct phylogenetic trees with the UPGMA or NJ method; understand distance matrix methods, character state methods, and bootstrap evaluation
Class 5	Digital simulation of living cells (2)	Stochastic simulation 2
Class 6	Homology search against database	Calculate E-value and P-value for a hit in homology search
Class 7	Analog simulation of living cells (1)	Nonlinear differential equations and MATLAB
Class 8	Faster methods for sequence homology search	Build a k-mer index table for faster similarity search (FASTA, BLAST, PSI-BLAST)
Class 9	Analog simulation of living cells (2)	Nonlinear differential equations and nonlinear system analysis
Class 10	Protein structure analysis	Understand protein secondary/tertiary structures and analysis methods
Class 11	Analog simulation of living cells (3)	Typical examples of nonlinear systems
Class 12	Biomolecular design	Understand protein design and drug design
Class 13	Advanced topics in bioinformatics	Exploration of representative bioinformatics technologies in recent years
Class 14	Examination	Comprehensive topics in this class
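
To make the Class 8 objective more concrete, the following is a minimal sketch of a k-mer index table and an exact-match seed lookup. It is a simplified illustration under stated assumptions, not the course's implementation: the function names are hypothetical, and real tools such as FASTA and BLAST add further steps (for example scoring of similar words and extension of seed hits) that are omitted here.

```python
from collections import defaultdict


def build_kmer_index(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return dict(index)


def find_seed_hits(query: str, index: dict[str, list[int]],
                   k: int) -> list[tuple[int, int]]:
    """Return (query_pos, database_pos) pairs where a k-mer matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    database = "ACGACG"
    index = build_kmer_index(database, 3)
    print(index)
    print(find_seed_hits("GACG", index, 3))
```

The index is built once per database sequence, after which each query k-mer is looked up by hashing instead of being compared against every database position. That trade of preprocessing for lookup time is the idea behind index-based speedups.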

Study advice (preparation and review)

To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.

Textbook(s)

Original slides are provided

Reference books, course materials, etc.

Japanese Society of Bioinformatics (ed.). Introduction to Bioinformatics, 2nd edition. Keio University Press. ISBN: 978-4-7664-2791-2. (In Japanese)

Evaluation methods and criteria

Students' knowledge of data representations, algorithms, and applications in biological information analysis, and their ability to apply them to problems will be assessed.
Final exams: 100%

Related courses

CSC.T362: Numerical Analysis
ART.T543: Bioinformatics
ART.T546: Design Theory in Biological Systems
ART.T545: Molecular Simulation
ART.T553: Medical and Health Informatics
CSC.T242: Probability Theory and Statistics
CSC.T254: Machine Learning
CSC.T352: Pattern Recognition
ART.T458: Advanced Machine Learning

Prerequisites

none