Speech Information Technology
- Academic unit or major: Graduate major in Information and Communications Engineering
- Instructor(s): Takahiro Shinozaki
- Class Format: Lecture (Face-to-face)
- Media-enhanced courses: -
- Day of week/Period (Classrooms): 1-2 Tue (G1-103 (G114)) / 1-2 Fri (G1-103 (G114))
- Class: -
- Course Code: ICT.H503
- Number of credits: 200
- Course offered: 2023
- Offered quarter: 1Q
- Syllabus updated: Jul 8, 2025
- Language: English
Syllabus
Course overview and goals
This course covers automatic speech recognition and synthesis, with topics including signal processing, human interfaces, and statistical models. Statistical models are vital components of today's speech processing technology. The hidden Markov model, graphical model, N-gram, weighted finite-state transducer, and artificial neural network are explained in detail. By combining lectures and exercises, the course enables students to understand and acquire the fundamentals of up-to-date speech processing techniques.
Speech communication is natural to us humans, and we speak and listen to many utterances in our daily lives. It is so easy that we rarely consider how we do it. From an engineering point of view, however, it requires highly complicated processing. Even today's state-of-the-art systems have only limited performance compared to humans. Building on accumulated research effort, however, some systems have recently achieved comparable or even better performance under specific conditions. Through this course, students will learn how sophisticated our speech communication is. At the same time, they will gain concrete ideas for tackling it, using the techniques they have learned as clues.
Course description and aims
By the end of this course, students will be able to:
1) Make speech models based on statistical methods
2) Derive algorithms for model training and inference
3) Explain how speech recognition and synthesis systems are organized
4) Explain the relationship between the mechanism of speech communication and speech processing systems
5) Explain the organization of speech-based human interfaces
6) Formulate some basic problems of signal processing for speech signals
Keywords
speech recognition, speech synthesis, speech enhancement, human interface, speech production mechanism, auditory mechanism, hidden Markov model, N-gram, weighted finite state transducer, graphical model, Bayesian inference, artificial neural network, spoken language acquisition
Competencies
- Specialist skills
- Intercultural skills
- Communication skills
- Critical thinking skills
- Practical and/or problem-solving skills
Class flow
Students will work on exercises to review the lectures and submit answer reports through T2SCHOLAR. Students are expected to prepare for each class by checking the course schedule.
Course schedule/Objectives
Class | Course schedule | Objectives |
---|---|---|
Class 1 | Speech and human interface | Explain human interfaces that use speech |
Class 2 | Speech production mechanism, auditory mechanism, and phonology | Explain how human speech communication is realized |
Class 3 | Waveform coding and speech signal analysis | Explain techniques for representing and analyzing sound signals |
Class 4 | Parametric representation of speech signals | Explain parametric representation methods for speech signals |
Class 5 | Basics of probability distributions and principles of speech recognition and synthesis | Explain the basics of probability distributions and the principles of statistical automatic speech recognition and synthesis |
Class 6 | Graphical model | Explain graphical models, which give diagrammatic representations of probability distributions |
Class 7 | Markov and Hidden Markov models | Explain the definitions of Markov and hidden Markov models. Explain their application to speech recognition and synthesis |
Class 8 | Weighted finite state transducer | Explain weighted finite state transducers that can express various probabilistic models in a systematic manner |
Class 9 | Dynamic programming and Viterbi algorithm | Explain dynamic programming and the Viterbi algorithm |
Class 10 | Bayesian inference | Explain Bayesian inference and its application to some problems |
Class 11 | Basics of artificial neural network | Explain the basics of artificial neural networks, including their learning and inference algorithms |
Class 12 | Neural network based speech recognition and synthesis | Explain neural network-based speech recognition and synthesis systems |
Class 13 | Markov decision process and dialogue systems | Explain Markov decision processes and dialogue systems |
Class 14 | Reinforcement learning and spoken language acquisition | Explain reinforcement learning and spoken language acquisition |
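
As an illustration of the Class 9 topic, the following is a minimal sketch of Viterbi decoding on a toy two-state hidden Markov model. It is not taken from the course materials: the state names, observation symbols, and all probabilities are made-up assumptions chosen only to keep the example small.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log probability."""
    # best[t][s]: log probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on the best path at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Dynamic programming step: pick the best predecessor of s
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a (possibly nested) dict of nonzero probabilities to logs."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: hidden states are "silence" and "speech",
# observations are quantized frame energies. All numbers are illustrative.
states = ["silence", "speech"]
log_start = to_log({"silence": 0.6, "speech": 0.4})
log_trans = to_log({
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
})
log_emit = to_log({
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
})

path, log_prob = viterbi(["low", "mid", "high"], states,
                         log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log probabilities avoids numerical underflow on long observation sequences, which is why the sketch adds scores rather than multiplying probabilities.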
Study advice (preparation and review)
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Textbook(s)
Handouts will be distributed.
Reference books, course materials, etc.
C. Bishop, "Pattern Recognition and Machine Learning," Springer, ISBN-13: 978-0387310732
L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall, ISBN-13: 978-0130151575
X. Huang, A. Acero, H.-W. Hon, "Spoken Language Processing," Prentice Hall, ISBN-13: 978-0130226167
Evaluation methods and criteria
Students are evaluated on their understanding of speech recognition, speech synthesis, speech signal processing, and the statistical models used in them.
Reports count for 40% and the final exam for 60%.
The final exam may be replaced with a final report if the situation does not permit students to come to the university, in which case the evaluation is based only on reports.
Related courses
- ICT.H410 : Computational Linguistics
- ICT.H416 : Statistical Theories for Brain and Parallel Computing
- ICT.H508 : Language Engineering
Prerequisites
Students are required to have knowledge equivalent to the following courses.
- LAS.M102 : Linear Algebra I / Recitation
- LAS.M101 : Calculus I / Recitation
- ICT.S206 : Signal and System Analysis