Advanced Data Management
- Academic unit or major
- Graduate major in Computer Science
- Instructor(s)
- Jun Miyazaki / Yang Cao
- Class Format
- Lecture (HyFlex)
- Media-enhanced courses
- -
- Day of week/Period (Classrooms)
- 7-8 Mon / 7-8 Thu
- Class
- -
- Course Code
- CSC.T521
- Number of credits
- 200
- Course offered
- 2025
- Offered quarter
- 1Q
- Syllabus updated
- Mar 31, 2025
- Language
- English
Syllabus
Course overview and goals
With the recent explosive growth in data, computational methods are required to efficiently and securely maintain, manage, and utilize vast amounts of data across various fields. In the field of information engineering, new computational models, efficient algorithms, and productive software design methodologies have been developed to tackle these challenges.
This course will cover these latest technologies. In the first half, students will learn about cloud storage infrastructure for handling large-scale data, parallel processing models, and the MapReduce framework. The second half will focus on privacy-preserving techniques in data science and machine learning, including differential privacy, secure computation, and federated learning. The course aims to provide students with the necessary skills to manage and utilize large-scale data throughout its entire lifecycle.
Course description and aims
By completing this course, students will acquire the following:
- Fundamentals and Applications of Large-Scale Data Management
- Understanding of Cloud Infrastructure and Distributed Data Processing Technologies
- Comprehension of Parallel Processing and High-Performance Computing
- Fundamentals and Applications of Privacy-Preserving Technologies
- Data Lifecycle and Secure Data Management
Keywords
Large-scale data processing, Cloud storage, Distributed data processing, Parallel computing, MapReduce framework, Privacy-preserving technology, Differential privacy, Data lifecycle management
Competencies
- Specialist skills
- Intercultural skills
- Communication skills
- Critical thinking skills
- Practical and/or problem-solving skills
Class flow
After each class, students must thoroughly review the subjects described in the required learning section and study related topics on their own.
Course schedule/Objectives
Class | Course schedule | Objectives |
---|---|---|
Class 1 | Large-scale Data Management | Cloud services for modern large-scale data management |
Class 2 | Data Model and Consistency Model of Key-Value Store | Understanding the properties of distributed key-value stores |
Class 3 | Data Distribution and High Availability in Cloud Storage | Understanding data distribution methods and high availability in cloud storage |
Class 4 | Organization of Cloud Storage | Understanding distributed algorithms used in cloud storage and their purposes |
Class 5 | MapReduce Framework and Its Computational Model | Understanding the advantages of the MapReduce framework |
Class 6 | Large-Scale Text Processing Algorithms Using MapReduce | Understanding the algorithm for building an inverted index with MapReduce |
Class 7 | Large-Scale Graph Processing Algorithms Using MapReduce | Understanding the PageRank algorithm with MapReduce |
Class 8 | Data Lifecycle | Understanding the lifecycle of data collection, analysis, and sharing |
Class 9 | Risks in Data Management | Understanding privacy attacks |
Class 10 | Privacy-Preserving Techniques 1: Differential Privacy | Basics and applications of differential privacy |
Class 11 | Privacy-Preserving Techniques 2: Secure Computation | Basics and applications of secure computation |
Class 12 | Privacy-Preserving Techniques 3: Federated Learning | Basics and applications of federated learning |
Class 13 | Privacy-Preserving Data Management | Techniques for secure data collection, analysis, and sharing |
Class 14 | Privacy-Preserving Machine Learning | Applications of privacy-preserving techniques in machine learning and large-scale AI |
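
As an illustration of the kind of algorithm listed for Class 6, the following is a minimal sketch of building an inverted index in the MapReduce style. It is not taken from the course handouts: it simulates the map, shuffle, and reduce phases in a single Python process, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the toy corpus are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per token in a document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would
    do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: build a sorted, de-duplicated posting list for one term."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run the three phases over a dict of {doc_id: text}."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Toy corpus (hypothetical data, for illustration only).
docs = {1: "cloud storage", 2: "cloud privacy"}
index = build_inverted_index(docs)
```

In a real MapReduce deployment the map and reduce functions would run in parallel on different machines and the framework would perform the shuffle; the single-process version above only shows the data flow.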
Study advice (preparation and review)
To learn effectively, students are encouraged to spend approximately 100 minutes preparing for each class and another 100 minutes reviewing its content afterwards (including assignments), referring to textbooks and other course materials.
Textbook(s)
None required. Handouts used in class can be found on Science Tokyo LMS.
Reference books, course materials, etc.
[Reference books]
J. Lin, C. Dyer, "Data-Intensive Text Processing with MapReduce", Morgan & Claypool Publishers, 2010.
C. Dwork, A. Roth, "The Algorithmic Foundations of Differential Privacy", Foundations and Trends® in Theoretical Computer Science, 9(3-4), pp. 211-407, 2014.
Evaluation methods and criteria
Students will be assessed on their understanding of large-scale data processing, cloud storage, distributed data processing, and trustworthy data management. Students’ course scores are based on quizzes (20%), a midterm assignment (40%) and a term-end assignment (40%).
Related courses
- CSC.T438 : Distributed Algorithms
Prerequisites
Having the following knowledge is desirable:
- Distributed algorithms
- Databases