Advanced Data Management

Academic unit or major: Graduate major in Computer Science
Instructor(s): Jun Miyazaki / Yang Cao
Class Format: Lecture (HyFlex)
Media-enhanced courses: -
Day of week/Period (Classrooms): 7-8 Mon / 7-8 Thu
Class: -
Course Code: CSC.T521
Number of credits: 200
Course offered: 2025
Offered quarter: 1Q
Syllabus updated: Mar 31, 2025
Language: English

Syllabus

Course overview and goals

With the recent explosive growth in data, computational methods are required to efficiently and securely maintain, manage, and utilize vast amounts of data across various fields. In the field of information engineering, new computational models, efficient algorithms, and productive software design methodologies have been developed to tackle these challenges.

This course will cover these latest technologies. In the first half, students will learn about cloud storage infrastructure for handling large-scale data, parallel processing models, and the MapReduce framework. The second half will focus on privacy-preserving techniques in data science and machine learning, including differential privacy, secure computation, and federated learning. The course aims to provide students with the necessary skills to manage and utilize large-scale data throughout its entire lifecycle.

Course description and aims

By completing this course, students will acquire the following:
- Fundamentals and Applications of Large-Scale Data Management
- Understanding of Cloud Infrastructure and Distributed Data Processing Technologies
- Comprehension of Parallel Processing and High-Performance Computing
- Fundamentals and Applications of Privacy-Preserving Technologies
- Data Lifecycle and Secure Data Management

Keywords

Large-scale data processing, Cloud storage, Distributed data processing, Parallel computing, MapReduce framework, Privacy-preserving technology, Differential privacy, Data lifecycle management

Competencies

  • Specialist skills
  • Intercultural skills
  • Communication skills
  • Critical thinking skills
  • Practical and/or problem-solving skills

Class flow

After each class, students must thoroughly review the subjects listed in the required learning section and study related topics on their own.

Course schedule/Objectives

Class | Course schedule | Objectives
Class 1 | Large-Scale Data Management | Cloud services for modern large-scale data management
Class 2 | Data Model and Consistency Model of Key-Value Store | Understanding the properties of distributed key-value stores
Class 3 | Data Distribution and High Availability in Cloud Storage | Understanding data distribution methods and high availability in cloud storage
Class 4 | Organization of Cloud Storage | Understanding distributed algorithms used in cloud storage and their purposes
Class 5 | MapReduce Framework and Its Computational Model | Understanding the advantages of the MapReduce framework
Class 6 | Large-Scale Text Processing Algorithms Using MapReduce | Understanding the algorithm for building an inverted index with MapReduce
Class 7 | Large-Scale Graph Processing Algorithms Using MapReduce | Understanding the PageRank algorithm with MapReduce
Class 8 | Data Lifecycle | Understanding the lifecycle of data collection, analysis, and sharing
Class 9 | Risks in Data Management | Understanding privacy attacks
Class 10 | Privacy-Preserving Techniques 1: Differential Privacy | Basics and applications of differential privacy
Class 11 | Privacy-Preserving Techniques 2: Secure Computation | Basics and applications of secure computation
Class 12 | Privacy-Preserving Techniques 3: Federated Learning | Basics and applications of federated learning
Class 13 | Privacy-Preserving Data Management | Techniques for secure data collection, analysis, and sharing
Class 14 | Privacy-Preserving Machine Learning | Applications of privacy-preserving techniques in machine learning and large-scale AI

Study advice (preparation and review)

For each class, students are encouraged to spend approximately 100 minutes preparing and another 100 minutes reviewing the content afterwards (including assignments), referring to textbooks and other course materials.

Textbook(s)

None required. Handouts used in class can be found on Science Tokyo LMS.

Reference books, course materials, etc.

[Reference books]
J. Lin, C. Dyer, "Data-Intensive Text Processing with MapReduce", Morgan & Claypool Publishers, 2010.
C. Dwork, A. Roth, "The Algorithmic Foundations of Differential Privacy", Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

Evaluation methods and criteria

Students will be assessed on their understanding of large-scale data processing, cloud storage, distributed data processing, and trustworthy data management. Students’ course scores are based on quizzes (20%), a midterm assignment (40%) and a term-end assignment (40%).

Related courses

  • CSC.T438 : Distributed Algorithms

Prerequisites

Prior knowledge of the following is desirable:
- Distributed algorithms
- Databases