
2025 courses: School of Computing, Department of Computer Science, Graduate major in Computer Science

Advanced Data Management

Academic unit or major
Graduate major in Computer Science
Instructor(s)
Jun Miyazaki / Yang Cao
Class Format
Lecture (HyFlex)
Media-enhanced courses
-
Day of week/Period
(Classrooms)
7-8 Mon (W3-301(W331)) / 7-8 Thu (W3-301(W331))
Class
-
Course Code
CSC.T521
Number of credits
2-0-0
Course offered
2025
Offered quarter
1Q
Syllabus updated
Mar 31, 2025
Language
English

Syllabus

Course overview and goals

With the recent explosive growth of data, many fields need computational methods that can efficiently and securely maintain, manage, and use vast amounts of data. In information engineering, new computational models, efficient algorithms, and productive software design methodologies have been developed to meet these challenges.

This course covers these recent technologies. The first half introduces cloud storage infrastructure for handling large-scale data, parallel processing models, and the MapReduce framework. The second half focuses on privacy-preserving techniques in data science and machine learning, including differential privacy, secure computation, and federated learning. The course aims to give students the skills needed to manage and use large-scale data throughout its entire lifecycle.
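The MapReduce computational model mentioned above can be illustrated with a rough sketch. The Python code below simulates the map, shuffle, and reduce phases in a single process to build an inverted index, which is the Class 6 topic. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the toy documents are hypothetical. This is not the API of Hadoop or any other framework. A real job would run each phase in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # (document id, term frequency)


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, Posting]]:
    """Mapper: emit (term, (doc_id, tf)) once per distinct term in one document."""
    counts: Dict[str, int] = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[Posting]] = defaultdict(list)
    for term, posting in pairs:
        groups[term].append(posting)
    return groups


def reduce_phase(term: str, postings: List[Posting]) -> Tuple[str, List[Posting]]:
    """Reducer: produce the postings list for one term, sorted by document id."""
    return term, sorted(postings)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Run map -> shuffle -> reduce sequentially over a small in-memory corpus."""
    intermediate = (
        pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, postings) for term, postings in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "one fish two fish", "d2": "red fish blue fish"}
    index = build_inverted_index(docs)
    print(index["fish"])  # [('d1', 2), ('d2', 2)]
```

Each mapper looks at only one document, and each reducer looks at only one term. This independence is what lets a MapReduce framework distribute the work.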

Course description and aims

By completing this course, students will acquire the following:
- Fundamentals and Applications of Large-Scale Data Management
- Understanding of Cloud Infrastructure and Distributed Data Processing Technologies
- Comprehension of Parallel Processing and High-Performance Computing
- Fundamentals and Applications of Privacy-Preserving Technologies
- Data Lifecycle and Secure Data Management

Keywords

Large-scale data processing, Cloud storage, Distributed data processing, Parallel computing, MapReduce framework, Privacy-preserving technology, Differential privacy, Data lifecycle management

Competencies

  • Specialist skills
  • Intercultural skills
  • Communication skills
  • Critical thinking skills
  • Practical and/or problem-solving skills

Class flow

After each class, students must thoroughly review the subjects described in the required learning section and study related topics on their own.

Course schedule/Objectives

Course schedule Objectives
Class 1

Large-Scale Data Management

Cloud services for modern large-scale data management

Class 2

Data Model and Consistency Model of Key-Value Store

Understanding the properties of distributed key-value stores

Class 3

Data Distribution and High Availability in Cloud Storage

Understanding data distribution methods and high availability in cloud storage

Class 4

Organization of Cloud Storage

Understanding distributed algorithms used in cloud storage and their purposes

Class 5

MapReduce Framework and Its Computational Model

Understanding the advantages of the MapReduce framework

Class 6

Large-Scale Text Processing Algorithms Using MapReduce

Understanding the algorithm for building an inverted index with MapReduce

Class 7

Large-Scale Graph Processing Algorithms Using MapReduce

Understanding how to compute PageRank with MapReduce

Class 8

Data Lifecycle

Understanding the data lifecycle: collection, analysis, and sharing

Class 9

Risks in Data Management

Understanding privacy attacks

Class 10

Privacy-Preserving Techniques 1: Differential Privacy

Basics and applications of differential privacy

Class 11

Privacy-Preserving Techniques 2: Secure Computation

Basics and applications of secure computation

Class 12

Privacy-Preserving Techniques 3: Federated Learning

Basics and applications of federated learning

Class 13

Privacy-Preserving Data Management

Techniques for secure data collection, analysis, and sharing

Class 14

Privacy-Preserving Machine Learning

Applications of privacy-preserving techniques in machine learning and large-scale AI
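Here is a minimal sketch of the differential privacy basics covered in Class 10. The code applies the Laplace mechanism to a counting query. A count has sensitivity 1, because adding or removing one person changes it by at most 1. Adding noise drawn from Laplace(0, 1/ε) therefore gives ε-differential privacy (Dwork and Roth). The noise is sampled as the difference of two exponential draws, which follows a Laplace distribution. The function name `laplace_count` and the toy data are illustrative assumptions. A hand-rolled floating-point sampler like this one is for teaching only; deployed systems should use a vetted DP library.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return an epsilon-DP noisy count of records satisfying `predicate` (illustrative)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    # Exp(mean=b) - Exp(mean=b) is distributed as Laplace(0, b).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 67, 38]  # hypothetical toy dataset; true count of 40+ is 3
    noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
    print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller ε gives a larger noise scale, which means stronger privacy and a less accurate answer. Every query answered this way consumes part of the overall privacy budget.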

Study advice (preparation and review)

For each class, students are encouraged to spend approximately 100 minutes preparing beforehand and another 100 minutes reviewing the class content afterwards (including assignments).
They should do so by referring to textbooks and other course materials.

Textbook(s)

None required. Handouts used in class can be found on Science Tokyo LMS.

Reference books, course materials, etc.

[Reference books]
J. Lin and C. Dyer, "Data-Intensive Text Processing with MapReduce", Morgan & Claypool Publishers.
C. Dwork and A. Roth, "The Algorithmic Foundations of Differential Privacy", Foundations and Trends® in Theoretical Computer Science, 9(3–4), pp. 211–407, 2014.

Evaluation methods and criteria

Students will be assessed on their understanding of large-scale data processing, cloud storage, distributed data processing, and trustworthy data management. Course scores are based on quizzes (20%), a midterm assignment (40%), and a term-end assignment (40%).

Related courses

  • CSC.T438: Distributed Algorithms

Prerequisites

Having the following knowledge is desirable:
- Distributed algorithms
- Databases