
2021 Faculty Courses School of Computing Department of Computer Science Graduate major in Computer Science

High Performance Scientific Computing

Academic unit or major
Graduate major in Computer Science
Instructor(s)
Rio Yokota
Class Format
Lecture
Media-enhanced courses
-
Day of week/Period (Classrooms)
1-2 Mon / 1-2 Thu
Class
-
Course Code
CSC.T526
Number of credits
200
Course offered
2021
Offered quarter
1Q
Syllabus updated
Jul 10, 2025
Language
Japanese

Syllabus

Course overview and goals

This course equips students with the knowledge and skills needed to develop fast algorithms and implement them at massive scale on modern supercomputers, using parallel programming techniques such as SIMD, OpenMP, MPI, and CUDA. It covers how to use various linear algebra libraries for parallel execution on both CPUs and GPUs. The course also includes tutorials on using debuggers and profilers in a massively parallel environment, demonstrations of performance primitives and of building container environments on TSUBAME, and tips on running deep learning frameworks on large GPU supercomputers.

Course description and aims

By the end of this course, students will be able to:
1. Use SIMD vectorization, shared memory parallelization via OpenMP, and distributed memory parallelization via MPI
2. Program GPUs using OpenACC, CUDA, and HIP
3. Understand how high performance numerical libraries function and use them appropriately
4. Debug and profile code in a parallel environment using parallel debuggers and profilers
5. Use containers and deep learning frameworks on massively parallel computers

Keywords

Vectorization, Shared memory parallelism, Distributed memory parallelism, GPU programming, Python libraries, Matrix Multiplication, Linear solvers, Parallel debugger, Parallel profilers, Containers, Deep Learning

Competencies

  • Specialist skills
  • Intercultural skills
  • Communication skills
  • Critical thinking skills
  • Practical and/or problem-solving skills

Class flow

Classes will be taught online.
Sample code will be prepared for each lecture, and exercises will be performed on TSUBAME.

Course schedule/Objectives

Class 1

Introduction to parallel programming

Introduction to the basic concepts of parallel programming
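
As a rough illustration of one basic concept in parallel programming, the sketch below splits a computation into independent chunks, processes the chunks concurrently, and then combines the partial results (a reduction). It is not taken from the course material: the function names `partial_sum` and `parallel_sum_of_squares` are hypothetical, and Python threads are used only to show the structure, since CPython threads generally do not speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done independently by one worker: sum the squares of its own slice."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Hypothetical helper: decompose the data, map over chunks, then reduce."""
    # Ceiling division so that at most num_workers chunks are created.
    chunk_size = max(1, -(-len(data) // num_workers))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Reduction step: combine the independent partial results.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

The same decompose, compute, and reduce structure reappears in the shared memory and distributed memory classes, where the workers are OpenMP threads or MPI processes rather than Python threads.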

Class 2

Shared memory parallelization

Use OpenMP to achieve shared memory parallelization

Class 3

Distributed memory parallelization

Use MPI to achieve distributed memory parallelization

Class 4

SIMD parallelization

Use SSE, AVX, and AVX-512 to achieve SIMD vectorization

Class 5

GPU programming 1

Use OpenACC and OpenMP to program GPUs

Class 6

GPU programming 2

Use CUDA and HIP to program GPUs

Class 7

Parallel programming models

Use advanced parallel programming models such as StarPU, OmpSs, and Legion

Class 8

Cache blocking

Use BLISLAB and cuBLAS as examples to practice cache blocking
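
For readers unfamiliar with the term, cache blocking (tiling) restructures the loops of a matrix multiplication so that each small tile of the operands is reused while it is still in cache. The sketch below shows only that loop structure. It is an assumption-laden illustration, not the BLISLAB or cuBLAS implementation: the function name `blocked_matmul` and the `block` parameter are hypothetical, and in pure Python the tiling brings no measurable cache benefit.

```python
def blocked_matmul(a, b, block=2):
    """Hypothetical helper: multiply list-of-lists matrices tile by tile."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer three loops walk over tiles of size block x block.
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Inner three loops do an ordinary multiply restricted to one tile,
                # so the same small pieces of a, b, and c are touched repeatedly.
                for i in range(ii, min(ii + block, n)):
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        for j in range(jj, min(jj + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


if __name__ == "__main__":
    print(blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

In a compiled language the tile size would be chosen to match the cache sizes of the target processor; here it is just a parameter.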

Class 9

High performance Python

Understand how NumPy, CuPy, and other libraries can be used to accelerate Python code

Class 10

I/O libraries

Use NetCDF, HDF5, and MPI-IO to read and write on large parallel file systems

Class 11

Parallel debugger

Use CUDA-GDB, Valgrind, and TotalView to debug parallel code

Class 12

Parallel profiler

Use gprof, VTune, PAPI, TAU, and Vampir to profile parallel code

Class 13

Containers

Use Singularity with Docker images to build container environments

Class 14

Scientific Computing

Learn how to discretize partial differential equations and parallelize the resulting system of equations
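
To make this objective concrete, the sketch below discretizes the one-dimensional Poisson problem $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$ using second-order central differences, and solves the resulting linear system with Jacobi iteration. The choice of problem and solver is an assumption made for illustration and is not stated in the syllabus; the function name `jacobi_poisson_1d` is hypothetical. Jacobi is used because every update in a sweep depends only on the previous iterate, so the inner loop could be split across threads or processes.

```python
def jacobi_poisson_1d(f, n, iterations):
    """Hypothetical helper: approximate -u'' = f on (0, 1) with u(0) = u(1) = 0.

    Uses n interior grid points and returns n + 2 values including the boundaries.
    """
    h = 1.0 / (n + 1)
    rhs = [f((i + 1) * h) for i in range(n)]
    u = [0.0] * (n + 2)  # u[0] and u[n + 1] hold the boundary values
    for _ in range(iterations):
        new_u = u[:]
        # Each new_u[i] reads only the old iterate u, so these updates are
        # independent of one another and could run in parallel.
        for i in range(1, n + 1):
            new_u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * rhs[i - 1])
        u = new_u
    return u


if __name__ == "__main__":
    # With f = 2 the exact solution is u(x) = x * (1 - x).
    solution = jacobi_poisson_1d(lambda x: 2.0, n=9, iterations=500)
    print(solution[5])
```

Jacobi converges slowly as the grid is refined, so it serves here only to show where the parallelism comes from.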

Class 15

Deep Learning

Use PyTorch to train a large neural network on a parallel computer

Study advice (preparation and review)

To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.

Textbook(s)

None

Reference books, course materials, etc.

None

Evaluation methods and criteria

Evaluation is based on written reports (40%) and a final report (60%).

Related courses

  • Numerical Analysis
  • Basic Application of Computing and Mathematical Sciences

Prerequisites

None

Other

The Zoom link will be sent to registered students one day before the first lecture.