High Performance Scientific Computing
- Academic unit or major
- Graduate major in Computer Science
- Instructor(s)
- Rio Yokota
- Class Format
- Lecture (HyFlex)
- Media-enhanced courses
- -
- Day of week/Period (Classrooms)
- 1-2 Mon / 1-2 Thu
- Class
- -
- Course Code
- CSC.T526
- Number of credits
- 200
- Course offered
- 2025
- Offered quarter
- 1Q
- Syllabus updated
- Mar 31, 2025
- Language
- English
Syllabus
Course overview and goals
This course equips students with the knowledge and skills needed to develop fast algorithms and implement them at massive scale on modern supercomputers, using parallel programming techniques such as SIMD, OpenMP, MPI, and CUDA. It covers how to use various linear algebra libraries for parallel execution on both CPUs and GPUs, and includes tutorials on using debuggers and profilers in a massively parallel environment. The course also demonstrates performance primitives and how to build container environments on TSUBAME, along with tips on running deep learning frameworks on large GPU supercomputers.
Course description and aims
By the end of this course, students will be able to
1. Use SIMD vectorization, shared memory parallelization via OpenMP, and distributed memory parallelization via MPI
2. Program GPUs using OpenACC, CUDA, and HIP
3. Understand how high-performance numerical libraries work and use them appropriately
4. Debug and profile code in a parallel environment by using parallel debuggers and profilers
5. Use containers and deep learning frameworks on massively parallel computers
Keywords
Vectorization, Shared memory parallelism, Distributed memory parallelism, GPU programming, Python libraries, Matrix Multiplication, Linear solvers, Parallel debugger, Parallel profilers, Containers, Deep Learning
Competencies
- Specialist skills
- Intercultural skills
- Communication skills
- Critical thinking skills
- Practical and/or problem-solving skills
Class flow
Classes will be taught in person.
Sample codes will be prepared for each lecture, and exercises will be performed on TSUBAME.
Course schedule/Objectives
Class | Course schedule | Objectives |
---|---|---|
Class 1 | Introduction to parallel programming | Introduction to the basic concepts of parallel programming |
Class 2 | Shared memory parallelization | Use OpenMP to achieve shared memory parallelization |
Class 3 | Distributed memory parallelization | Use MPI to achieve distributed memory parallelization |
Class 4 | SIMD parallelization | Use SSE, AVX, and AVX-512 to achieve SIMD vectorization |
Class 5 | GPU programming 1 | Use OpenACC and OpenMP to program GPUs |
Class 6 | GPU programming 2 | Use CUDA and HIP to program GPUs |
Class 7 | Cache blocking | Use BLISLAB and cuBLAS as examples to practice cache blocking |
Class 8 | High performance Python | Understand how NumPy, CuPy, and other libraries can be used to accelerate Python code |
Class 9 | I/O libraries | Use NetCDF, HDF5, MPI-IO to read and write on large parallel file systems |
Class 10 | Parallel debugger | Use CUDA-GDB and ARM DDT to debug parallel code |
Class 11 | Parallel profiler | Use gprof, VTune, and nvprof to profile parallel code |
Class 12 | Containers | Use Singularity with Docker images to build container environments |
Class 13 | Scientific Computing | Learn how to discretize partial differential equations and parallelize the resulting system of equations |
Class 14 | Deep Learning | Use PyTorch to train a large neural network on a parallel computer |
Study advice (preparation and review)
To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.
Textbook(s)
None
Reference books, course materials, etc.
None
Evaluation methods and criteria
Evaluation is based on written reports (40%) and a final report (60%).
Related courses
- Numerical Analysis
- Basic Application of Computing and Mathematical Sciences
Prerequisites
None