2021 Faculty Courses School of Engineering Common courses

Statistics for Data Science

Academic unit or major
Common courses
Instructor(s)
Daniel Berrar
Class Format
Lecture
Media-enhanced courses
-
Day of week/Period (Classrooms)
9-10 Mon (W611) / 9-10 Thu (W611)
Class
-
Course Code
XEG.G301
Number of credits
2-0-0
Course offered
2021
Offered quarter
3Q
Syllabus updated
Jul 10, 2025
Language
English

Syllabus

Course overview and goals

This course covers the fundamentals of probability theory and statistics for data science. Topics in probability theory include discrete and continuous random variables, probability rules, expected value, correlation, and important probability distributions. The course covers both frequentist statistics (sampling distributions, confidence intervals, significance testing) and Bayesian statistics (Bayes' theorem, Bayesian analysis, credibility intervals), and it also addresses the fundamentals of statistical learning theory. The goal is for students to acquire a solid statistical background that enables them to choose appropriate statistical methods and tools and to analyze data scientifically. To this end, the course includes many real-world examples from engineering and the sciences.

Course description and aims

After successful completion of this course, the students will
(1) understand the statistical fundamentals of data science;
(2) be able to analyze data scientifically;
(3) be able to communicate analytical results in an interdisciplinary environment.

Keywords

Bayes' theorem; Bayesian hypothesis test; Beta function; binomial probability distribution; conditional probability; confidence interval; credibility interval; data science; expected value; Gamma function; hypothesis testing; joint probability; marginal probability; naive Bayes classifier; probability distribution; power; p-value; random variable; sampling distribution; significance testing.

Competencies

  • Specialist skills
  • Intercultural skills
  • Communication skills
  • Critical thinking skills
  • Practical and/or problem-solving skills

Class flow

Classes usually begin with a real-world example that motivates a statistical concept. The concept is then described formally, with mathematical proofs where appropriate. Finally, the class solves the real-world problem together.
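
As an illustration of this flow, the sketch below works through a hypothetical diagnostic-test problem using Bayes' theorem, one of the topics listed for Class 2. The example is not taken from the course materials; the function name `posterior` and all numbers (1% prevalence, 95% sensitivity, 90% specificity) are assumptions chosen only for illustration.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """Return P(condition | positive test) using Bayes' theorem.

    prior       -- P(condition), the prevalence (hypothetical value)
    sensitivity -- P(positive | condition)
    specificity -- P(negative | no condition)
    """
    true_pos = sensitivity * prior                    # P(positive and condition)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)          # divide by P(positive)


# Hypothetical numbers, for illustration only.
p = posterior(prior=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(condition | positive) = {p:.3f}")  # prints 0.088
```

Even with a fairly accurate test, the posterior probability stays below 9% because the condition is rare. A counterintuitive result of this kind is one way a real-world example might motivate the formal concept.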

Course schedule/Objectives

Class 1

Course overview; fundamentals of data science; covariance, correlation, regression

None.

Class 2

Conditional probability; Bayes' theorem

Revise contents of previous class; complete assignment

Class 3

Discrete random variables; expected value

Revise contents of previous class; complete assignment

Class 4

Gamma function; binomial probability distributions

Revise contents of previous class; complete assignment

Class 5

Continuous random variables; continuous probability distribution; normal distributions

Revise contents of previous class; complete assignment

Class 6

Distribution of functions of random variables; sampling distributions

Revise contents of previous class; complete assignment

Class 7

Point estimates and confidence intervals

Revise contents of previous class; complete assignment

Class 8

Student's t-distribution

Revise contents of previous class; complete assignment

Class 9

Significance testing; p-value

Revise contents of previous class; complete assignment

Class 10

Significance testing

Revise contents of previous class; complete assignment

Class 11

Beta function; Bayesian analysis

Revise contents of previous class; complete assignment

Class 12

Bayesian data analysis [1/2]

Revise contents of previous class; complete assignment

Class 13

Bayesian data analysis [2/2]

Revise contents of previous class; complete assignment

Class 14

Fundamentals of statistical learning theory

Revise contents of previous class; complete assignment

Study advice (preparation and review)

To enhance effective learning, students are encouraged to spend approximately 100 minutes preparing for class and another 100 minutes reviewing class content afterwards (including assignments) for each class.
They should do so by referring to textbooks and other course material.

Textbook(s)

None required. Course materials are provided during class.

Reference books, course materials, etc.

Bertsekas D.P. and Tsitsiklis J.N. (2008) Introduction to Probability, 2nd edition. Athena Scientific.

Kruschke J. (2014) Doing Bayesian Data Analysis, 2nd edition. Academic Press.

Evaluation methods and criteria

Students' course grades will be based on the final exam.

Related courses

  • XCO.T483 : Advanced Artificial Intelligence and Data Science A
  • IEE.A205 : Statistics for Industrial Engineering and Economics
  • ICT.M202 : Probability and Statistics (ICT)
  • XCO.T487 : Fundamentals of data science

Prerequisites

Knowledge of elementary algebra and calculus is required.