PQHS 471 – Machine Learning and Data Mining

Table of Contents

I. Welcome Video

II. Course Overview at a Glance

III. Course Objectives

IV. Course Description

V. Tentative Schedule

I. Welcome Video



II. Course Overview at a Glance

Time & Place: January 12 – May 7, 2026; Thursdays, 2:30 pm – 5:00 pm, Wood Building WG73

Instructor: Abdus Sattar, PhD, LLM
Office: Wood Building WG51
Teaching Assistant: Towsif Raiyan (email: txr269@case.edu)
Office Hours: Mr. Raiyan: Thursday 1 pm – 2 pm or by appointment
Dr. Sattar: Monday 1 pm – 2 pm or by appointment
E-mail/Phone: sattar@case.edu; 1-216-368-1501
Course Web Page: sattar.case.edu; canvas.case.edu
Textbook (Required): An Introduction to Statistical Learning with Applications in Python by James, Witten, Hastie, Tibshirani, and Taylor, 2023
Prerequisites:

  • This course is designed for advanced undergraduate students and for graduate students in Biostatistics or other sciences who have adequate preparation in statistical methods (at least one statistics course, equivalent to the PQHS 431 course experience). The course uses statistical formulas (e.g., probability mass functions) and theory (e.g., likelihood theory) to explain various concepts of machine learning and data mining. Undergraduate students need the instructor’s permission to enroll.
  • Knowledge of statistical computing is required. We aim to use Python for all computing. Some programming experience will be helpful.
Disability Help:

If you have a disability and need help, please contact me and the Office of Educational Support Services (disability@case.edu, 216.368.5230) as early as possible in the term.

Academic Integrity:

You are expected to maintain the highest integrity in your work for this class. This includes not passing off anyone else’s work as your own, even with their permission. Your homework solutions must be your own work, not from outside sources, consistent with the university rules on academic honesty. I expect you to follow this policy scrupulously. Evidence of academic dishonesty may lead to loss of credit for the assignment, and possibly failure of the course.


III. Course Objectives

  1. Gain proficiency in core supervised learning methods, including regularization, classification algorithms, tree-based models, support vector machines, and deep learning methods.

  2. Acquire competency in standard and modern unsupervised learning techniques, including clustering and dimensionality-reduction methods.

  3. Hone practical skills by implementing machine learning methods in Python and applying them to biomedical and clinical datasets.

  4. Develop the ability to interpret, evaluate, and communicate machine learning results to collaborators in scientific and clinical research.


IV. Course Description

Machine learning and data mining play a central role in modern biomedical, clinical, public health, and social research, where vast and complex datasets require efficient, flexible, and interpretable analytical tools. This course provides a practical and theoretically grounded introduction to key supervised and unsupervised learning methods used to uncover structure in data, build predictive models, and interpret underlying scientific relationships. Students will learn statistical learning principles and a broad set of modeling techniques, including regularization methods (ridge, Lasso), tree-based approaches (bagging, random forests, boosting), support vector machines, neural networks and deep learning, clustering, and other unsupervised learning tools such as principal components analysis. Cross-cutting topics such as model assessment, cross-validation, bootstrap methods, and effective visualization will be emphasized throughout. Applications will draw from biomedical, clinical, and public health research as well as other quantitative disciplines. Students will gain hands-on experience implementing methods in Python, interpreting results, and communicating findings to scientific collaborators. The course combines lectures, demonstrations, computational exercises, homework assignments, and data-driven problem solving tied directly to students’ professional interests. Primary material will follow An Introduction to Statistical Learning with Applications in Python (James, Witten, Hastie, Tibshirani, and Taylor, 2023), supplemented with additional examples and contemporary developments.

Course Requirements and Grading

Assignments:

There will be six (6) homework assignments, one (1) midterm exam, and one (1) final exam. No late assignments will be accepted unless you have a university-excused absence. Real data analysis problems drawn from scientific studies will give you the opportunity to demonstrate your machine learning and data mining knowledge through model development, evaluation, and interpretation.

Midterm and Final Exams:

Midterm Exam – March 5, 2026 (covers topics through February 26), 3:30 – 5:00 PM (one and a half hours).

Final Exam – between April 30 and May 7, 2026 (covers Chapters 9 and 10); exact date and time TBD (two hours).

Grading Scale: The course grade will be determined according to the following:

Homework 40%
Midterm Exam 30%
Final Exam 30%
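The three weights combine into a single course percentage as a weighted average. The sketch below is a minimal illustration, assuming every component is scored out of 100 and that the six homework scores are averaged with equal weight; neither assumption is stated in this syllabus, and `course_grade` is a hypothetical helper, not an official course tool. Letter-grade cutoffs are not given here.

```python
# Hypothetical helper for illustration only.
# Assumptions (not stated in the syllabus): each component is scored 0-100,
# and the homework component is the plain average of all homework scores.
WEIGHTS = {"homework": 0.40, "midterm": 0.30, "final": 0.30}


def course_grade(homework_scores, midterm, final):
    """Return the weighted course percentage (0-100)."""
    if len(homework_scores) == 0:
        raise ValueError("need at least one homework score")
    # Equal-weight average of the homework assignments (assumed).
    homework_avg = sum(homework_scores) / len(homework_scores)
    # Weighted sum: 40% homework, 30% midterm, 30% final.
    return (WEIGHTS["homework"] * homework_avg
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)


# Example: homework average 90, midterm 82, final 88.
print(round(course_grade([90, 85, 100, 95, 80, 90], midterm=82, final=88), 2))  # → 87.0
```

Here 0.40 × 90 + 0.30 × 82 + 0.30 × 88 = 36 + 24.6 + 26.4 = 87.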


V. Tentative Schedule

Week  Date   Topics (Textbook Sections)
1     01/15  Introduction to machine learning (Ch. 1)
             Basics of data mining (2.1)
             Matrix algebra preliminaries (Slides)
             Lab: Introduction to Python (2.3)
2     01/22  Data mining: Clustering (12.4)
             Lab: Clustering (12.5.3)
3     01/29  Principal Component Analysis (12.2)
             Lab: PCA (12.5.1)
      02/02  Homework 1 due by Feb 2 (Monday)
4     02/05  Classification, logistic regression, LDA, QDA, KNN, GLM (Ch. 4)
             Lab: Supervised learning (4.7)
5     02/12  Resampling methods: cross-validation, bootstrap (5.1, 5.2)
             Lab: Cross-validation and bootstrapping (5.3)
      02/16  Homework 2 due by Feb 16 (Monday)
6     02/19  Linear model selection & regularization (6.1, 6.2)
             Lab: Model selection & regularization (6.5)
      02/23  Homework 3 due by Feb 23 (Monday)
7     02/26  Basics of tree-based methods; bagging, random forests, boosting (Ch. 8)
             Lab: Tree-based methods (8.3)
8     03/05  Bayesian additive regression trees, BART (8.2.4)
             Lecture part: 2:30 – 3:20 PM
             Midterm Exam: 3:30 – 5:00 PM
9     03/12  Spring Break (no class)
10    03/19  Support Vector Machines (Ch. 9)
             Lab: SVM (9.6)
      03/23  Homework 4 due by March 23 (Monday)
11    03/26  Deep learning basics: single-layer and multilayer neural networks (10.1, 10.2)
             Lab: Deep learning coding in Python (10.9.1, 10.9.2)
12    04/02  Convolutional Neural Networks (10.3)
             Lab: CNN coding in Python (10.9.3)
      04/06  Homework 5 due by April 6 (Monday)
13    04/09  Recurrent Neural Networks (10.5)
             Lab: RNN coding in Python (10.9.6)
14    04/16  Fitting Neural Networks
             Lab: NN fitting in Python
      04/20  Homework 6 due by April 20 (Monday)
             Final Exam: TBD

*Relevant handouts, articles, and other materials will be provided.
