PQHS 430 – Design & Analysis of High-Dimensional Data

Table of Contents

I. Welcome Video

II. Course Overview at a Glance

III. Course Objectives

IV. Course Description

V. Tentative Schedule

VI. Sample Lecture

I. Welcome Video


Videos removed on print.
Jump to top

II. Course Overview at a Glance

Time & Place TBD
Instructors Abdus Sattar, PhD
Yu Lui, PhD, Senior Research Associate, CPB
Office BRB: Suite G-19
Teaching Assistant TBD
Office Hours TBA
Course Web Page canvas.case.edu
Textbook (Required) No required text
Reference – Data mining for genomics and proteomics, by Darius M. Dziuda
– Statistics and Data Analysis for Microarrays, by Sorin Draghici
– Bioinformatics for High Throughput Sequencing, edited by Rodriguez-Ezpeleta, Hackenberg, Aransay
Prerequisites:

  • This course is designed for advanced undergraduate students, and graduate students in Biostatistics or other quantitative sciences with background and adequate preparation in statistical methods (at least one statistics, similar to PQHS 431, course experience).
  • Some programming experience. Knowledge in statistical computing or statistical software package is helpful. We aim to use R and some commonly used NGS software (e.g. BWA, Bowtie, Tophat).
Disability Help
If you have a disability and need help, please contact me and the Office of Educational Support Services at disability@case.edu, 216.368.5230 as early as possible in the term.
Academic Integrity
You are expected to maintain the highest integrity in your work for this class. This includes not passing off anyone else’s work as your own, even with their permission. Your homework solutions must be your own work, not from outside sources, consistent with the university rules on academic honesty. I expect you to follow this policy scrupulously. Evidence of academic dishonesty may lead to loss of credit for the assignment, and possibly failure of the course.

Jump to top

II. Course Objectives

  1. Gain proficiency in designing high-throughput studies and statistical learning methods.
  2. Hone skills by applying statistical methods in solving high-dimensional data analysis problems.
  3. Acquire competency in standard and cutting edge high-dimensional methods and algorithms.

Jump to top

III. Course Description

This is an exciting genomic revolutionary era when scientists can use high-throughput data to extract the genetic basis of complex diseases such as cancers. High-dimensional high-throughput data are often encountered in the fields of genomics, proteomics, system biology and bioinformatics. Through this course students will learn how to analyze the high-dimensional genomic data necessary for personalized medicine, using interdisciplinary approaches that combine statistics, computer science, molecular biology, and genomics. While this particular course will focus mostly on statistical methods for designing and analyzing molecular studies, those who take it will come from a wide variety of disciplines. The instructional design will be one of active experimental learning: the course will include in-class lectures, group discussion and brainstorming, homework, simulations, and collaborative projects on real and realistic problems in human health tied directly to the student’s own professional interests. Review of some multivariate methods, including statistical learning and inference methods when the number of measures far exceeds the number of subjects (“high-dimensional data”). Topics include (but not limited to) designing high-throughput studies, sample size and power analysis, low-level preprocessing of microarrays, basic exploratory genomics and proteomics data analyses, classification and supervised learning, cluster analysis and unsupervised learning methods. These statistical methods will be applied to gene and protein expression data, and next generation sequencing data. This course stresses how the core statistical principles, computing tools, and visualization strategies are used to address complex scientific aims powerfully and efficiently, and to communicate those findings effectively to researchers who may have little or no experience in these methods. Basic knowledge in biology will be helpful; however, required molecular biology will be reviewed.

Jump to top

IV. Tentative Schedule

Week Topics
1-7 Introduction to the molecular biology, genomics, and proteomics
Review of statistical (supervised and unsupervised) learning methods
Multiple comparisons (p>>n problems)
Design of high-throughput experiment
Low-level processing (normalization, background correction, etc) of microarray data
Feature selection methods: random forests, support vector machines
8-10 Introduction to proteomics and Mass Spectrometry
Preprocessing of Mass Spectrometry Data
Statistical Analysis of protein expression data
11-15 Introduction to next generation sequencing data
DNA-seq, RNA-seq, ChIP-seq data analysis

Jump to top

VI. Sample Lecture


Videos removed on print.
Jump to top

Materials


PDF preview removed on print.
Jump to top