Biostatistics Continuing Education

Talks focused on translating recent advances in biostatistics into practice.

We organize several short courses per year, in which we invite an expert in an area of statistical methodology to teach an applied short course to the statistical community. The courses are intended to be directly applicable to the work of statisticians in the clinical and translational arena. Each course explains the theory and highlights applications, together with software implementations.

Absolute Risk: Methods and Applications in Clinical Management and Public Health

April 23, 2018
8:30am - 5:30pm
Minot Room, Countway Library

This course is an introduction to absolute risk, the probability of developing a specific outcome over a specified time interval in the presence of competing causes of mortality. The course will define absolute risk and discuss methodological issues relevant to the development and evaluation of absolute risk models. We will present the cause-specific and cumulative incidence approaches to incorporating covariates, and discuss various study designs and data for model building, including cohort, nested case-control, and case-control data combined with registry data. We will show how to evaluate the performance of risk prediction models and discuss the use of absolute risk in individual counseling for prevention strategies, including interventions that can have adverse effects. We will also discuss the potential use of such models for disease prevention in the population, including designing prevention trials, estimating the absolute risk reduction in the population from modifying risk factor distributions, the "high risk" preventive intervention strategy, risk-based disease screening, and resource allocation.
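
As a rough illustration of the definition above, the sketch below computes absolute risk under the simplifying assumption of constant cause-specific hazards. That assumption, the function names, and the hazard values are hypothetical and are not taken from the course. With a constant hazard h1 for the outcome of interest and h2 for competing mortality, the absolute risk over an interval of length tau is h1 / (h1 + h2) * (1 - exp(-(h1 + h2) * tau)).

```python
import math


def absolute_risk(h_event, h_competing, tau):
    """Probability of the event of interest within tau years,
    allowing for a competing cause (constant hazards assumed)."""
    total = h_event + h_competing
    if total == 0:
        return 0.0
    # Share of all failures due to the event, times P(any failure by tau).
    return h_event / total * (1.0 - math.exp(-total * tau))


def risk_ignoring_competing(h_event, tau):
    """'Pure' risk that treats competing mortality as absent."""
    return 1.0 - math.exp(-h_event * tau)


# Hypothetical hazards per person-year, for illustration only.
risk = absolute_risk(h_event=0.01, h_competing=0.02, tau=10)
naive = risk_ignoring_competing(h_event=0.01, tau=10)
print(f"absolute risk: {risk:.4f}, ignoring competing risks: {naive:.4f}")
```

In this made-up example, ignoring competing mortality overstates the 10-year risk (about 0.095 versus about 0.086).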

Ruth Pfeiffer, PhD
Senior Investigator/Biostatistics Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
Graduate of Technical University of Vienna, Austria (MA in applied mathematics) and University of Maryland, College Park (PhD in mathematical statistics)

Mitchell H. Gail, MD, PhD
Senior Investigator
Division of Cancer Epidemiology and Genetics
National Cancer Institute
Graduate of Harvard Medical School (MD) and George Washington University (PhD in statistics)

Registration is required. Please email us to register.

Past Biostatistics Short Courses

Machine Learning and Bayesian Approaches for Data Science in Medicine

January 18, 2018
11:00am - 5:30pm
Kresge G1
Harvard T.H. Chan School of Public Health

The quantity and scope of data available for translational research are rapidly expanding, which provides both opportunities and challenges for researchers. In this course, statisticians Sherri Rose, PhD, and Laura Hatfield, PhD, will provide an overview of modern analytical methods for applied research in clinical and health policy topics. The course will begin with a broad introduction to posing research questions, evaluating data sources, and specifying and assessing causal inference assumptions. The rest of the course will focus on choosing methods that are best suited to particular research questions, with emphasis on the "why," "what," and "how" of machine learning and Bayesian estimation techniques and a brief overview of available software.

Sherri Rose, PhD
Associate Professor, Harvard Medical School

Laura Hatfield, PhD
Associate Professor, Harvard Medical School

Slides from Drs. Rose and Hatfield's presentation. [PDF]

Latent Variable Modeling and Measurement
October 27-28, 2016

Karen Bandeen-Roche, PhD, and Qian-Li Xue, PhD, Johns Hopkins University
October 27, 2016, 1:30pm-5:30pm, Room 1620, Dana Building
October 28, 2016, 8:30am-12:00pm, 506 Minot Room, Countway Library

Please join Karen Bandeen-Roche, Hurley-Dorrier Professor and Chair, Department of Biostatistics at Johns Hopkins Bloomberg School of Public Health, and Qian-Li Xue, Associate Professor of Medicine and Biostatistics at Johns Hopkins Medical Institutions. This short course will discuss latent variable modeling and applications to construct development, measurement, and validation.


Course Materials

Day 1

Day 2


This short course introduces latent variable modeling with a primary focus on quantitative approaches to complex measurement and a secondary focus on regression ("structural" modeling). Following an overarching introduction to latent variable modeling, topics include the principles of psychometrics, including reliability and validity; the statistical basis for latent variable analysis, including exploratory and confirmatory factor analysis and latent class analysis; and structural modeling, illustrated via latent class regression, latent growth curve, and mixtures-of-growth-curves analysis. Model specification, fitting, identification, diagnosis, and pitfalls are discussed. Substantial time will be devoted to software introduction (SAS, Mplus) and application to examples in the clinical sciences (e.g. physical and cognitive impairment and disability). Upon successfully completing this course, students will be able to read and evaluate scientific articles with regard to measurement in health, and to interpret and begin fitting latent variable models, including factor analyses, latent class analyses, and latent class regression analyses.

A Comprehensive Tour of Modern Clinical Trials with Software
April 4, 11, 25 & May 2, 2016

Cyrus Mehta, PhD; Charles Liu, PhD; Yannis Jemiai, PhD; Scott Evans, PhD
April 4, 11, 25, & May 2, 2016
April 4: 8:00am-12:00pm, DFCI (450 Brookline Ave.), SM 308/309
April 11: 8:00am-12:00pm, Kresge G1/Snyder Auditorium (677 Huntington Avenue)
April 25: 8:00am-12:00pm, Kresge G1/Snyder Auditorium (677 Huntington Avenue)
May 2: 8:00am-12:00pm, Kresge G1/Snyder Auditorium (677 Huntington Avenue)

This four-part workshop will introduce participants to modern methods for designing Phase I, II, and III clinical trials. The lecture material will be followed by hands-on exercises with the East® software to consolidate understanding of the material. The focus is on understanding the concepts rather than the technical and mathematical details. Participants are required to bring laptops to Sessions 2-4 and will receive an East® license through 2016.

April 4, Session 1 (8:30am-11:00am): Clinical Trials: Today and Tomorrow
Lecture: We will discuss a few challenging issues in clinical trials today and a few issues that will be of increased importance in the future. Topics include noninferiority, pragmatism, and benefit:risk evaluation using the desirability of outcome ranking (DOOR).

Slides from Dr. Scott Evans's presentation [PDF] and [PDF]

April 11, Session 2 (8:00am-noon): Dose Escalation for Phase I Oncology Trials
Lecture: Limitations of the 3+3 method; introduction to Bayesian model-based methods such as the Continual Reassessment Method (CRM), the modified Toxicity Probability Interval (mTPI) method, the Bayesian Logistic Regression Model (BLRM), and the Product of Independent beta Probabilities Escalation (PIPE) method.
Software Exercises: Evaluate operating characteristics of various methods under different dose-toxicity assumptions to inform the best choice of design parameters.

Slides from Dr. Charles Liu's presentation [PDF]

April 25, Session 3 (8:00am-noon): Group Sequential Designs for Phase III Trials
Lecture: Early efficacy stopping; alpha spending; futility boundaries; p-values and confidence intervals adjusted for early stopping; extensions to multiple treatment arms.
Software Exercises: Design and monitoring of the CAPTURE trial for percutaneous coronary intervention

Slides from Dr. Cyrus Mehta's presentation [PDF]

May 2, Session 4 (8:00am-noon): Adaptive Designs for Phase II and Phase III Trials
Lecture: Sample size re-estimation in group sequential designs; integrated Phase II/III designs with dose selection and sample size re-estimation at the end of Phase I.
Software Exercises: Design of the Valor trial for acute myeloid leukemia; design of the Advent trial for HIV-induced diarrhea.

Slides from Dr. Cyrus Mehta's presentation [PDF]

Methods of Analysis of Genetic Studies
December 10-11, 2015

Liming Liang, PhD
December 10-11, 2015
8:00am-noon, Kresge G-3 and G-1

This course will introduce the concepts, theory, methods, and software tools needed to critically evaluate and conduct genetic association studies in unrelated individuals and family samples. Topics include multiple comparisons issues, population stratification, genome-wide association studies, genotype imputation, gene-gene and gene-environment interaction, and analysis of microarray data (including gene expression, methylation data analysis, and eQTL mapping). Useful software tools will be introduced during the lectures.

Introduction to Meta-Analysis and Systematic Review Methods
October 19, 2015

Michael Stoto, PhD
October 19, 2015
8:30am-4:30pm, HMS Countway Library, Minot Room

The goal of this one-day short course is to provide an introduction to the methods for systematic review of the literature and meta-analysis of the results that form the basis for evidence-based medicine and practice. We will cover the four basic steps in conducting a systematic review: (1) clearly specifying the clinical or policy question(s) to be addressed; (2) systematically searching the literature to identify relevant studies to include in the review, extracting the results, and evaluating the quality of the available studies; (3) applying formal meta-analysis methods to statistically summarize and synthesize the results, including assessing and studying heterogeneity and meta-regression; and (4) interpreting and translating the results into evidence-based policy and practice recommendations. The approach and statistical methods will be illustrated through clinical and public health examples, including in-depth analyses of drug safety issues and comparative effectiveness research.
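
Step (3) can be illustrated with a minimal sketch of fixed-effect inverse-variance pooling, together with Cochran's Q and the I-squared heterogeneity statistic. The study estimates below are invented log odds ratios and the function name is hypothetical; the course itself may have used different methods or software.

```python
import math


def fixed_effect_summary(estimates, std_errors):
    """Inverse-variance pooled estimate with simple heterogeneity statistics."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Invented log odds ratios and standard errors from three studies.
log_odds_ratios = [0.10, 0.35, 0.25]
standard_errors = [0.12, 0.20, 0.15]

pooled, pooled_se, q, i_squared = fixed_effect_summary(log_odds_ratios, standard_errors)
lower, upper = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled log OR: {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"Q = {q:.3f}, I^2 = {100 * i_squared:.1f}%")
```

A random-effects analysis or a meta-regression would build on the same weights but add a between-study variance component or study-level covariates.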

Statistical Analysis of Missing Data in Observational Studies: Methods and Applications
April 23, 2015

Nicholas Horton, ScD
April 23, 2015
2:30pm-5:30pm, Harvard T.H. Chan School of Public Health, Kresge G3

Missing data arise in most real-world situations, and can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years, and sophisticated software to appropriately account for it is available within general purpose statistics packages. This session will emphasize practical skills. It will discuss the nomenclature for missing data methods, appropriate ways to describe patterns of missing data as well as how to account for incomplete observations using multiple imputation. The methods will be illustrated using biomedical examples.

Applied Longitudinal Analysis
March 20, 2015

Garrett Fitzmaurice, ScD
March 20, 2015
8:30am-4:30pm, Harvard T.H. Chan School of Public Health, FXB G12

The goal of this one-day short course is to provide a broad introduction to statistical methods for analyzing longitudinal data. The main emphasis is on the practical rather than the theoretical aspects of longitudinal analysis. The course begins with a review of established methods for longitudinal data analysis when the response of interest is continuous. A general introduction to linear mixed effects models for continuous responses is presented. Next, we discuss how smoothing and semiparametric regression allow greater flexibility for the form of the relationship between the mean response and covariates. We demonstrate how the mixed model representation of penalized splines makes this extension straightforward. When the response of interest is categorical (e.g., binary or count data), two main extensions of generalized linear models to longitudinal data have been proposed: "marginal models" and "generalized linear mixed models." While both classes of models account for the within-subject correlation among the repeated measures, they differ in approach. In this course we highlight the main distinctions between these two types of models and discuss the types of scientific questions addressed by each.
Prerequisite Knowledge: Attendees should have a strong background in linear regression and some minimal exposure to generalized linear models (e.g., logistic regression).

Causal Mediation Analysis
March 3, 2015

Tyler VanderWeele, PhD
March 3, 2015
8:30am-4:30pm, Countway Library, Minot Room

The workshop will cover some of the recent developments in causal mediation analysis and provide practical tools to implement these techniques. Mediation analysis concerns assessing the mechanisms and pathways by which causal effects operate. The course will cover the relationship between traditional methods for mediation in epidemiology and the social sciences and those that have been developing within the causal inference literature. For dichotomous, continuous, and time-to-event outcomes, we will discuss when the standard approaches to mediation analysis are valid. Using ideas from causal inference and natural direct and indirect effects, alternative mediation analysis techniques will be described for settings in which the standard approaches do not work. The no-confounding assumptions needed for these techniques will be described. SAS, SPSS, Stata and R macros to implement these techniques will be covered and distributed to course participants. The use and implementation of sensitivity analysis techniques to assess how sensitive conclusions are to violations of assumptions will be covered. We will also discuss how such mediation analysis approaches can be extended to settings in which data come from a case-control study design. The methods will be illustrated by various applications to perinatal, genetic, and social epidemiology. Familiarity with linear and logistic regression will be assumed; some knowledge of counterfactual notation would be helpful but is not necessary.

Modeling Ordinal Categorical Data
December 3-4, 2014

Alan Agresti, PhD
December 3 & 4, 2014
HMS, Countway Library, Minot and Ballard Rooms

This short course surveys methods for modeling categorical response variables that have a natural ordering of the categories. Such data often occur in the social sciences (e.g., for measuring attitudes and opinions) and in medical and public health disciplines (e.g., pain, quality of life, severity of a condition). Topics to be covered include logistic regression models using cumulative logits with proportional odds structure, non-proportional odds models, other ordinal logistic regression models such as using adjacent-categories logits, other multinomial response models such as the cumulative probit, and marginal models and random effects models for clustered, correlated ordinal responses (e.g., repeated measurement data). Examples presented include social survey data and randomized clinical trials. Software focus is on R, but SAS output is also provided for many examples. The course will be a concise summary of parts of the book, "Analysis of Ordinal Categorical Data" by Alan Agresti (2nd ed., Wiley, 2010).

Tutorial: Introduction to the Design of Cluster Randomization Trials
March 6, 2013

(Co-sponsored with the Dept. of Global Health and Population)

Instructor: Allan Donner, Ph.D., FRSC
Professor, Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, The University of Western Ontario; Director, Biometrics, Robarts Research Institute
Wednesday, March 6, 2013
10:30am - 12:30pm
Harvard T.H. Chan School of Public Health, Kresge Room 212

An Introduction to Interaction Analysis
December 12, 2012

Tyler VanderWeele, PhD, Associate Professor of Epidemiology
Department of Epidemiology, Department of Biostatistics, Harvard T.H. Chan School of Public Health

Wednesday, December 12, 2012, 3:30-5:00pm
Reception 5:00-5:30pm
Harvard T.H. Chan School of Public Health, Kresge G1

This tutorial will provide a relatively broad introduction to the topic of interaction between exposures. We discuss interaction on both additive and multiplicative scales using risks, and we discuss their relation to statistical models (e.g. linear, log-linear, and logistic models). We discuss and evaluate arguments that have been made for using additive or multiplicative scales to assess interaction. We describe inferential procedures for interaction when logistic models are fit to data but additive, and not just multiplicative, measures of interaction are desired. We discuss issues of confounding in interaction analyses, and how controlling for confounding of only one or of both exposures affects whether interaction estimates can be interpreted as causal interaction between the two exposures or only as effect heterogeneity. We further discuss conditions under which interaction gives evidence of synergism within the sufficient cause framework and the relevance of this in assessing gene-gene and gene-environment interactions.
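
To make the two scales concrete, the following sketch computes standard additive and multiplicative interaction contrasts from four risks, one for each combination of two binary exposures. The function name and the risk values are hypothetical and serve only as an illustration; they do not come from the tutorial.

```python
def interaction_measures(p00, p10, p01, p11):
    """Interaction contrasts from four risks.

    p_ab is the risk of the outcome when the first exposure is a
    and the second exposure is b (0 = absent, 1 = present).
    """
    # Additive scale: joint effect minus the sum of the separate effects.
    additive = p11 - p10 - p01 + p00
    # Risk ratios relative to the doubly unexposed group.
    rr10, rr01, rr11 = p10 / p00, p01 / p00, p11 / p00
    # Multiplicative scale: joint ratio over the product of separate ratios.
    multiplicative = rr11 / (rr10 * rr01)
    # Relative excess risk due to interaction (additive scale, in ratio units).
    reri = rr11 - rr10 - rr01 + 1.0
    return {"additive": additive, "multiplicative": multiplicative, "reri": reri}


# Invented risks, chosen so that the two scales disagree.
measures = interaction_measures(p00=0.02, p10=0.05, p01=0.04, p11=0.10)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these invented risks the multiplicative measure is approximately 1 (no multiplicative interaction) while the additive contrast is approximately 0.03 (positive additive interaction), which shows why the choice of scale matters.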

Slides from Dr. VanderWeele's presentation.

Propensity and Stratification Scores: Background and Application
November 7, 2012

Jacqueline R. Starr, PhD, The Forsyth Institute

November 7, 2012, 3:30-5:00pm
Reception 5:00-5:30pm
Harvard T.H. Chan School of Public Health, FXB G12

In prospective epidemiologic studies one can alleviate potential confounding by adjusting for exposure (or treatment) probabilities through the propensity score, a confounder summary score. The propensity score represents exposure probability conditional upon covariate values. It is a balancing score and ensures that within propensity score strata, subjects in the two exposure groups will be, on average, similar in their covariate values. Propensity scores are applied to case-control studies but do not necessarily share the same properties as in cohort studies. Stratification scores are similar to propensity scores and can be applied when the number of case and control subjects is fixed. The stratification score is a retrospective balancing score estimated as the probability of the outcome conditional upon covariate values, and the application is similar to that of propensity scores. Very little is known about the properties or performance of stratification scores. This presentation will cover the background and application of the propensity score approach, including principles for constructing and applying propensity scores. My interest in confounder summary scores arose from analyzing relatively small datasets regarding rare congenital anomalies. I will describe some examples and the motivation for applying stratification scores. I will also discuss some of our proposed research directions regarding stratification scores.

More flexible linear mixed effects models for longitudinal data analysis
January 19, 2011

Garrett Fitzmaurice, ScD
Professor in the Department of Biostatistics
Harvard T.H. Chan School of Public Health

Wednesday, January 19th, 2011, 3:30-4:45 PM
Massachusetts General Hospital
Yawkey 7-980

Linear mixed effects models have become established and enduring methods for longitudinal analyses. However, linear mixed effects models have an important potential limitation: they assume that the shape of the functional relationship between the mean of the longitudinal response and the covariates is known. In this talk we briefly review linear mixed effects models and then discuss a simple extension that allows greater flexibility for the form of the relationship. Specifically, we review the connection between penalized splines and linear mixed effects models and show how a mixed effects model representation of penalized splines makes their extension to the longitudinal setting relatively straightforward.

The main ideas are illustrated using longitudinal data on progesterone metabolite concentration from a study of early pregnancy loss.

How many participants? How many measurements? The design of longitudinal studies
February 23, 2010

Donna Spiegelman, ScD
Professor of Epidemiologic Methods
Harvard T.H. Chan School of Public Health

Tuesday, February 23, 2010, 3:00-4:30pm
Yawkey Room 10-660
Massachusetts General Hospital

Longitudinal studies follow N participants, and data on variables of interest are collected r more times after baseline for each participant. In some studies, the number of participants is fixed and the investigator needs to determine the minimum number of additional measurements subject to a pre-specified power constraint. In other studies, the number of times measurements are taken is fixed and the investigator needs to determine how many participants are needed to attain a fixed power. And in some studies, both N and r are free, and the investigator may choose the combination that minimizes study cost for a fixed power, or that maximizes power for a fixed cost. In a longitudinal study, the investigator must specify features of the correlation matrix that describe the relationship between repeated measures from the same person in addition to the usual design inputs. Methods previously developed in the context of clinical trials are extended to allow for varying exposure prevalence, for primary time metrics other than duration of follow-up, for time-varying exposures, and for correlation between the primary time metric and exposure. Software is available to implement these methods.

Slides from Dr. Spiegelman's presentation.

Musings about missing data: Challenges for the analysis of observational and randomized studies
December 15, 2009

Nicholas Horton, ScD
Associate Professor, Department of Mathematics & Statistics
Smith College

Tuesday, December 15, 2009, 3:00–4:30pm
Ledge Room 4-002B, One Brigham Circle
Brigham and Women's Hospital

Missing data arise in almost all real-world situations and can cause bias or lead to inefficient analyses. The development of statistical methods to address missingness has been actively pursued in recent years. This talk will (1) address complications in observational studies when there are many patterns of missing values for categorical and continuous predictors, (2) discuss issues in implementing analyses that are consistent with the intention-to-treat principle in randomized trials, and (3) demonstrate how these methods can be implemented through detailed discussion of examples.

Slides from Dr. Horton's presentation.

Translating research to practice: An introduction to causal inference, with extensions to longitudinal data
November 18, 2009

Tyler VanderWeele, PhD
Associate Professor of Epidemiology
Harvard T.H. Chan School of Public Health

Wednesday, November 18, 2009, 3:00–4:30pm
Trustman Boardroom
East Campus, Feldberg / Reisman Complex - 2nd Floor
Beth Israel Deaconess Medical Center

The first talk in the series, to be presented by Tyler VanderWeele, PhD, will discuss causal inference in the context of longitudinal data. The lecture will give a brief overview of how the "counterfactual" or "potential outcomes" framework can be useful in distinguishing association from causation. Issues concerning time-dependent confounding that can arise in longitudinal data will be discussed, and an introduction to causal methods to handle time-dependent confounding will be given. The ideas will be illustrated by a detailed discussion of an example using longitudinal data to distinguish the relative persistence of the effect of loneliness on depression versus on subjective well-being.

Slides from Dr. VanderWeele's presentation.
