

Biostatistics short course: Data Science for Statisticians: Visualization, Data Wrangling, and an Introduction to Machine Learning – March 19

Tuesday, March 19, 2019
1:00 pm – 5:00 pm
Harvard T. H. Chan School of Public Health
Kresge G2.


Led by Rafael Irizarry, PhD, professor of biostatistics at Harvard T. H. Chan School of Public Health and chair of the Department of Biostatistics and Computational Biology at Dana-Farber Cancer Institute, this course uses case studies drawn from world health and economics data and from demographic registry data to demonstrate how to visualize and wrangle data with modern statistical packages such as ggplot2 and dplyr. The course will also cover the basics of machine learning and how to use the caret package to make predictions.

Participants will need to have basic knowledge of R and a laptop with R and RStudio installed.

See further prerequisites and register here.

Abstract
Using case studies from world health and economics, demographic registry data from Puerto Rico, and handwritten digits, we will demonstrate how to use modern statistical packages such as ggplot2 and dplyr to visualize and wrangle data. The data visualization part will include a session on visualization principles. The data wrangling part will be particularly useful to statisticians who want to end their (expensive) dependence on SAS. We will then introduce the basics of machine learning and how to use the caret package to make predictions.

Prerequisites
Basic knowledge of R. For example, you should know how to define a numeric vector, how to access the elements of a data frame, and how to write a function.

A Wi-Fi-enabled laptop with R and RStudio installed.

The tidyverse, dslabs, and caret packages installed. Each of the four expressions below should return TRUE when you run it in R.

getRversion() >= "3.5.0"
packageVersion("tidyverse") >= "1.2.1"
packageVersion("dslabs") >= "0.5.1"
packageVersion("caret") >= "6.0.80"

Rafael A. Irizarry, PhD
Rafael Irizarry is a professor of applied statistics at Harvard T.H. Chan School of Public Health and the Dana-Farber Cancer Institute, where he was recently named chair of the Department of Biostatistics and Computational Biology. He is also a professor of biostatistics at Harvard T.H. Chan School of Public Health.
