Biostatistics journal club: Predictability of Life Outcomes and Machine Learning Prediction in Social Sciences
Wednesday, April 28, 2021, 1:00 – 2:00pm
Presenter: Boyu Ren, PhD, instructor in psychiatry (biostatistics), Harvard Medical School; assistant biostatistician, McLean Hospital
Predictive algorithms developed using modern statistical and machine learning techniques have proved highly effective in many tasks, from recognizing spoken language to detecting credit card fraud to driving a car automatically, and are believed to be the key to accurate real-time decision-making in many aspects of people’s lives. Most existing research on predictive models focuses on the algorithms, that is, on how best to approximate the underlying relationship between an outcome of interest and a large collection of features, while little effort has been directed to the practicality of the prediction task itself. In this article, the authors aim to address this question in the context of sociological research through the common task method. By evaluating a collection of predictive models generated by large, diverse groups of researchers using exactly the same data, the authors identify limits to the predictability of the life trajectories of children from fragile families, as suggested by the universally high prediction errors across models and the minimal difference in prediction accuracy between highly flexible machine learning algorithms and simple benchmark models. This issue of predictability is also relevant for biomedical research, where the evolution of biological and medical outcomes is governed by complex underlying processes. We will discuss whether the framework proposed here can serve as a generalizable solution to predictability estimation and, from a statistical point of view, the potential factors that might contribute to low estimated predictability.
Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration. Matthew J. Salganik, Ian Lundberg, Alexander T. Kindel, et al. Proceedings of the National Academy of Sciences, Apr 2020, 117 (15) 8398-8403; DOI: 10.1073/pnas.1915006117