Biostatistics Journal Club: Missing Data Challenges in Electronic Health Records-Based Studies of COVID and Long COVID
Since the beginning of the COVID-19 pandemic, researchers have repeatedly turned to electronic health records (EHR) to rapidly answer complex questions about the short- and long-term consequences of SARS-CoV-2 infection. However, because EHR are not collected for research purposes, observational studies using EHR are subject to various challenges and biases, including bias due to missing data. Standard missing data methods generally fail to address the complex nature of EHR data, particularly the interplay of numerous decisions by patients, physicians, and insurers that collectively determine whether “complete” data are observed. Tanayott Thaweethai, PhD, Massachusetts General Hospital, will discuss statistical methods for handling bias due to missing data in the EHR setting and will conclude with an introduction to a semi-supervised learning technique for handling the “positive unlabeled” problem of phenotyping individuals based on the presence or absence of clinical codes.
Tanayott Thaweethai, PhD, is an instructor in investigation and the associate director of biostatistics research and engagement at Massachusetts General Hospital Biostatistics. He is also an instructor in medicine at Harvard Medical School. He develops methods to improve the handling of missing data in large observational studies that use electronic health records. His research collaboration areas at Mass General include diabetes in pregnancy, clinical effectiveness of type 2 diabetes treatment, and several studies related to COVID-19. He is also the lead biostatistician at the Data Resource Core for RECOVER, an NIH research initiative that seeks to understand post-acute sequelae of SARS-CoV-2 (PASC) and long COVID. He received his PhD in biostatistics from Harvard University.