Growing up in the Bible Belt of Oklahoma, Melissa Dell, PhD, learned early on that “this is the way the world is and you don’t question that.” When her hunger for more led her to apply to Harvard, her father balked, telling her it “wasn’t for people like us.” She applied anyway, and got in. Generous financial aid allowed her to accept. Today, the Andrew E. Furer Professor of Economics at Harvard University and mother of three children under 4 is consistently ranked as one of the world’s most influential young economists. In 2020, for example, she received the prestigious John Bates Clark Medal, awarded each year to an American economist under the age of 40 who is judged to have made the most significant contribution to economic thought and knowledge.
That same year, Dell received pilot funding from our Sight & Science: Vision 2020 research program for an out-of-the-box initiative that uses deep learning to digitize historical documents. It culminated in the release last year of Layout Parser, an open-source document-analysis package. The work supports Dell’s own research into how political policies and conflicts affect a society’s economic conditions over the long run, and it has many other applications across disciplines.
You are a world-renowned economist who has used Sight & Science pilot funding to digitize historical documents. What led you to this?
Our driving motivation is to unlock information for learning and progress. So much information that is important for academic research, and more broadly for understanding society and historical events, is trapped in hard copy. Existing technology is not able to unlock it, particularly if it’s more than 20 years old or originates outside the U.S. The vision pilot grant enabled us to begin to change that.
Our first application was historical newspapers, but there are hundreds of thousands of others. The problem extends far beyond newspapers to many different contexts. We want to make the tools accessible to everyone, including academic researchers, companies developing products for the visually impaired, and people with limited computer programming skills.
More recently, we’ve been working on a way to achieve even better accuracy with greater efficiency. This is part of a broader agenda to make information trapped in hard copy more accessible.
You developed an open-source program, Layout Parser, so anybody can download it and apply it to their own project. How does this fit into your own work?
Much of my work focuses on understanding how historical events and policies affect regional economies. Having access to information is central. We have a database of tens of millions of page scans of off-copyright historical newspapers that current optical character recognition (OCR) systems can’t read. It’s much too costly to digitize them by hand. There are so many questions that people can’t answer because they can’t access the documents.
When we released the Layout Parser library that we developed with the pilot grant, it reached 1.1 million people on Twitter almost immediately. About 55,000 people clicked through to the website. I’m sure people find it useful for all kinds of things that I wouldn’t have imagined. We try to make it super straightforward to do that, even if you don’t have a lot of computer-programming knowledge.
As an economist, you have not specialized in deep learning or document analysis. How did pilot funding help fill that gap?
The resources from Harvard Catalyst were enormously useful for developing this from a very early stage. It was different from what anybody had done. When we tried to apply more broadly for funding, we heard: “You’re not a machine-learning researcher; why are you doing this?” The pilot funding gave us a chance to show that we can develop these kinds of methods and that there’s demand for them.
We very much hope this will lead to other grants, and we’re working on that. The goal is to fund a broader agenda to efficiently extract information with deep-learning methods, so you don’t need giant supercomputers to do your own study. That is how to democratize the use of these tools.
I’m not Google. I’m an academic researcher with limited resources at my disposal. That’s true of academic research in general, and of the kinds of startup companies that are developing accessibility tools for the visually impaired.
You have been visually impaired all your life. How has that influenced this work?
I think it made me particularly aware of how poor a lot of the products are. From a young age, it made me attuned to how much information is out there that is not accessible.
In recent years, far more information has become natively available in digital format, but there’s still a huge amount trapped in hard copy. That can be challenging for students with visual impairments who can’t access it. In fact, it can be challenging for anyone who doesn’t have the resources to unlock that information.
Growing up in Oklahoma, you read about Harvard at a public library and decided to apply despite your father’s warning that it wasn’t “for people like us.” How did going to Harvard College change your trajectory?
I just loved it from the second I arrived. I mean, you have this course catalog, and you can take courses about pretty much any topic you could imagine. It’s just mind-blowing. It felt like a community where people had different perspectives on a lot of things, and that was okay. I was like: “Wow, you see things in a totally different way than I was seeing them. That’s super interesting. Let’s talk about this.” Not everybody’s like that, but that was very much the ethos, which was not really how things were in Oklahoma. That kind of openness to engage in debate and think critically about things was really amazing.