Blinded by the cite

The ideas that shape fields the most aren’t always the most widely cited.

A new model reveals forgotten influencers and “sleeping beauties” of science

For centuries, scientists and scholars have measured the influence of individuals and discoveries through citations, a crude statistic subject to biases, politics, and other distortions. A new paper led by the University of Chicago’s Knowledge Lab describes a different way to keep score in science—a more direct measure of how ideas ripple out across scholarship and culture.

The authors’ new computational model throws the spotlight onto work that changed the path of science but has remained underappreciated. The approach can be adapted to trace influence in other areas, such as literature or music, the authors say in the paper, published March 12 in Proceedings of the National Academy of Sciences.

“We’re measuring how much scientists’ and scholars’ writings influence discussion of ideas in the future,” says James Evans, director of the Knowledge Lab and professor of sociology at UChicago. “Influence is a politicized process; those who get the influence get the credit, and those who get the credit get the capital to do the next big thing. This is the first time we have a tightened ability to identify influence, and also to diagnose social and strategic influences on citing behavior.”

In theory, references in an academic paper enable authors to credit their predecessors, the researchers and work upon which they built their new discovery. But in practice, citations are chosen for many reasons—authors are more likely to cite themselves, powerful colleagues in their field, and researchers at prestigious institutions, and are often biased toward citing more recent or already highly cited articles.

Despite these imperfections, many computational studies of scientific influence have relied on the citation record as a useful proxy. The new study, led by former Knowledge Lab postdoctoral researcher Aaron Gerow, takes a novel, deeper approach, using both the full text of articles and external information such as author identity, affiliation, and journal reputation.

Employing a computational method known as topic modeling (invented by coauthor David Blei of Columbia University), the model tracks “discursive influence”: recurring words and phrases in historical texts that show how scholars actually talk about a field, rather than just whom they credit. To determine a given paper’s influence, the method lets researchers simulate how science would have proceeded without it.

“We can not only find out how topics changed over time but can actually simulate the future without a given document from the past and look at how discourse moving forward was different with and without a given document,” says Gerow, now an assistant professor at Goldsmiths, University of London. “Citations are one kind of impact, and discursive influence is a different kind. Neither one is the complete story, but they work together to give a better picture of what’s influencing science.”
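The “with and without a given document” comparison Gerow describes can be illustrated with a toy sketch. This is not the authors’ model, which relies on topic modeling and on metadata such as author and journal. It swaps in a much simpler stand-in: a smoothed word-frequency distribution built from earlier documents, scored by how well it predicts later ones. The function names, the smoothing value, and the three-document corpus are all hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def word_distribution(docs, vocab, alpha=1.0):
    """Smoothed word-frequency distribution over vocab (alpha is an assumed smoothing value)."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def cross_entropy(future_docs, dist):
    """Average negative log-probability of the later texts' words under dist."""
    tokens = [w for doc in future_docs for w in doc]
    return -sum(math.log(dist[w]) for w in tokens) / len(tokens)


def leave_one_out_influence(past_docs, future_docs):
    """Score each past document by how much harder the later texts are to
    predict once that document is removed. Higher means more influence."""
    vocab = sorted({w for doc in past_docs + future_docs for w in doc})
    baseline = cross_entropy(future_docs, word_distribution(past_docs, vocab))
    scores = []
    for i in range(len(past_docs)):
        without = past_docs[:i] + past_docs[i + 1:]
        loss = cross_entropy(future_docs, word_distribution(without, vocab))
        scores.append(loss - baseline)
    return scores


# Made-up corpus: one early paper whose vocabulary recurs later, two that do not.
past = [
    "carbon sheet band structure".split(),
    "protein folding energy".split(),
    "protein folding rate".split(),
]
future = [
    "carbon sheet electron".split(),
    "carbon band structure".split(),
]

scores = leave_one_out_influence(past, future)
for doc, score in zip(past, scores):
    print(" ".join(doc), round(score, 3))
```

In this toy corpus only the first document gets a positive score, because the later texts reuse its vocabulary. Nothing here involves citations, which is the point of the contrast the researchers draw: a paper could score high on a measure like this while being cited rarely, or the reverse.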

The authors trained the model on JSTOR, a massive database of academic publications, which allowed them to quantify various biases and discern distinct patterns of influence. Scientists who persistently published in a single field were more likely to be “canonized” in a way that compelled others to cite them out of proportion to their papers’ discursive contributions. On the other hand, discoveries that crossed disciplinary boundaries tended to have outsized discursive impact but fewer citations, likely because the “owner” of the idea and her allies are socially and institutionally distant from the citing author.

One interesting subcategory of paper the model detected is known as “sleeping beauties,” or papers that went relatively unacknowledged for years or even decades before experiencing a late burst of citations. For example, a 1947 paper on graphene remained obscure and forgotten until there was a resurgence of research interest in the ultrathin carbon material in the 1990s and a Nobel Prize for two University of Manchester researchers in 2010.

“Papers have a news cycle, when lots of people chat about them and cite them, and then they’re no longer new news,” Evans says. “Our model shows that some papers have much more influence than citations will typically demonstrate, such as these ‘sleeping beauties,’ which didn’t have much influence early but come to be appreciated and important later.”

The same model can also be used to measure influence in other areas, the authors say. Text from poems or song lyrics, and even extratextual characteristics such as stanza structure or chord progressions, could feed into the model to find underrecognized influencers and map the spread of new concepts and innovations.

“Though we developed and validated this model on scientific text, now we can use it for anything and everything, especially cases where there are no traces of influence but patterns in the content itself,” Evans says. “It’s like trending on Twitter, but where everything is Twitter. That is what’s most exciting to me.”