Computational biologist John Novembre uses statistics to understand human genetic history.
The Kennewick Man, a 9,000-year-old Paleo‑American skeleton discovered on a bank of the Columbia River in 1996, is one of the earliest and most complete sets of human remains ever found in North America. It has also been the subject of considerable controversy—several Native American tribes have claimed the Kennewick Man as one of their own and have fought to repatriate the skeleton for reburial.
This year a UChicago team led by geneticists John Novembre and Anna Di Rienzo used four distinct genetic analyses to verify an independent study that had found significant similarities between DNA in the skeleton and DNA from local tribe members. The Kennewick Man is “genetically closer to modern Native Americans than to any other population worldwide,” Novembre and the team concluded. The study was used as evidence in a government decision to designate the Kennewick remains as Native American, and the remains are expected to be returned to the tribes in the coming months.
Vast improvements over the past few decades in technology for collecting and analyzing DNA made the Kennewick identification possible. Capitalizing on those as well as leaps in computing power, Novembre, associate professor of genetics, is developing novel statistical tools to discover not just the genetic origins of an individual but the histories of species. His work, recognized with a MacArthur Fellowship in 2015, is sharpening, and in some cases revising, our understanding of evolutionary history, human populations and migration, and heritable diseases.
The first research lab where Novembre worked as a Colorado College biochemistry major studied protein folding, a key step in the process by which amino acids translated from messenger RNA become functional proteins. From that close-up biophysical examination of how amino acid mutations change the structure of the resulting proteins, “I naturally wanted to zoom out and look at bigger time scales,” he says, and at how such mutations affect populations over long time frames. He now seeks evidence of whole species’ stories, stretching over centuries and millennia, especially that of Homo sapiens.
One strand of Novembre’s research develops ways to visualize genetic data and thereby discover the structures underlying human populations, along with clues about their growth and movement around the earth. “While humans are all very genetically similar,” he says, “there’s always been some structure to the mating patterns, and using genetics we can gain insight into those patterns.”
For example, in a 2008 study of the DNA sequences, or genotypes, of 1,387 Europeans, Novembre’s team took about half a million common gene variants from each subject’s genome and applied principal component analysis to them.
The statistical technique teases out patterns in “high-dimensional” data sets like this, which contain mind-boggling numbers of variables, by lumping together information that is highly correlated and singling out the combinations of variables that are meaningful or important.
When Novembre and his collaborators performed this analysis on their European genetic data, a plot that emerged using the principal components looked astonishingly familiar: the data traced out a rough but recognizable map of the continent. In one corner, for instance, individuals from Portugal and Spain clustered together, neighboring those from France. Throughout the map, expectations from basic geography hold true.
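To make the idea concrete, here is a minimal sketch of principal component analysis on simulated data. It is not the team's pipeline or data: the sample sizes, the `strength` parameter, the function names, and the assumption that each variant's frequency drifts gently with a person's east-west and north-south position are all invented for illustration. No coordinates are given to the analysis itself; they are used only afterward, to check how well the first two components line up with them.

```python
import numpy as np


def simulate_genotypes(n_people=300, n_snps=2000, strength=0.1, seed=0):
    """Simulate people on a square 'continent' with weak genetic gradients.

    Hypothetical model: each variant's frequency changes linearly, and only
    slightly, with a person's (x, y) location.
    """
    rng = np.random.default_rng(seed)
    coords = rng.uniform(-1.0, 1.0, size=(n_people, 2))   # true locations
    slopes = rng.normal(size=(2, n_snps))                 # per-variant gradient
    freqs = np.clip(0.5 + strength * coords @ slopes, 0.02, 0.98)
    genotypes = rng.binomial(2, freqs)                    # 0, 1 or 2 copies
    return coords, genotypes


def top_two_pcs(genotypes):
    """Project each person onto the first two principal components."""
    centered = genotypes - genotypes.mean(axis=0)         # center each variant
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


def geography_r2(pcs, coords):
    """R^2 of a linear fit from the two components to each true coordinate."""
    design = np.column_stack([pcs, np.ones(len(pcs))])
    coef = np.linalg.lstsq(design, coords, rcond=None)[0]
    resid = ((coords - design @ coef) ** 2).sum(axis=0)
    total = ((coords - coords.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total


if __name__ == "__main__":
    coords, genotypes = simulate_genotypes()
    pcs = top_two_pcs(genotypes)
    print("R^2 for (x, y):", geography_r2(pcs, coords))
```

The components come out rotated or mirrored relative to the true axes, which is why the check fits a linear map rather than comparing the first component to x directly. A plot of the two columns of `pcs` would show the simulated "map."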
The striking correlation between genetics and geography “was completely surprising when it first came out,” Novembre says, since “this analysis had no geography fed into it.” Besides, he stresses, genetically speaking, all humans are very closely related to each other compared to other species.
On average, two Europeans’ DNA sequences differ at about one in every 1,000 base pairs, and the base pair that varies does so by only a few percentage points. So “we’re squeezing extremely weak signals out of the data.” (Some of our most visible traits, such as eye color and skin and hair pigmentation, are “the outliers where natural selection has sped up the process of differentiation,” he says.)
Despite the relative homogeneity of human genomes, the researchers were able to determine 90 percent of the subjects’ birthplaces to within 450 miles just from their genetic data, showing that geography plays a key role in the structure of the European population. The study’s results carry implications for other branches of population genetics too. In efforts to identify genes that contribute to inherited diseases, for instance, they underline the need to take into account a sample’s geographic distribution, so that one DNA pattern is not mistaken for another.
Novembre’s lab is also helping to answer fundamental questions about recombination, the process by which genes from two parents blend together into their offspring’s chromosomes. That blending is uneven, and the logic governing it has been poorly understood. In a 2011 paper, Novembre and several collaborators looked at African American genotype data to identify and count “recombination events”: the precise points along a chromosome where the genes switch from one parent’s to the other’s and back again. Since many African Americans have both West African and European ancestry, Novembre and his team could use the switch points in ancestry on their chromosomes as a clue to where recombinations had occurred.
Besides contributing to our larger understanding of how recombination works, this research has yielded a detailed genetic map that helps researchers learn the origins of inherited diseases in African Americans and identify the genes that play a role.
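The switch-point idea can be illustrated with a toy sketch. It assumes that each marker along one chromosome copy has already been assigned an ancestry label, which is the hard statistical step and is not shown here. The function name, the labels "AFR" and "EUR," and the positions are hypothetical and do not come from the 2011 paper.

```python
def ancestry_switch_points(positions, ancestry):
    """Return the intervals in which the ancestry label changes.

    positions: marker coordinates along one chromosome copy, in order.
    ancestry:  one ancestry label per marker (assumed already inferred).
    A switch can only be narrowed down to the gap between two neighboring
    markers, so each result is a (left_position, right_position) pair.
    """
    if len(positions) != len(ancestry):
        raise ValueError("positions and ancestry must have the same length")
    switches = []
    for i in range(1, len(ancestry)):
        if ancestry[i] != ancestry[i - 1]:
            switches.append((positions[i - 1], positions[i]))
    return switches


# Hypothetical example: six markers with two ancestry switches.
positions = [100, 250, 400, 900, 1500, 2100]
labels = ["AFR", "AFR", "EUR", "EUR", "EUR", "AFR"]
print(ancestry_switch_points(positions, labels))
# prints [(250, 400), (1500, 2100)]
```

Each interval marks a stretch where a past recombination likely occurred. Tallying such intervals across many individuals is one way to picture how a map of recombination could be built up.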
Novembre believes his MacArthur Fellowship is a testament to the promise of the intersection of statistics, computation, and genetics where his work lies. The fellowship comes with no restrictions, just the foundation’s hope that the $625,000 stipend, disbursed over five years, will be used to further recipients’ creative vision.
Talking to previous fellows about how they used their stipends, Novembre has heard a wide range of advice—including to save it for childcare. “That’s best for your creativity,” some fellows told him. But he plans to fund higher-risk pilot projects that might not attract a conventional grant. For instance, he would like to try to isolate prehistoric human DNA from Neolithic archaeological sites, shedding new light on our deeper genetic past and how we became the humans we are today.
Postscript: John Novembre notes, “Some readers spotted that Slovakia appeared to be out of place on the ‘genetic map’: the large dot representing the median value for Slovakia had a location in the middle of a cluster of Italian individuals. However, this should not be interpreted as reflecting a special genetic relationship of Slovakian and Italian individuals. Unfortunately, we only had one Slovakian sample in the analyzed dataset, and we suspect that it was a mislabeled Italian or some other form of sample mix-up. In subsequent work, we obtained other Slovakian samples and they appeared genetically in their expected position. Occasional outliers such as these are a challenge of large-scale analyses. As another interesting twist, there are five Italian individuals in our original analysis that do not cluster with the remaining ones. We later were able to show these five are likely from the isolated island of Sardinia.”