Data match

Incubated in a Data Science for Social Good Fellowship, the Legislative Influence Detector aims to increase government transparency. 


A Wisconsin bill banning all nonemergency abortions at 20 weeks, signed into law last July, received national media attention. UChicago data scientist Joe Walsh estimates about a thousand articles were written about the bill. Few of them, however, mentioned that the Wisconsin legislation was nearly word-for-word identical to the Texas abortion bill that state senator Wendy Davis famously filibustered in 2013—and to 72 other bills introduced in 41 state legislatures in the past few years.

States pass a lot of bills. According to a Washington Post analysis, in 2014 the average state government passed 462 new laws (by comparison, the 113th Congress passed 296 laws in two years). However, state lawmakers “often don’t have the staff and the expertise and the time to write legislation,” says Walsh, so they introduce bills passed in other states or written by advocacy groups. There are also far fewer reporters covering politics in Springfield, Illinois, or Lansing, Michigan, than there are in Washington, DC, and therefore less journalistic oversight of the hundreds of bills passed in state capitols. “That means there are a lot of groups that are able to exercise disproportionate influence in getting legislation that they like passed,” says Walsh. “We wanted to know if we could use data science to identify those groups.”

This past summer, Walsh, project manager Lauren Haynes, and a team of three data science fellows developed a text analysis tool to help track copied legislation across state capitols and uncover lobbyists’ influence. They were one of 12 teams supported by a 2015 Eric and Wendy Schmidt Data Science for Social Good Summer Fellowship. DSSG pairs teams of paid fellows, usually students or recent graduates with an interest in social issues, with mentors like Walsh and a public or private sector partner to find data-driven solutions for a specified problem.

The fellowship is run by the University’s Center for Data Science and Public Policy, a collaboration between Chicago Harris and the Computation Institute. In 2015 teams developed programs and algorithms to help the City of Cincinnati predict which buildings would fail inspections, to anticipate instances of childhood obesity and cardiac arrest at NorthShore University Health System, and to identify students at Montgomery County Public Schools in Maryland at risk of falling behind.

Walsh and his team worked with the Sunlight Foundation, a nonprofit focused on government transparency that had amassed a trove of more than 500,000 state bills and pitched the project to DSSG. The fellows supplemented the Sunlight Foundation data with 2,400 pieces of model legislation collected from five major lobbying groups, including the conservative American Legislative Exchange Council and the liberal State Innovation Exchange. A lot of the model legislation was publicly available on the groups’ websites, says Walsh—“It’s kind of surprising how much stuff is out there.”

Their text analysis tool, the Legislative Influence Detector (LID), uses a local alignment algorithm, akin to algorithms used to identify like strands of DNA, to find similar pieces of legislation. The algorithm is accurate but slow—running all the bills the fellows had collected through the algorithm would have taken thousands of years, says Walsh. So the team added a preliminary step: LID first uses a “bag of words” tool similar to plagiarism detection software to check for the same words, in any order, and find the most probable 100 matches to a specific bill. Then LID uses the local alignment algorithm to find the legislation with the same words in mostly the same order.
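The two-stage approach described above can be illustrated with a minimal sketch. This is not the team's code: it assumes cosine similarity over word counts for the "bag of words" step and a word-level Smith-Waterman score for the local alignment step, and every function name, parameter, and scoring weight here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (word order ignored)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(query, corpus, k=100):
    """Stage 1 (fast): keep the k documents whose word counts best match the query."""
    query_bag = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_bag, Counter(tokenize(text))), doc_id)
        for doc_id, text in corpus.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Stage 2 (slow): Smith-Waterman local alignment score over two token lists.

    The match/mismatch/gap weights are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def find_matches(query, corpus, k=100):
    """Run the cheap filter first, then align only the surviving candidates."""
    query_tokens = tokenize(query)
    results = [
        (local_alignment_score(query_tokens, tokenize(corpus[doc_id])), doc_id)
        for doc_id in top_candidates(query, corpus, k)
    ]
    return sorted(results, reverse=True)
```

Calling `find_matches(bill_text, corpus)` with `corpus` as a dictionary mapping bill identifiers to bill text returns `(score, identifier)` pairs, best match first. The default `k=100` mirrors the 100 candidates mentioned above. The first stage cannot tell a copied bill from one that merely uses the same vocabulary in a different order; the alignment step is what rewards matching word order.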

Running the Wisconsin abortion bill through LID turned up matches from across the country that were almost identical, differing mainly in small stylistic choices, such as writing out a number instead of using numerals, and the occasional misspelling. (The team then turned to Google and found matching model legislation on the website of Doctors on Fetal Pain, an antiabortion advocacy group.)

Wisconsin Senate Bill 179, signed into law in July 2015, and Louisiana Senate Bill 593 (2012), its top match from the Legislative Influence Detector. (Data Science for Social Good Fellowship)

Walsh is quick to say that copied legislation isn’t inherently suspect—it makes sense to introduce laws that have had positive outcomes in other states, or to standardize legal procedures like child adoption across the country. LID simply flags instances of reused or appropriated legislation to help researchers, journalists, or concerned citizens figure out where state laws are coming from, with the goal of making it more difficult for lobbying groups to introduce legislation unnoticed.

Using state bills from the past five years, the DSSG team has found about 35,000 matches to model legislation from the five lobbying groups studied. Matches for state bills introduced through May are available for download on the DSSG website. Walsh and fellow Matthew Burgess are still working on LID and hope to turn it into a public-facing, real-time resource. Ideally journalists would wake up to a list of all the state bills passed the previous day, along with matching legislation from other statehouses or lobbyists, says Walsh, and be able to run searches themselves. By making it harder for outside influencers to avoid detection, he says, LID can contribute to government transparency.

This year more projects have been extended beyond the summer, keeping fellows on at the Center for Data Science and Public Policy to either finish them or take them through to implementation. For DSSG and center director Rayid Ghani, this is an important development. The projects need to be launched in the real world “to have the impact we want to have,” he says. And it helps achieve his ultimate goal: training a new generation of data scientists to use their skills to tackle social issues.

As the chief scientist on president Barack Obama’s 2012 reelection campaign, Ghani saw the power of coupling big data technology with a dedicated organization. If it’s possible to use big data technology to more precisely direct the efforts of thousands of campaign volunteers, he says, then it should be possible to use the same technology to help nonprofits and government agencies work more efficiently and effectively.

Encouraged by student interest and funded by Google chairman Eric Schmidt, who also worked on Obama’s 2012 campaign, and Schmidt’s wife, Wendy, Ghani launched DSSG in early 2013 and began accepting fellow applications that March. By April 1, more than 600 people had applied for 36 positions. Over the past three years, DSSG has added more fellows and sponsored projects ranging from improving police encounters with civilians to helping the US Environmental Protection Agency select hazardous waste sites for inspection.

“One of the things [fellows] often talk about is, ‘This is what I wanted to do, I just didn’t know what it was and how I could do it,’” says Ghani. Now, DSSG provides “the training and the motivation and the network” so the fellows’ work in social good can continue beyond summer.
