From investigator to innovator: Pulitzer winner pursues new data journalism tools

Sarah Cohen

The Washington Post's Top Secret America series and The New York Times's, The Guardian's and Der Spiegel's Afghanistan war logs reports were remarkable for a number of reasons. Among them, Pulitzer-winning investigative reporter Sarah Cohen said, is that they're perhaps the first data-driven stories of such magnitude to be produced for a participatory Web audience first, and for a print audience second, instead of the other way around.

But the stories were stitched together in large part with 20th-century technology, said Cohen, a former Post database editor. It took her old employer two years to build the database Top Secret America was based on. "It shouldn't take that long," Cohen said. The Guardian, meanwhile, said it received the 92,000 leaked documents from WikiLeaks in a single Excel file, about 30,000 more than the ubiquitous spreadsheet program can handle. "If you're dealing with 92,000 documents," Cohen said, "the last place you want to be looking at them in is Excel."

It's not that practitioners are stupid or incompetent, Cohen hastened to say; it's that better tools aren't readily accessible. Changing that is Cohen's main charge as a Knight journalism professor at Duke University, a position she's held for a little over a year. Working out of the public policy school's DeWitt Wallace Center for Media and Democracy, Cohen searches for ways technology can make investigative reporting easier and less expensive.

"We know there are going to be fewer boots on the ground, so let's see how we can make those boots more effective," Cohen said in a telephone interview with Journalism Lives last Thursday, the day the first tool born from her work at Duke, open-source desktop program TimeFlow, was released.

TimeFlow, developed by former IBM Many Eyes contributors Fernanda Viégas and Martin Wattenberg, sorts and visualizes temporal data, helping journalists spot trends and relationships and reducing the time they spend rereading notes. Unlike similar tools, Cohen said, TimeFlow is "designed to work the way reporters work, not the way you want it to look in the end." For example, reporters can input ambiguous dates and change them to more specific ones later without confusing the software.
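
For readers curious how software might cope with ambiguous dates, here is a minimal Python sketch of one possible approach. It is not TimeFlow's actual code, and the names FuzzyDate, Event and sort_events are hypothetical, invented for illustration. The idea is to treat a vague date such as "2009" or "March 2009" as a range, sort events by the start of that range, and let the reporter swap in a more precise date later.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FuzzyDate:
    """A date known to year, month, or day precision (hypothetical type)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        # A day only makes sense once the month is known.
        if self.day is not None and self.month is None:
            raise ValueError("a day requires a month")

    def earliest(self) -> date:
        """First calendar day the event could have happened."""
        return date(self.year, self.month or 1, self.day or 1)

    def latest(self) -> date:
        """Last calendar day the event could have happened."""
        month = self.month or 12
        day = self.day or calendar.monthrange(self.year, month)[1]
        return date(self.year, month, day)


@dataclass
class Event:
    label: str
    when: FuzzyDate


def sort_events(events: List[Event]) -> List[Event]:
    """Order events by earliest possible date, then by latest possible date."""
    return sorted(events, key=lambda e: (e.when.earliest(), e.when.latest()))


# Notes entered as the reporter first has them: one date is only a year.
events = [
    Event("Contract signed", FuzzyDate(2009)),
    Event("Audit published", FuzzyDate(2009, 3, 14)),
    Event("Hearing held", FuzzyDate(2009, 3)),
]
before = [e.label for e in sort_events(events)]

# Later the reporter pins down the contract date; nothing else has to change.
events[0].when = FuzzyDate(2009, 6, 2)
after = [e.label for e in sort_events(events)]
```

In this sketch the vague "Contract signed" entry sorts first while only its year is known, then moves to the end of the list once its exact date is filled in. Sorting by the earliest possible date is just one choice; a real tool could order or draw uncertain events differently.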

An alpha version of TimeFlow can be downloaded here.

Cohen, who remains on contract with the Post, said she intends to use TimeFlow and other tools she creates in her own reporting so that she can make the most of developers' time.

"If they don't work, there's no point in getting someone to make them easy," she said.

As she pursues future tools, Cohen is keeping an eye on disciplines facing challenges similar to the news industry's, such as education, government and medicine. The digitization of doctors' handwritten notes and the automatic transcription of government meetings are among the many efforts in other fields with obvious journalistic applications, Cohen said.

"I wanted to see if there were people dealing with the same things that we're dealing with more effectively and more efficiently," she said.

This is something we're interested in, too. While news organizations' reporters talk with experts from other industries every day, their superiors venture out of the publishing or broadcasting bubble much less frequently. This is a shame, especially when the social Web makes it so easy.

If there's one thing David and I learned working alongside artists, educators, entrepreneurs and other non-journalists in our graduate program, it's that professionals in disparate fields have a lot to learn from each other, especially when the rapid pace of technological change means experts are at once nowhere and potentially anywhere.

What do you think? What fields should journalists consult in order to improve theirs?