The Struggle To Save The First Draft of History

April 30, 2024

Earlier this year, as rumors spread that The Messenger and Vice would be halting publication, journalists scrambled to save copies of their work out of fear that owners could take the sites down. So far, the Vice sites have mostly remained up, but The Messenger appears to have joined the ranks of outlets small and large whose domains have effectively gone dark.

When reporting disappears from the internet, it can be a serious blow to journalists needing clips to land their next jobs or freelancing gigs. But it also harms the reading public, as well as other reporters, who miss out on unique sources, analysis, and perspectives that broaden their own understanding and coverage of the past. 

“Local, independent, and alternative news sources are especially at risk of not being preserved, threatening to leave critical exclusions in a record that will favor dominant versions of public history,” warned Sharon Ringel and Angela Woodall, then research fellows at Columbia University’s Tow Center for Digital Journalism, in a 2019 report.

Ringel and Woodall pointed to the example of Gawker, which saw its archives disappear from the internet after its 2016 bankruptcy. But the problem isn’t limited to shuttered news sources: Experts have warned for years that even active publications struggle to archive their work and make older articles readily available to readers. 

“What if, because of the mind-boggling complexity of modern digital publishing systems, our first draft of history is dissolving?” asked a 2021 report from the University of Missouri’s Donald W. Reynolds Journalism Institute. “That’s the unfortunate fact of what’s happening right now in newsrooms across the country. Quietly, in the background of the news industry’s public struggles is a nearly invisible but dramatic decline in efforts to preserve our daily news.”

Cash-strapped news outlets often treat content management systems as de facto archival systems, but as they redesign websites and upgrade content management systems, hyperlinks get broken and published stories risk being lost or digitally mangled.

“This is an issue cited by nearly every one of the 24 news organizations we met with or interviewed,” according to the report. “The problems of content loss in this process range from minor errors with small amounts of data are lost, usually parts of metadata, to cases in which key parts of stories or even entire stories are missing.”

Important aspects of how stories were presented—changing online headlines, push notifications, homepage placements, and social media posts—are often treated as entirely ephemeral. 

“If I want to collect this data, now, as a researcher, I can’t find it anywhere,” said Ringel. “There’s no archive of push notifications, or, you know, news alerts that I can find.”

Material published through third-party services—like embedded videos, maps, searchable databases, and live blogs—can also disappear if those platforms shut down or change technical features, or if their bills go unpaid amid cost cuts and staff transitions.  

“All I can tell you is that this problem has not gotten better,” said Neil Mara, one of the authors of the RJI report. “In fact, things just continue to get worse.”

Studies have generally found that archives, especially of online material, can be spotty even at outlets like regional newspapers that historically took care to archive their printed work. That’s partly because of the volume and complexity of online publishing, and a widespread but mistaken belief that everything posted on the internet will persist there forever. 

Another culprit is industry-wide budget cuts, which have sent publications scrambling simply to keep the presses and servers running and led to the rapid elimination of a once-common position: the newsroom librarian. 

Historically, media outlets employed librarians who’d maintain paper collections of published work—including both full editions and clipping collections carefully indexed for future consultation. 

“At the St. Louis Post-Dispatch, which had a 22-person news library staff at its peak in the late 1990s, the newsroom now has only one,” according to the RJI report.

Well-organized newspaper archives let staffers track down previous coverage about, say, a city’s mayor, and even filter for profiles of the politician rather than articles that mentioned them in passing, Mara explained. Newspaper librarians also often worked with outside vendors to make their papers’ archives available through public libraries, university collections, and digital databases—a task that can go neglected as librarians disappear from all but the largest outlets.

“Certainly where the effect has been most painful for librarians is the lack of anybody waking up each day in the newsroom thinking about how to archive all of the newsroom’s output, everything the newsroom publishes,” said Katherine Boss, a librarian for journalism, media, culture, and communication at New York University who’s written about digital news archiving and journalistic research practices.

Librarians historically also played a big role in reporting, helping journalists dig up facts, quotes, and photos from their archives and other reference material. That was a practical necessity in the era of fragile paper files and early databases with arcane interfaces and steep usage fees. It was also an acknowledgement of the importance of archival research—and librarians’ particular expertise.  A 2022 article by Robert Berkman, an assistant professor at The New School and author of several books on research tools, recalls the roles newsroom librarians played in the Boston Globe investigation of Catholic Church child abuse dramatized in the movie Spotlight, as well as Miami Herald reporter Julie K. Brown’s investigation of Jeffrey Epstein. 

“Things have changed a lot, unfortunately,” Berkman said in an interview. “Since probably the early 2000s, newsrooms have either eliminated the library function or boiled it down to maybe one person.”

In other words, just as the complexity and pace of journalism have increased, financially challenged media companies have cut the very people tasked with archiving and sifting through published work. And reporters for many outlets, as well as many modern freelance journalists, have never had the benefit of in-house librarians, meaning they must find their own tools for archival research and make sure to back up their own clips as best they can.

“I could only imagine that it’s got to be so much more difficult for journalists to not have any support within their organization for in-depth research,” said Boss.

If there’s a silver lining, it’s that online databases have made some historical reporting easier to access. The nonprofit Internet Archive, where journalists and media preservationists often rush to stash materials from closing outlets, offers an unparalleled free collection of archived websites collected since its founding in 1996, along with various other materials. Unfortunately, its historical web material typically isn’t keyword-searchable, meaning it can only be used to access content researchers already know about or can find by browsing through other current or archived pages. For-profit companies like NewsBank, Newspapers.com, LexisNexis, and ProQuest separately offer searchable access to historical newspapers and other publications, regularly announcing expansions to their collections through deals with publishers, libraries, and other institutions.

Their databases, though, aren’t typically free, meaning journalists looking to do historical research have to consider what they’re willing to pay for, what they can bill to publications, and what they can access using their personal library cards. Newspapers.com, owned by Ancestry, charges $19.90 per month (or $74.90 for six months) for access to its full collection. Some, like ProQuest, don’t offer individual subscriptions at all; the company suggests researchers work through a library with access.

“You can find content across a broader array of newspapers than perhaps you might have when I started in journalism, but ordinary freelance writers can’t afford that stuff, right?” said Mara. “These are completely prohibitive in cost.”

Corporate database operators also have their own standards and priorities for what they collect—which may be opaque and not fully understood by subscribers—and what’s available at different subscription levels, and there’s no guarantee these align with the needs of journalists doing research or hoping for an accessible archive of their own work.

 “As such, the news cycle now includes reliance on proprietary organizations with increasing control over the public record,” warned the Tow Center report. “The Internet Archive aside, the larger issue is that these companies’ incentives are neither journalistic nor archival, and may conflict with both.”

With news industry support dwindling for both archiving and archival research, public and university libraries are in some cases picking up the slack. Boss pointed out that many colleges and universities, especially public ones, offer unaffiliated researchers onsite access to resources, including databases. That can broaden access for journalists who can make it to a nearby campus. Librarians are also often happy to suggest potential research avenues, and many libraries have built up their own archives of published media, often including local outlets and voices not found in mainstream commercial archives but important to journalists and others looking to get a full view of the past.

One collection called Reveal Digital features “primary source collections from under-represented 20th-century voices of dissent,” made freely searchable through JSTOR with funding and materials provided by an assortment of libraries. It includes Black publications, materials published by activists of various stripes, alternative newspapers of the ’60s and ’70s, and newspapers created by people in prison. 

“They’re covering political events, they’re covering social things that are happening outside of the prison walls, but from the perspective of people who are incarcerated,” said associate director Peggy Glahn.

The University of Connecticut also has a sprawling Alternative Press Collection, including papers that were part of the Underground Press Syndicate, a ’60s-era radical alternative to the Associated Press, and later publications that railed against apartheid and environmental devastation. They offer historical insights that can’t be found in more mainstream publications, from coverage of underground music to in-depth interviews with controversial activists, said Graham Stinnett, an archivist who oversees the collection. 

Libraries across the country have their own unique collections of historical reporting, though those don’t often include much from the online outlets that are increasingly the only home for so much coverage. Alternative newspapers from 1972 can be easier to search than online news startups from, say, 2012. “That’s the black hole right now,” said Mara. “That is a yawning gap.”

And at a time when rumors, slanted news sites, and disguised advertorials increasingly compete with honest reporting, losing “the first draft of history” feels particularly perilous.