At Risk: The Dire State of Digital News Archiving

Are we facing a crisis of public information? According to a new report from the Tow Center for Digital Journalism at Columbia’s Graduate School of Journalism, the news media is woefully unprepared to preserve news content in the digital age.

“Between March 2018 and January 2019, we conducted interviews with 48 individuals from 30 news organizations and preservation initiatives,” write Sharon Ringel and Angela Woodall in the Columbia Journalism Review. “What we found was that the majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces.”

“Of the 21 news organizations in our study, 19 were not taking any protective steps at all to archive their web output. The remaining two lacked formal strategies to ensure that their current practices have the kind of longevity to outlast changes in technology.” (Columbia Journalism Review)

Before the digital revolution took hold of the news industry in the late 1990s, archiving the physical product was a routine part of running a newsroom.

“At a good number of newsrooms, an in-house librarian was a stop in this production pipeline, guaranteeing some level of future access by clipping individual news stories from the newspaper and filing them on-site according to subject keywords in a morgue (a physical space allotted to the clippings),” the authors note. “Back issues of whole newspapers were also frequently kept on-site in multi-story buildings.”

News moves to the cloud

As news moved online, content came to be created as distinct elements (headline, byline, text, images, comments, embedded video and hyperlinks, social headers, etc.) and posted in different formats across a wide range of first-party and third-party platforms. The physical archiving process was no longer relevant, and digital archiving became a Sisyphean task.

“While the internet has created a vibrant information infrastructure, very little digital content is archived and former models no longer can guarantee long-term access,” the authors write. “Although some news workers recognize the risk of losing content, they continue to rely on content management systems or cloud-based servers to store their work, practices they confuse with preservation and that we argue are not the same.”


Archivists who work with digital content agree that there is a vast difference between storing content in the cloud and following a rigorous archiving protocol.

“What it boils down to is ensuring access to information that could otherwise be lost. Storing content is a passive activity and doesn’t encourage or require any forethought as to how someone might want or need to access it later,” explains Beverly Ingle, a digital content curator and archivist. “When digital content is arranged in a relevant way and identified to indicate what each piece of content relates to, then someone can easily search, locate and use the information they want or need.”

Part of the problem is a shift in the way media outlets view their work in the digital age. According to the Tow report, it comes down to a change in what they focus on preserving.

“Journalists (and their news organizations) are more interested in preserving documentation of their reporting and what makes it accurate than preserving what ultimately gets published,” the article notes. “As a result, platforms and third-party vendors, which increasingly host news content on their closed servers, are in control of the pieces necessary for holistic preservation without the journalistic incentive to enact it.”

Can we solve this massive problem?

While some organizations rely on the non-profit Internet Archive, this style of web archiving has serious limitations in both the formats and the volume of content it can preserve. Other services, such as PastPages, NewsGrabber and Archive-It, are becoming more popular with news managers. The Tow report argues that the industry needs to take a hard look at the problem and answer two big questions: What needs to be preserved? And who should do that work?

“Creating robust digital archives will mean grappling with tough questions, like how often to capture a copy of an ever-updating home page, if personalized content and newsletters should be preserved, and what to do with reader comments and social media posts,” the article continues.

“The tough questions about what to save, when, where and how aren’t new and aren’t going away, but the sheer volume of digital content created daily makes archiving that content seem daunting,” says Ingle.

What the industry needs is strong voices that can lead these discussions and drive home the importance of solid archives. The archiving process needs to be robust yet simple, and it must be closely aligned with journalism’s overriding goal: documenting and preserving the public record.