Libraries' Role in Preserving Digital Information

Bodley’s Librarian warns that libraries must rise to the challenge of the digital era, preserving vital electronic information — before it’s too late

KEY TAKEAWAYS

• Since the days of antiquity, libraries and archives have played a critical role in society by preserving knowledge and making it widely available. This crucial role has on occasion put these institutions in the crosshairs of forces targeting the democratic and pluralistic culture they represent.

• Libraries use an array of tools to preserve knowledge, but the sheer amount of “born-digital” information since the advent of the Internet has greatly complicated the process of preservation for future generations.

• The challenges are amplified when technology companies, for whom preservation of knowledge is a purely commercial consideration, decide a platform is no longer serving their business. This happened in February 2019 with the photo-sharing site Flickr, which limited free accounts to 1,000 photos and videos and automatically deleted any files above that limit.

• Preserving knowledge as a way of keeping electorates informed is becoming critical to the future of democracy. For example, the UK Web Archive has captured over 2,400 sites relating to the 2016 referendum on EU membership, known as Brexit.

• Digitization is expensive, but the case for funding such initiatives has never been stronger.


A group admires one of the beautiful windows in Duke Humfrey’s Library, the oldest reading room in the Bodleian Library at the University of Oxford. Opened to scholars in 1602, the Bodleian (better known as “the Bod” to many Oxford scholars) is one of the oldest libraries in Europe. Holding more than 13 million printed items, in Great Britain it is second in size only to the British Library. Bodley’s Librarian is the unusual title given to the head of the Bodleian, which is named after the library’s founder, the English scholar and diplomat Sir Thomas Bodley (1545–1613). There have been 25 Bodley’s Librarians in total, with the most recent, Richard Ovenden, appointed in 2014. (Photo: Angel Sharp Media, © Bodleian Libraries, University of Oxford)

Libraries are having a “moment.” Two books published in 2018 have brought the role of libraries into broader social debates: Susan Orlean’s The Library Book focuses on the civic role of the Los Angeles Public Library, and Eric Klinenberg’s Palaces for the People has drawn attention to libraries, especially public libraries, as “social infrastructure.” These works are a welcome reminder to a wider public that libraries and archives are essential ingredients in maintaining an open society.

Society faces numerous challenges in today’s world, and libraries and archives can help us confront those challenges in important ways, mostly by staying true to their core mission. Central to that mission is the preservation of knowledge. Throughout history, beginning with the earliest communities, libraries and archives have functioned as institutions that preserved knowledge, while helping to pass that knowledge along to future generations. Archaeologists continue to find evidence of archives among the civilizations of Mesopotamia and Asia Minor, dating from as early as the second millennium BCE, and they have also found evidence of the existence of professional roles akin to what we would think of today as librarian or archivist.

From the ancient world onward, libraries and archives have developed new and innovative ways of preserving knowledge, of organizing the material records of cultures, and of making that historical knowledge widely available. You can see the importance of these functions in the attempts throughout history by governments and others to deliberately destroy knowledge. In 1814, for example, British forces burned the 3,700 volumes of the still-young Library of Congress in Washington, D.C., seeking to undermine the operations of the nascent U.S. government. More recently, in August 1992, the Serbian militia deliberately shelled the National Library of Bosnia in an attempt to eradicate the pluralistic culture the library represented. Horrifyingly, snipers trained their bullets on librarians and firefighters as they scurried to retrieve collections from the burning library.

In more recent decades, libraries have tended to shift their emphasis from preservation toward access. The advent of digital technologies has given librarians and archivists new tools to organize information and to share it more broadly through digitization and the power of online networks, transforming the ways libraries and archives provide access to their resources and services. However, this shift in the profession’s emphasis (and in its budgets) has had the effect of sidelining preservation. At the same time, the more information is “born digital,” the greater the challenge of preserving it for the long term.

Until the advent of digital information, libraries and archives had a well-developed strategy for preserving the material that made up the bulk of their collections: namely, paper. These collections were able to survive over millennia in “regimes of benign neglect” (to borrow a phrase from Clifford Lynch of the Coalition for Networked Information). Stable levels of temperature and relative humidity, avoidance of flood and fire, and well-organized shelving were at the heart of the preservation strategy. Inherently less stable, digital information requires a much more proactive approach, and not just because of the core complexities of technology itself, such as file formats, operating systems, and software. These challenges have been amplified by the widespread adoption of commercial online services offered by major technology companies, especially those in the world of social media — entities for whom preservation of knowledge is a purely commercial consideration.

In recent years we have experienced a spate of threats to knowledge. At the end of 2018 the photo-sharing site Flickr, struggling to keep pace with competition from the likes of Instagram, announced that it was reducing the free storage available to its account holders. After February 2019, users of free accounts would be limited to 1,000 photos and videos, with any excess automatically deleted. Millions of Flickr users found that much of their content had been permanently, irrevocably removed. Although Flickr (like many other social media companies) claims to be a service for sharing content, many users primarily use it for its free cloud-based storage. What happened at Flickr shows us that “free” services aren’t really free at all. The business model of companies like Flickr is based on the trading (often unacknowledged) of user data, and as market share is lost to competitors, “free” services have to make way for paid premium services.

The problem that the Flickr case study throws up is one of digital preservation. Think of the billions of images that individuals and organizations placed on Flickr. Active users will have known about the coming changes and were perhaps able to move their data swiftly onto other platforms or into a more proactively managed environment. Others who lacked the ability to move fast enough simply lost images of their loved ones, a photographic record of their adventures, or stock photos for their company. Gone forever in the blink of an eye. Consumers have had similar experiences with “free” platforms like Myspace and Google+, which closed down rapidly and with little advance notice. Precious information was lost, some of it gone forever.

Archival practice has its origins in state administration: recording such mundane but vital information as property records, taxation, and import-export details. Even in the ancient world, it was recognized that access to these records was important for efficient administration, but the exponential growth of electronic records has made the preservation of government documents highly precarious. In December 2018 the state government in Maine revealed that it had suffered a catastrophic loss of public documents from the administrations of governors Angus King and John Baldacci, with most official emails sent before 2008 irretrievably lost. Many other kinds of documents were destroyed by state officials and never made it into the Maine state archives. Not only has information been lost to future historians and researchers, but emails that could have served as evidence in high-profile legal cases have also been destroyed. Email records, when pieced together, can tell a story in enough detail to help secure a conviction (or keep a defendant out of jail), as the work of lawyers such as Larry Chapin on the Libor scandal has shown.

In the case of the Enron scandal of the early 2000s, litigation would have been much easier had digital preservation solutions been more readily available in the corporate world of the time. As we now know, Enron employees deleted vast numbers of emails and other digital information, preventing the corporation’s auditors from knowing what was going on and, later, making legal proceedings harder and more costly.

Preservation of knowledge is fundamentally not about the past but the future. The ancient libraries of Mesopotamia were dominated by texts containing predictions for the future: astrology, astronomy, and divination. Rulers wanted information that would help them decide the optimal time to go to war. Today the future continues to depend on access to knowledge of the past, and will do so even more as digital technology changes the ways we are able to predict what comes next.

As technology firms develop wearable technologies, the amount of biometric data captured from each of us will reach a point at which medics will be able to make increasingly accurate predictions about our future health. This will help in the prevention of disease, but it will also open up major ethical issues. Who will own this data? We may be happy to share medical data with our doctor, but would we be happy with it getting into the hands of our health insurer? Perhaps libraries and archives can play a vital role here, serving as trusted agents providing individuals with access to their personal digital information organized, secured, and preserved to the highest archival standards. Under such a scenario, citizens would control who has access to their personal information, while libraries would be granted the right to aggregate and disseminate anonymized information solely for the purposes of public health.

Access to knowledge will be of critical importance in a number of other areas in the future, areas in which commercial interests may not serve society’s best interests. Most businesses expect to be around for years into the future, but some organizations have real skin in the game for society as a whole, and this is where digital preservation can become a life-or-death issue. Take the nuclear industry. As a society we really need to be sure we will know long into the future (and I don’t just mean the next five to 10 years but hundreds and even thousands of years hence) exactly where we have stored nuclear waste, what material it consists of, when it was placed there, what kind of container it was stored in, and so forth. This data exists today, but the challenge facing the Nuclear Decommissioning Authority and other players in the nuclear world is how to be sure that property developers, mining companies, and water suppliers, as well as local authorities, governments, regulators, and the population at large, have guaranteed access to all this information in, say, 500 years’ time. We need to know where to find it, that the format the information is stored in can be accessed, and that we can make sense of it when we really need to. Sound familiar? It’s called good archival practice.

As our digital lives continue to embrace more and more aspects of what we as individuals and societies get up to on a daily basis, we will continue to encounter ways in which libraries and archives can help society remain truly open. As the political sphere has embraced digital information, we have seen the rise of “fake news” and “alternative facts.” Preserving knowledge, in order to inform electorates, is becoming critical to the future of democracy. In recent years, political campaigns across the globe have exploited the platforms and services offered by technology companies, social media firms, and data corporations. Much of that activity has fallen into legal gray zones, if not outside the law.

Web archives have become important because they are able to permanently preserve the public statements of political candidates, office holders, and government officials (often to their embarrassment), so that the public, the media, and, eventually, voters can call them to account. The UK Web Archive is a collaborative effort of the six copyright libraries in the United Kingdom, and one of its special collections of blogs and websites has captured over 2,400 sites relating to the 2016 referendum on European Union (EU) membership, known as Brexit, as well as the political aftermath of the vote. In April 2019 the Vote Leave campaign deleted a great deal of content from its public website, including references to the campaign’s promise to spend £350 million a week on the National Health Service (NHS) if Britain left the EU. The UK Web Archive had captured the website before that content was deleted.

Then there is a group called Led by Donkeys (the name has its origins in a phrase used during the First World War, when British infantrymen were often described as “lions led by donkeys,” giving a sense of what the men on the front thought of their generals). The mischievous campaigners of today’s Led by Donkeys take the pronouncements and (often humiliatingly wrong) predictions of pro-Brexit politicians, blow them up into giant tweets, and then place them on massive crowdfunded billboards across the U.K. (One must note that American political leaders have not been spared this treatment.) This activity, which I call “public archiving,” shows the importance of recording information that can call politicians to account for what they have actually said or written. Political discourse has often been a battleground between truth and falsehood, but the digital arena amplifies the influence that political falsehoods can have on the outcomes of elections. We need libraries and archives to increase their efforts at archiving the Internet and to move more purposefully into the archiving of social media.

The traditional work of libraries and archives costs money, and funding continues to be a major issue. Digital preservation adds considerably to the expense of providing the kinds of services that society depends on libraries to provide. The case for funding libraries and archives has never been stronger, yet it needs to be repeated and amplified — loudly — if we are to avoid George Orwell’s haunting prediction in 1984: “The past was erased, the erasure was forgotten, the lie became truth.” ■