Digital Preservation for the Nation
By Chris Zammarelli, The Brookings Institution
Digital preservation of electronic government documents is a vital task of the federal government, but it also presents unique challenges. Rita Cacas, Communications Officer for the U.S. National Archives' Electronic Records Archives (ERA) project, and Mike Wash, Chief Technology Officer for the U.S. Government Printing Office, presented on two ways the government is tackling this undertaking.
Cacas began with an overview of the mission of the National Archives and Records Administration (NARA), which is to "provide access [to government information] for the life of the republic." In 1995, Ken Thibodeau, then-director of NARA's Center for Electronic Records, reported to the Archivist of the United States that the agency was heading towards "mission failure" because it was not keeping up with the increasing volume of the government's electronic documents. It became critical for NARA to develop a means for electronic records preservation.
Knowing how huge the project was, the agency began creating research partnerships with organizations like the Library of Congress, the University of Maryland, IEEE, and NASA to work on the digital aspects of the project. ERA became an official project in 2000, and Congress made its budget separate from NARA's regular operating costs.
According to Cacas, one of the challenges in creating the archive is that the documents need to be accessible regardless of what type of file they are. The goal is to avoid needing to keep all the different types of systems that run all the different types of files. "NARA can't become a computer museum," she said.
The sheer volume of information the government produces also makes the problem difficult. The Census Bureau produced 600-800 million TIFF files during the 2000 Census, for example. Moreover, the government produces documents in all file types. Take the 9/11 Commission, whose document types ranged from spreadsheet files and emails to satellite imagery and HDTV programs.
Compounding the problem is that all the agencies developed their computer systems separately. Cacas pointed out that these so-called "stovepipe systems" are not necessarily compatible with each other.
The ERA was created to ensure authenticity of the record and to improve security, especially for the classified documents that will be added in a couple of years. Redaction and FOIA capabilities are forthcoming. While the project can be affected by budget constraints, it should be fully operational by 2011-12.
Next, Mike Wash outlined GPO's FDsys project, which will streamline how the office maintains and publishes government information. He noted that access to government information is now expected to be in an electronic form. The GPO originally converted documents into electronic form near the end of the publication process, but content is now created electronically from the outset.
FDsys will automate the process of collecting and disseminating electronic information. It will provide authentic and verified files that will be accessible from the internet and printable on demand. He stressed that the new system will not phase out print documents but will instead support them.
Wash reported that FDsys is built using an Open Archival Information System (OAIS) framework. This model allows for the creation of information packages that describe and preserve data and make it easily accessible.
He said that the system is two years away from being fully operational. The GPO will be rolling it out incrementally, with the first public release through GPO Access coming out later this year.
This session was presented by SLA's Government Information and Museums, Arts, and Humanities Divisions, and sponsored by Deep Web Technologies. Ann Shea of the California African-American Museum served as moderator.