Mathematics Roundtable
Fred Yuengling
Sponsor: SIAM
Topic: Digital Archives in Mathematics
Digital Mathematics Library, http://www.library.cornell.edu/dmlib/
The DML is a collective effort of mathematicians, librarians, and scholarly publishers to promote the digitization of mathematics content by establishing standards and guidelines for digitization and by providing a stable digital archive of the mathematics literature. The DML is linked to the IMU (International Mathematical Union) to ensure fair and balanced international representation for mathematicians. In fall 2001, Cornell University Library submitted a proposal to the NSF for a one-year planning grant. The principal investigator is Sarah Thomas, University Librarian; the co-principal investigators are Professor Keith Dennis and Jean Poland, Associate University Librarian for Engineering, Mathematics, and Physical Sciences (EMPS). Kizer Walker, the EMPS Digital Projects Librarian, has done much of the DML's writing and created its Web presence.
In January 2002, at the Joint Mathematics Meetings in San Diego, Prof. Dennis convened an informal meeting on mathematics digitization. Philippe Tondeur, director of the NSF's Division of Mathematical Sciences from 1999 until 2002, was an early enthusiast of the DML concept and delivered a call to arms outlining the direction the digital mathematics library should take. In July 2002, Cornell convened an international group at NSF headquarters in Washington, D.C. to begin developing a framework for a comprehensive, distributed digital collection of the published knowledge in mathematics. This meeting launched the planning phase of the Digital Mathematics Library.
DML Scope: the entirety of past mathematics scholarship, excluding computer science and applied mathematics. The DML will make the literature available at a reasonable cost, following the JSTOR model of a "moving wall": publishers retain control of recent content for long enough to recoup their costs, after which the content may move to free access.
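The moving-wall rule amounts to a simple date check. The sketch below is a minimal illustration, assuming the wall is measured in whole years from the publication year; the function name `is_open_access` and the five-year default are hypothetical, not values set by the DML or JSTOR.

```python
def is_open_access(pub_year: int, current_year: int, wall_years: int = 5) -> bool:
    """Return True if an item has passed behind a moving wall.

    Hypothetical rule: content becomes freely available once
    `wall_years` full years have elapsed since publication.
    The default of 5 years is an illustrative assumption.
    """
    if wall_years < 0:
        raise ValueError("wall_years must be non-negative")
    return current_year - pub_year >= wall_years


if __name__ == "__main__":
    # With a 5-year wall in 2003, a 1998 article is open and a 1999 one is not.
    for year in (1998, 1999):
        print(year, is_open_access(year, 2003))
```

Because the wall moves with the calendar, an item that is restricted this year emerges automatically in a later year without any change to the underlying agreement.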
Rights and Licenses: For the DML to succeed in digitizing the legacy literature, it will need a plan that attracts content owners: societies, non-profit university presses, and commercial presses. The planning group has concluded that nonexclusive rights to articles will be retained by the publishing journal, and that material will be included in the DML by agreement between the journal and the DML or one of its member programs. If the content does not reside on a DML server, participating publishers will be required to provide metadata and allow full access.
Economic Model: Overall, DML efforts will be distributed internationally among many smaller projects, in most cases funded by national funding agencies. Rockey explained that the benefit to content owners is essentially the digitization of their backfiles and publicity for the current content they have for sale. DML will provide guidance on model contracts that stipulate where backup copies are held and ensure that the materials do not vanish if the content owner fails to provide full access.
Archiving: The DML will approach archiving and preservation through the generic archive framework proposed in the Reference Model for an Open Archival Information System (OAIS). The OAIS model is metadata-driven, and the importance of standard metadata for archiving and interoperability cannot be overstated.
Metadata: DML will need to develop a strategy that takes advantage of existing metadata repositories and provides guidance on what additional metadata is required. Metadata will also make it possible to maintain a registry of items already available and items awaiting digitization.
Technical standards: a minimum scanning resolution of 600 dpi; TIFF or PNM for raw scanned data; stable URLs with unique, meaningful names. PDF will be the primary delivery format, with DjVu as a supplement; both should be text-searchable.
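To give a sense of what the 600 dpi minimum implies for storage, the sketch below computes the pixel dimensions and uncompressed size of one scanned page. The 6 × 9 inch page size and the bit depths are illustrative assumptions, not part of the DML standard; real TIFF files are usually compressed and will be considerably smaller.

```python
def page_scan_size(width_in: float, height_in: float,
                   dpi: int = 600, bits_per_pixel: int = 1):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page.

    bits_per_pixel: 1 for bitonal, 8 for grayscale, 24 for RGB color.
    The byte count covers raw pixel data only (no file headers).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to whole bytes.
    uncompressed_bytes = (total_bits + 7) // 8
    return width_px, height_px, uncompressed_bytes


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch page at the 600 dpi minimum.
    for label, bpp in (("bitonal", 1), ("grayscale", 8)):
        w, h, size = page_scan_size(6, 9, bits_per_pixel=bpp)
        print(f"{label}: {w} x {h} px, {size / 1e6:.2f} MB uncompressed")
```

Under these assumptions a bitonal page is 3600 × 5400 pixels, about 2.43 MB before compression; an 8-bit grayscale scan of the same page is eight times larger.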
Current Developments: DML held a meeting in May that closed the Cornell-led, NSF-funded planning phase. At that meeting it was decided that DML coordination of worldwide mathematics digitization would continue under the auspices of the IMU, with at least one annual meeting for all stakeholders. Future DML meetings will be organized by the IMU's Committee on Electronic Information and Communication (CEIC). A joint European Union effort based on the DML model is emerging because funding is becoming available and the group wants a coordinated response to calls for proposals.
EMANI, http://www.emani.org/
In July 2001, Springer-Verlag announced the Electronic Mathematical Archiving Network Initiative (EMANI) "to insure the preservation and dissemination of mathematical information for future generations," in cooperation with Tsinghua University Library (China), Goettingen State and University Library (Germany), and Cornell University Library. The Orsay Library in Paris joined soon after. Professor Dr. Bernd Wegner, editor of Zentralblatt, is the scientific coordinator of EMANI. Springer does not want EMANI viewed as a Springer product. Participants include Springer, Birkhäuser, Teubner, Vieweg, and the electronic library of EMIS. Springer is closely involved with DML and well aware of its proposed standards, while at the same time moving ahead without waiting for DML to become more formalized. EMANI was represented at the digitization meeting in San Diego in January 2002, and representatives of EMANI partners attend all DML meetings. Working groups are studying metadata standards and preservation issues.
EMANI is also addressing retro-digitization of Springer mathematics books and journal backfiles. There is interest in digitizing the complete backfile of Communications in Mathematical Physics at Cornell, but funding is not in place. There is also talk of digitizing the backlist of Lecture Notes in Mathematics. Notably, discussions with Springer about these projects revealed that Springer has no paper backfiles in stock for retro-digitization; publishers always assumed libraries would archive print backfiles. This raises a question: can publishers make enough money from backfiles to support the costs of retro-digitization, reference linking, and servers, let alone long-term migration? EMANI is currently negotiating a moving-wall agreement, but no contract has been signed.
See http://www.springer.de/press/companynews/emani.html and http://www.sla.org/division/dpam/pam-bulletin/vol29/no2/mathematics.html.
Project Euclid, Cornell University Library, http://projecteuclid.org/
Since January 2003, David Ruddy has been Acting Director of Project Euclid. Terry Ehling recently accepted the position of Director of Electronic Publishing at Cornell and, beginning in August, will direct Euclid. Ehling was formerly at MIT Press, where she was an early pioneer of digital efforts, including the creation of CogNet. Ruddy will become head of development for electronic publishing systems. Both are expected to attend future SLA meetings.
Project Euclid is an initiative in electronic publishing by the Cornell University Library. The project's mission is to advance affordable scholarly communication in the field of theoretical and applied mathematics and statistics. Euclid is specifically designed to address the unique needs of low-cost independent and society journals through a collaborative partnership arrangement.
Initial support for the technical development of the project came three years ago from the Mellon Foundation. The basic technical infrastructure is now in place, and Euclid is entering a second, more externally focused phase. Work in the next two years will concentrate on building content (journal recruitment) and building a stronger subscription base (libraries and other subscribers). Moving the project to a cost-recovery position is the goal for the next three years.
System and Content: Journals available electronically through Euclid cover theoretical and applied mathematics and statistics. There are currently 19 titles from 13 publishers. To date, the system includes approximately 6,000 articles published between 1952 and July 2003. Although Euclid emphasizes current literature, the project accepts journal backfiles from publishers. Euclid is committed to community standards and interoperability: the project complies with the Open Archives Initiative (OAI) protocol and works with Math Reviews and Zentralblatt to facilitate interlinking between the services. Euclid registers Digital Object Identifiers (DOIs) with CrossRef for all articles, and all article-level descriptive metadata in Euclid is open access. Long-term retention of data is a project commitment and an area of continuing research.
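Compliance with the OAI protocol means that a harvester can collect article-level metadata with plain HTTP requests. The sketch below builds OAI-PMH 2.0 request URLs; the verb and arguments (`ListRecords`, `metadataPrefix=oai_dc`, `from`, `resumptionToken`) come from the OAI-PMH specification, but the base URL is a hypothetical placeholder, not Euclid's actual endpoint.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder; a real harvester would use the repository's
# published OAI-PMH base URL.
BASE_URL = "https://oai.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: records with datestamps on or after this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2003-01-01"))
    print(list_records_url(BASE_URL, resumption_token="token-from-previous-response"))
```

A harvester would fetch the first URL, parse the XML response, and keep issuing requests with each returned resumptionToken until the repository returns an empty token.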
A tour of Project Euclid's functionality and features is available at http://projecteuclid.org:80/Dienst/UI/1.0/About?type=demo
EUCLID Subscription Information: EUCLID PRIME is a growing collection of electronic journals available for subscription through Project Euclid. A price sheet for EUCLID PRIME is available in PDF format at http://projecteuclid.org/euclid/EuclidPriceList.pdf. In 2004, other Euclid titles will be available by individual title subscription. Some publishers will handle their own electronic access subscriptions. At this time, all print subscriptions are still handled directly by the journal publishers. The Project Euclid Web site has complete details.
University of Michigan Historical Mathematics Collection, http://www.hti.umich.edu/u/umhistmath/
Sears discussed UM's Historical Mathematics Collection project, a growing library of books selected from the UM Library's holdings and digitized to improve access and preserve content. All of the books were published in the 19th or early 20th century. Contributors to the NSF project included Cornell University (Keith Dennis, David Ruddy, H. Thomas Hickerson), the University of Goettingen (Hans Becker, Norbert Lossau), and the University of Michigan (John Wilkin, Beth Alyssa Kirschner). The content selector for UM titles was Sara Rutter, now at the University of Hawaii at Manoa. Sears outlined the conditions at UM at the start of the project: a strong collection of 19th-century mathematics monographs, including the Ziwet and Karpinski collections; strong demand from mathematicians and philosophy researchers for the historical mathematics books; and a mass deacidification/pizza box project. The pizza box project was part of the deacidification effort: items too brittle to be deacidified were placed in archival boxes to prevent further damage. The proposal was to digitize and OCR crumbling books, focusing initially on non-Euclidean geometry and on as many as possible of the core works listed in two bibliographies: D. M. Y. Sommerville, Bibliography of Non-Euclidean Geometry, 2nd ed.; and George Halsted, "Bibliography of Hyper-Space and Non-Euclidean Geometry," American Journal of Mathematics 1(3), 1878, pp. 261-276.
So far, 985 volumes have been completed, against an initial goal of 1,000. The main selection criteria were that the book be held by the UM Library, that it had been published between 1800 and 1923, that it was brittle, and that it had not been digitized by Cornell or Goettingen. The project scope required that the works be by mathematicians who contributed to the development of non-Euclidean geometry.
Sears outlined the steps taken in the project, starting with the basic ones: finding the books, communicating with the remote storage facility, cataloging, and inspecting the books. Staff in the science library helped retrieve the books and change their status in the catalog records (to indicate that the items were not on the shelf but "at preservation"). Scanning was outsourced to save money; the NSF reformatting staff were internal staff ("mostly in our preservation unit") who prepared the items to be sent out for scanning.
The project then used Prime Recognition OCR, a very high-accuracy OCR package that incorporates six recognition engines. (Other options considered were OmniPage, TextBridge, and FineReader.) The software can handle multiple languages. The German titles were re-scanned "with the German language feature turned on, so we could get a more accurate reflection of what was on the page." For example, with the English setting an "o" with an umlaut was recognized as a plain "o," but with the German setting it was recognized correctly as "ö." XPAT, an SGML/XML-aware search engine, was put online by the Digital Library Extension Service (DLXS) at UM, http://www.dlxs.org. A CGM protocol used for search and display in the interoperable interface is near completion; it is based on the Open Archives Initiative (OAI) protocol, but with additional features.
The books can be accessed by searching the UM Historical Mathematics Collection directly. Paper reprint copies are available from the UM Scholarly Publishing Office (SPO), http://www.hti.umich.edu/p/pod/.
All pages of the original item are reproduced from the digital page images and printed on acid-free paper; a hardcover binding option and online ordering are coming soon. To see the records in the UM library catalog, visit http://www.lib.umich.edu/mirlyn/mirlynpage.html.
Cataloging policies are coming soon. Free bibliographic records (in MARC format) can be downloaded by anonymous FTP from ftp://ftp.umdl.umich.edu/pub/records/um-math.zip.
Usage statistics for the collection have been captured. Total use from the collection's inception in January 2002 through May 2003 was 16,308 sessions, 19,151 searches, and 375,367 page-image views. Usage for your institution is available at http://stats.umdl.umich.edu.
Sears spoke on the access vs. preservation issue: UM is striving to provide global, not just local, access and to preserve content in a way that will enhance scholarly productivity. Scanning has cost 18 cents a page, or 10 cents a page if the journal is cut (disbound). The Digital Library staff chose the Prime Recognition software; their experience with earlier digital projects informed the decision. The software is expensive, but its quality is better than that of other programs: "Since we're scanning for long-term archives, we wanted really good quality."
Discussion
The World Digital Mathematics Library, http://www.wmdl.org (under construction), should be up at the end of August. At first the site will offer a rudimentary registry of digitized mathematics on the Web, which will become more complete over time.
See: "Twenty centuries of mathematics: digitizing and disseminating the past mathematical literature," http://www.ams.org/ewing/Twenty_centuries.pdf.
General Comments
Special Libraries Association (SLA) assumes
no responsibility for the statements and opinions advanced by contributors
to the Association's publications. Editorial views do not necessarily
represent the position of Special Libraries Association. Acceptance of an
advertisement does not imply endorsement of the product by Special
Libraries Association.
Published by