Carnegie Corporation of New York

Virtual Library Model:
A Report to Carnegie Corporation of New York

by Tamara Kummer

Summary

Carnegie Corporation of New York has developed a model virtual library to demonstrate one method of supplementing collections in African universities where Internet connectivity is limited and expensive. Scholarly articles free of both charge and copyright were downloaded from the Internet, sorted by subject and stored on a hard drive, later to be copied and housed on a server. Should the virtual library project be implemented, the server could be copied and exported to the beneficiary universities and operate at the center of a local area network. The following report

  • outlines the technological and ethical issues encountered during the two-month collection phase of the project
  • identifies and describes other initiatives addressing issues of free access to scholarly research and low Internet connectivity or bandwidth
  • offers suggestions to make the project design both more efficient and more effective
Full Report

Carnegie Corporation of New York has initiated a virtual library project designed to address the problem of low Internet connectivity and limited access to web-based academic materials in African universities. The virtual library consists of documents collected from the World Wide Web, archived by subject and stored on a hard drive in a server. The server would then be sent to designated beneficiary universities to constitute the core of a local area network (or LAN), making the virtual library documents available to any computer terminal within the university's network. Thus, while limiting the need for Internet use (see the "Update Methodology" section), the virtual library project would approximate the experience of using the Internet--albeit in a limited fashion. All documents collected were archived by subject in a succession of folders and subfolders. The format is best described as a "tree structure," the successive folders growing progressively more specific.



The introductory page of the virtual library is modeled on that of "Dmoz," an Internet directory. A bold, underlined general subject title rests above its component categories in smaller, plain underlined font. All collected documents were indexed in an Excel table indicating the title, the author, the folder the document was archived in, the folders the document could be cross-listed in (this column allows for more efficient keyword searches), the document's source and where to find its copyright information. Copyright notices were entered either as a web link or stored in the library in a folder entitled "Copyright Clearances," depending on whether they were posted on the web or not.
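
The index described above can be pictured as a list of records, one per document, that a keyword search scans. The following is only a minimal sketch in Python, not the Excel table the project actually used: the field names, the folder paths and the two sample rows are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One row of the index; field names are illustrative assumptions."""
    title: str
    author: str
    folder: str                      # primary folder path, general to specific
    cross_listed: list[str] = field(default_factory=list)
    source: str = ""
    copyright_info: str = ""         # web link, or a path under "Copyright Clearances"


def search(entries, keyword):
    """Return entries whose title, author, folder or cross-listings contain keyword."""
    needle = keyword.lower()
    hits = []
    for e in entries:
        haystack = [e.title, e.author, e.folder, *e.cross_listed]
        if any(needle in text.lower() for text in haystack):
            hits.append(e)
    return hits


# Hypothetical sample rows, invented purely for illustration.
INDEX = [
    IndexEntry(
        title="Sample Article on Crop Yields",
        author="A. Author",
        folder="Science/Agriculture/Crops",
        cross_listed=["Social Sciences/Economics/Development"],
        source="example.org",
        copyright_info="Copyright Clearances/sample-article.txt",
    ),
    IndexEntry(
        title="Sample Essay on Constitutions",
        author="B. Writer",
        folder="Social Sciences/Political Science/Law",
    ),
]

if __name__ == "__main__":
    for entry in search(INDEX, "economics"):
        print(entry.title, "->", entry.folder)
```

Because the cross-listed folders are searched along with the primary folder, a document filed under agriculture can still be found by a reader browsing for economics, which is the purpose the report gives for that column.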

Some of the most important issues encountered during the design of the project and the collection of materials were both technical and ethical. Firstly, concerns that shaped the physical construction of the virtual repository included portability, "searchability" and devising a method for regular updates. Secondly, during the collection phase, issues arose relating to authority for the selection of materials as well as problems of intellectual property rights.

Portability and Searchability
While the Virtual Library Initiative is currently a model and has not yet been implemented, the project's design has drawn on the progress, but also on the shortcomings, of parallel projects addressing similar issues. One of the central inspirations for the Carnegie Corporation's project--eGranary, pioneered by Cliff Missen at the University of Iowa--does solve the issue of Internet connectivity by locally storing academic materials on an external, reproducible hard drive. In an early incarnation, eGranary was not an entirely searchable database; building on this model, the Corporation's Virtual Library was created on a server platform which did allow for faster access to more specific information and improved searchability by subject, keyword and author. eGranary has also gone to some lengths to improve its functionality, including recruiting area editors from around the world (it currently has six volunteers on three continents); it is also in the process of creating a customized search engine.

The Virtual Library's storage space is constrained by what computer technology currently offers. The capacity of the hard drives that will be used in Carnegie Corporation's project is approximately 300 GB. While this does constitute a considerable amount, it is nowhere near enough to store all digitally available academic research. Additionally, academic and educational materials are not limited to scholarly research and should include maps, videos, music, software and the like, all of which take up more digital storage space.

Update Methodology
While the potential beneficiary universities of Anglophone Africa do not lack Internet connectivity entirely, it is often discontinuous and expensive. Both the incomplete supply of materials and the need to keep collections current are concerns the Virtual Library program has anticipated. Several options exist for performing regular updates.

The two most efficient methods involve mirroring software. "Mirroring" is a process by which client servers housing the virtual library in Africa would connect to the host computer (containing the most up-to-date version of the library) via the Internet. Mirroring software in client computers would "look" into the host computer in order to detect any new or updated material, then copy and transfer it to the client. The first update method would involve programming the mirroring software in client servers to run at regular intervals during hours of optimal Internet connectivity. The second method would involve mailing updates on CD-ROMs or DVD-ROMs to the beneficiary universities. Equipped with mirroring software, the client servers would detect any differences between the files housed on their hard drives and the files on the CD or DVD-ROMs and update the server's collection.
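
The comparison step that both mirroring methods share can be sketched as follows. This is a minimal illustration in Python rather than the software the project would use: it assumes the up-to-date copy is reachable as an ordinary directory (for example a mounted CD-ROM or DVD-ROM, or a network-mounted copy of the host), and the function names `file_digest` and `mirror` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def mirror(source_root, client_root):
    """Copy files that are new or changed in source_root into client_root.

    Returns the relative paths that were copied. Files that exist only
    on the client are left alone (a one-way update with no deletions).
    """
    source_root, client_root = Path(source_root), Path(client_root)
    copied = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = client_root / rel
        if dst.is_file() and file_digest(dst) == file_digest(src):
            continue  # already up to date, nothing to transfer
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Running `mirror` a second time against an unchanged source copies nothing, which is what keeps a scheduled update cheap when connectivity is scarce. Hashing every file on both sides is simple but slow for a large collection; comparing sizes and modification times first would be a reasonable refinement.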

Other update methods that do not use mirroring software include simply shipping an updated version of the virtual library on a new hard drive to replace the one at the beneficiary university. This is one of the current delivery methods for eGranary, which often relies on volunteers traveling to Africa to deliver a new hard drive to a university. Physically transporting hard drives is an inefficient method and, in general, should be considered only if connectivity is nearly nonexistent or if the client computers in the recipient universities are not equipped to read CD-ROMs or DVD-ROMs.

Additionally, individuals with more consistent access to the Internet, such as professors or librarians, could download individual articles as needed and store them in the Virtual Library. This does indeed imply somewhat of an "elitist" project design in the sense that the university community at large would not have unlimited access to all documents available on the web.

Unfortunately, until Internet connectivity (either wired or wireless) becomes more reliable and affordable, a majority of African university communities may be forced to operate through intermediaries to access the World Wide Web. Consequently, it is important that local administrators be shown how to load updates sent on compact discs, as well as how to archive documents they download into the Virtual Library, to ensure that the rest of the university community has access to new resources.

This issue has implications even in an environment in which bandwidth availability begins to increase and price comes down: it will still be useful and economical to make materials available over a local area network, which, for example, will allow users to share one copy of an archived document rather than have many users download many copies and possibly incur multiple costs.
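
The saving from sharing one archived copy can be illustrated with a small fetch-once cache. This is a sketch under stated assumptions, not part of the model library: `get_document` is a hypothetical name, `name` is assumed to be a plain file name, and `fetch` stands in for whatever function would actually download the document over the Internet.

```python
from pathlib import Path


def get_document(name, cache_dir, fetch):
    """Return the bytes of a document, downloading it at most once.

    `fetch` is a caller-supplied function (hypothetical here) that
    retrieves the document over the Internet and returns its bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / name
    if cached.is_file():
        return cached.read_bytes()      # served from the shared local copy
    data = fetch(name)                  # the only Internet transfer
    cached.write_bytes(data)
    return data
```

However many users on the local area network request the same document, the Internet connection is used only for the first request.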

Selection
In both the eGranary and Carnegie Corporation initiatives, the attempt to anticipate user choice and the collection of corresponding information on the Internet creates a delicate ethical situation. The need to store materials locally in a limited physical digital storage space renders the following questions inevitable: what materials are to be included and who is empowered to make such choices? The person collecting materials from the Internet effectively acts as a filter between the users and what would potentially be available to them should they have access to the Internet.

Because of its role as a model, Carnegie Corporation's Virtual Library project has not established a method for regularly consulting professors and students at the universities whose libraries it hopes to enhance. If the model were to be further developed and eventually implemented as a working project, a partnership between whatever organization would be responsible for the library and the African scholarly community the project is meant to serve would make the virtual library most effective and efficient in meeting the needs of institutions of higher learning. Further details on this topic appear in the final section of this report under "Recommendations."

Copyright and Intellectual Property
A wealth of information is readily available for download on the Internet. In contrast, accompanying legal notices explaining rights to reproduce materials for distribution, albeit for educational purposes, are not as easily found. While some sites do provide links to copyright information regarding the use of the materials posted, others simply provide an electronic address to contact for questions regarding legal issues.

In developing this model library some confusion arose, especially with regard to authors' retention of copyright on work published in scholarly journals or in newspapers and subsequently posted on the Internet. In fact, the Supreme Court's 2001 ruling in New York Times Co. v. Tasini [1] held that publishers do not automatically retain the right to republish work in electronic archives once they have printed it, and must reach a separate agreement with authors should the publishers desire to archive their printed material electronically.

Yet another element of uncertainty is raised by the implications of Title 17 of the United States Code, which contains the copyright law:
§ 107. Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include--
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
It does not appear that documents stored in the virtual library would exceed the fair use provisions quoted above. Yet there is still a question as to whether the virtual library ultimately amounts to "multiple copies for classroom use," since use would not necessarily be confined to a classroom. It is also important to note that no consistent legal standard has been established for what constitutes "fair use." Cases are litigated on an individual basis and courts continually warn that their decisions should not be interpreted as developing a standard. This legal text also fails to clearly specify official provisions for digital, rather than print, copies. In addition, the following section of the Code would permit only a limited number of copies of copyrighted material to be archived in a library, virtual or not:
§ 108. Limitations on exclusive rights: Reproduction by libraries and archives
(a) Except as otherwise provided in this title and notwithstanding the provisions of section 106, it is not an infringement of copyright for a library or archives, or any of its employees acting within the scope of their employment, to reproduce no more than one copy or phonorecord of a work, except as provided in subsections (b) and (c), or to distribute such copy or phonorecord, under the conditions specified by this section, if--
(1) the reproduction or distribution is made without any purpose of direct or indirect commercial advantage;
(2) the collections of the library or archives are (i) open to the public, or (ii) available not only to researchers affiliated with the library or archives or with the institution of which it is a part, but also to other persons doing research in a specialized field; and
(3) the reproduction or distribution of the work includes a notice of copyright that appears on the copy or phonorecord that is reproduced under the provisions of this section, or includes a legend stating that the work may be protected by copyright if no such notice can be found on the copy or phonorecord that is reproduced under the provisions of this section.
(b) The rights of reproduction and distribution under this section apply to three copies or phonorecords of an unpublished work duplicated solely for purposes of preservation and security or for deposit for research use in another library or archives of the type described by clause (2) of subsection (a), if--
(1) the copy or phonorecord reproduced is currently in the collections of the library or archives; and
(2) any such copy or phonorecord that is reproduced in digital format is not otherwise distributed in that format and is not made available to the public in that format outside the premises of the library or archives. [2]
While Carnegie Corporation's Virtual Library is still a model, if it is implemented in some form, surely more copies of the hard drive will be made than the above statement permits. For the purpose of this project, when we were in doubt, letters of inquiry were sent to copyright holders for clarification. Over the course of two months, ten letters were sent requesting permission to redistribute articles posted on websites or in web-based journals for which copyright rules were unclear; only six received any response at all, and only three received a positive one. Only a few such letters needed to be sent, owing to the existence of numerous initiatives offering free access to published scholarly research.

Parallel Projects
Many of the sites used for the collection of materials were those of projects undertaken specifically to share literature, scholarly research and course materials free both of cost and copyright restrictions. These projects include:
  • Project Gutenberg, begun in 1971 by Michael Hart at the University of Illinois. The project makes electronically transcribed literary "classics" available at no charge on the Internet in "Plain Vanilla ASCII" format [3]. This plain-text encoding makes Project Gutenberg E-texts readable on the vast majority of computer operating systems, assuring largely unfettered access to a majority of Internet users. The project depends on volunteers who electronically transcribe documents that have entered the public domain (that is, whose exclusive copyright has lapsed). One drawback of relying on public domain materials is pointed out by Hart, who notes in his 1992 description of the project's mission that "the time before a copyrighted work entered the public domain was extended from 28 [...] to 50 years more than the life of the author" [4], making it impossible to keep Project Gutenberg E-texts anywhere near current. The plain-text format of Project Gutenberg E-texts allows for easy editing, which Hart encourages. Although this diminishes any possibility of maintaining "authoritative" versions of transcribed works, it is a way of incorporating valued reader feedback.

  • The MIT Open Courseware Initiative publishes MIT undergraduate and graduate course materials electronically (lecture notes, graphs, links, problem sets and listings for additional resources). The database is easily searchable by academic department, topic, class name, term and instructor name. All materials are free of cost and governed by the Creative Commons License [5], which allows for the open reproduction and redistribution of posted materials, provided they are attributed to MIT OCW. While the use of the materials on the Open Courseware site cannot be credited towards a class or a degree, the initiative does offer access to academic materials from one of the world's most prestigious universities. In order to continuously improve its service to users worldwide, the Open Courseware initiative enthusiastically encourages feedback: a link to a feedback form appears on every webpage.

  • The Avalon Project site at Yale makes texts of key historical documents available on the Internet at no cost. Subjects covered remain within the social sciences. The Avalon Project's archives can be searched by event, century, alphabetical order or geographical area. By promoting the reproduction of the posted documents and dynamically linking its site to other, sister projects, the Avalon project hopes to effectively and efficiently disseminate knowledge.

  • The Directory of Open Access Journals was founded at the First Nordic Conference on Scholarly Communication, held at the University of Lund and in Copenhagen in October 2002 [6], in response to the multiplication of freely available scholarly e-journals. The Directory of Open Access Journals is a comprehensive listing of all Open Access journals and is searchable by journal title or subject.

    The Open Access Initiative, funded in large part by the Open Society Institute, was launched after a conference in Budapest aimed at strengthening partnerships between various open access journals worldwide. Open access journals are defined as "journals that use a funding model that does not charge readers or their institutions for access [...] we take the right of 'users to read, download, copy distribute, print, search or link to the full texts of these articles' as mandatory." [7] In order to ensure the high quality of materials posted on DOAJ, included journals must have an editorial board and a peer review system.

    The Open Society Institute also funds a number of smaller, related projects which include advocacy initiatives, economic research, the creation of new document repositories for academic institutions, a series of guides for the conversion of subscription-based journals to an open access model, and software for storing documents requested from subscription-based journals [8]. For example, LOCKSS (Lots Of Copies Keeps Stuff Safe), a "Permanent Web Publishing And Access System," is a computer program designed to store copies of documents requested by users on terminals of institutions subscribed to private, electronic publishers [9]. The software permanently stores requested documents at the subscribing institution, which means there is no need to repeatedly request the same document, thus expanding library collections by saving documents locally.
Subscription-based journal systems such as JSTOR have yielded interesting information in the effort to develop the content of Carnegie Corporation's Virtual Library. While subscriptions are typically very expensive, JSTOR offers heavily discounted rates to institutions in the developing world [10]. While the fee reduction is very generous, it is of little help in countries where Internet connectivity is prohibitively expensive and intermittent, or bandwidth is low. A JSTOR [11] representative contacted during the course of creating the Corporation's Virtual Library model pointed out the difficulty of locally storing JSTOR articles. A nonprofit organization addressing similar connectivity issues in Central Asia had previously approached him with the same request. The problem lay not in JSTOR's willingness to facilitate access to its information by reducing its subscription prices, but in the physical storage capacity of currently available technology. JSTOR archives are ".tif" files, meaning that each page of an article is saved as an image rather than as text. Because image files are significantly bigger than text files, it might be difficult to store a meaningful portion of a JSTOR collection in an average local repository. The JSTOR representative also pointed out an even more critical problem facing projects such as JSTOR: ensuring the security of the publications they make available in a way that satisfies the publishers and/or other copyright holders that their intellectual property will be protected from illegal distribution, duplication, and data corruption.
  • PLoS (Public Library of Science) is governed by the Creative Commons Attribution License, as is MIT Open Courseware. Its mission is to make high quality scientific research freely available to the international life sciences community.

  • BioMed Central functions essentially as does the Directory of Open Access Journals (DOAJ) described above. BioMed Central provides a consolidated listing of all open access journals in medicine and biology while DOAJ lists all Open Access Journals irrespective of their subject.

  • www.Marxist.org offers full digital transcriptions of the works of political and economic theorists, sociologists and historians. Despite its name, the site does not limit itself exclusively to the writings of the extreme left wing.
These websites were of inestimable value in acquiring material for the virtual library. Most were very easy to use and provided high quality documentation free of cost and copyright, as their project missions stated. Yet even if the materials are free and, as such, facilitate access to knowledge, a significant limitation of all the aforementioned initiatives is that they rely entirely on Internet connectivity, effectively excluding large portions of the developing world's university communities.

Additional Sources
Other sources for copyright-free documentation included many international and non-governmental organizations: the United Nations and its subsidiary agencies and protocols--the UNDP (United Nations Development Programme), WHO (World Health Organization), FAO (Food and Agriculture Organization) and UNFCCC (United Nations Framework Convention on Climate Change)--the United States Institute of Peace and the World Policy Institute.

U.S. federal government offices, foundations and agencies also provided valuable materials. These included the Government Printing Office (GPO), the National Archives and Records Administration, the National Science Foundation (NSF) and the Centers for Disease Control and Prevention, among others. (The digital national defense archive, however, is, surprisingly, privately owned by ProQuest Information and Learning Company, and a subscription to the archive is needed for viewing.) Carnegie Corporation was also invited to include material from the Open Mind Television Project, which has digitized many programs from the series.

Recommendations
As noted earlier, Carnegie Corporation's Virtual Library Initiative is intended as a model. Following are some suggestions for the improvement of the project design, should the initiative be adopted by others in the future.

Dynamic Dialogue with the Designated Beneficiaries
In order to effectively and efficiently enhance the capacity of the African universities that would access the Virtual Library, it is necessary to incorporate a method for the systematic consultation of scholars, librarians and students in the beneficiary universities. While the Virtual Library project design does anticipate that a "select few" would need to use the Internet to download any omitted articles (and it would be naive to assume the initiative could allow beneficiaries to avoid Internet use altogether), the project's aim is to reduce reliance on defective Internet connectivity, whether in a wired or wireless environment. To minimize the occurrence of missing or redundant documents, potential beneficiaries should be given the means to make requests for the initial collection phase, or to inform subsequent updates. This feedback mechanism could simply take the form of a paper list mailed on a regular basis to those maintaining the library, or a "Requests" folder could be incorporated into the library itself to be filled in by users and later sent to the library managers via e-mail or postal mail.

As only one researcher collected documents for this model, it was impossible to read through all the documents collected to ensure high quality and accuracy. A regular feedback mechanism would also serve as a peer review, especially in the initial stages of implementation.

Recording Use
The virtual library could record the number of "visitors" to specific folders and subfolders. A log of user activity could be collected at regular intervals and could be utilized to focus document collection in given subjects.
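
One way such a log could be summarized is sketched below in Python. The log format is an assumption made for illustration (one visited folder path per line), as are the function name `count_folder_visits` and the sample folder names; the report does not specify how visits would be recorded.

```python
from collections import Counter


def count_folder_visits(log_lines):
    """Count visits per folder and per each of its parent folders.

    Each log line is assumed to hold one visited folder path, such as
    "Science/Agriculture/Crops" (a hypothetical format).
    """
    visits = Counter()
    for line in log_lines:
        path = line.strip().strip("/")
        if not path:
            continue
        parts = path.split("/")
        # Credit the folder itself and every ancestor above it.
        for depth in range(1, len(parts) + 1):
            visits["/".join(parts[:depth])] += 1
    return visits


if __name__ == "__main__":
    sample_log = [
        "Science/Agriculture/Crops",
        "Science/Agriculture",
        "Social Sciences/Economics",
    ]
    for folder, n in count_folder_visits(sample_log).most_common(3):
        print(folder, n)
```

Crediting parent folders as well as the folder visited lets those maintaining the library see demand at every level of the subject tree, from a broad field down to a narrow topic.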

Tutorials for Use
Tutorials for best use of the library would ensure that beneficiaries get the most out of the library. The tutorial could also include instructions for African scholars to upload their own work and share it with larger audiences.

Building Partnerships
The list of parallel projects above is encouraging not only in its length, but also in the variety of initiatives it contains. The state of Open Access publishing, along with projects such as eGranary and Gutenberg, reflects that of humanitarian initiatives worldwide. While their numbers are multiplying, their potential for positive impact is diminished by the lack of coordination between them. The weakness inherent in fragmentation is easily overcome by encouraging partnerships among the projects listed above and others that may be underway or under consideration, by creating links to each other's materials (a link to the eGranary hard drive will feature in the Virtual Library) and by sharing information about new initiatives and ideas.

Conclusion
Should a virtual library project be undertaken by an organization with the staff, financial resources and ability to make a long-term investment in the project's management and continual updating, it would nearly fully recreate a web experience for hundreds or, it is hoped, thousands of students and professors in Anglophone Africa. With scholarly research increasingly published digitally, limited Internet connectivity necessarily restricts access to current research and compounds the situation of marginality Africa faces today. The virtual library would, however incompletely, bridge important gaps in access to web-based academic materials, and would also give increased exposure to the work of African scholars themselves. Crucial to this undertaking, however, would be a source of funding and staff to pay for and manage the licensing that would be required to include the ever-increasing wealth of materials protected by international copyright laws. Also crucial would be instituting a mechanism to consult the various scholarly communities that the project aims to help. A relationship built on trust and dialogue with beneficiaries as well as other humanitarian organizations is central to achieving the project's praiseworthy mission.

About the author: Tamara Kummer was born in New York City to a Franco-Tunisian mother and an American father. In New York City, she attended both the United Nations International School and the Lycee Francais, and later spent two years at the Lycee Montaigne in Paris. Tamara graduated magna cum laude from Boston University in May of 2004 as a political science major with minors in Spanish and African studies. In her last year of undergraduate study, she became a member of Pi Sigma Alpha, the national political science honor society. During the summer of 2003, Tamara interned at the World Food Programme in Rome with the Vulnerability Analysis and Mapping (VAM) unit. The following summer, she interned at the Carnegie Corporation of New York working on a model "Virtual Library" project designed to enhance the library collections of African universities where Internet connectivity is deficient. Tamara will spend the 2004-2005 school year at the London School of Economics and Political Science pursuing a Master's degree in Comparative Politics.

Footnotes

  1. Supreme Court of the United States, New York Times Co., Inc., et al. v. Tasini et al., No. 00-201, June 25, 2001

  2. Circular 92, Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code pp.26-29 June 2003

  3. Hart, Michael S. "The History and Philosophy Behind the Project Gutenberg" August, 1992

  4. Hart, Michael S. "The History and Philosophy Behind the Project Gutenberg" August, 1992

  5. Massachusetts Institute of Technology, 2003

  6. Lund University Libraries First Nordic Conference on Scholarly Communications Lund 22-23 October 2002, Copenhagen 24 October 2002

  7. Lund University Libraries Directory of Open Access Journals Definitions, 2004

  8. For a complete listing of Open Access related projects funded by the Open Society Institute, please visit http://www.soros.org/openaccess/grants-awarded.shtml

  9. Reich, Vicky and Rosenthal, David S. H. "LOCKSS: A Permanent Web Publishing and Access System" D-Lib Magazine Vol. 7, no 6 June 2001

  10. Similarly, the eGranary project has obtained license to copy the World Book Encyclopedia onto their hard drives. However, World Book only allows for the free use of its encyclopedia to eGranary users in the very poorest nations of the world. Other, wealthier regions must pay a modest fee.

  11. JSTOR, which was created and funded by the Andrew W. Mellon Foundation, also receives support from Carnegie Corporation of New York

 


