
Virtual Library Model:
A Report to Carnegie Corporation of New York
by
Tamara Kummer
Summary
Carnegie Corporation of New York has developed a model virtual library
to demonstrate one method of supplementing collections in African
universities where Internet connectivity is limited and expensive.
Scholarly articles free of both charge and copyright were downloaded
from the Internet, sorted by subject and stored on a hard drive, later
to be copied and housed on a server. Should the virtual library project
be implemented, the server could be copied and exported to the beneficiary
universities and operate at the center of a local area network. The
following report
- outlines the technological and ethical issues encountered during
the two-month collection phase of the project
- identifies and describes other initiatives addressing issues
of free access to scholarly research and low Internet connectivity
or bandwidth
- offers suggestions to make the project design both more efficient
and more effective
Full Report
Carnegie Corporation of New York has initiated a virtual library project
designed to address the problem of low Internet connectivity and limited
access to web-based academic materials in African universities. The
virtual library consists of documents collected from the World Wide
Web, archived by subject and stored on a hard drive in a server. The
server would then be sent to designated beneficiary universities to
constitute the core of a local area network (or LAN), making the virtual
library documents available to any computer terminal within the university's
network. Thus, while limiting the need for Internet use (see the "Update
Methodology" section), the virtual library project would approximate
the experience of using the Internet--albeit in a limited fashion.
All documents collected were archived by subject in a succession of
folders and subfolders. The format is best described as a "tree structure,"
the successive folders growing progressively more specific:

The introductory page of the virtual library is modeled on that of
"Dmoz", an Internet directory. A bold, underlined general subject title rests above
its component categories in smaller, plain underlined font. All collected
documents were indexed in an Excel table indicating title, author,
the folder the document was archived in, the folders the documents
could be cross-listed in (this column allows for more efficient keyword
searches), the document's source and where to find its copyright information.
Copyright notices were entered either as a web link or stored in the
library in a folder entitled "Copyright Clearances," depending on
whether they were posted on the web or not.
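The index described above can be sketched in code. The following is a minimal illustration, not the spreadsheet actually used in the project: the field names mirror the columns listed in the text, while the class name, the function name and the two sample records are hypothetical. It shows why a cross-listing column helps keyword searches: a document is found through any folder it could belong to, not only the one it is stored in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexEntry:
    """One row of the index (field names are assumptions based on the text)."""
    title: str
    author: str
    folder: str                      # folder the document is archived in
    cross_listed: List[str] = field(default_factory=list)  # other relevant folders
    source: str = ""                 # where the document was downloaded from
    copyright_info: str = ""         # web link or path in "Copyright Clearances"


def search(entries, keyword):
    """Return entries whose title, author, folder or cross-listings contain keyword."""
    kw = keyword.lower()
    hits = []
    for entry in entries:
        searchable = [entry.title, entry.author, entry.folder, *entry.cross_listed]
        if any(kw in text.lower() for text in searchable):
            hits.append(entry)
    return hits


# Hypothetical sample records, for illustration only.
sample = [
    IndexEntry(
        title="Soil Fertility in the Sahel",
        author="A. Example",
        folder="Science/Agriculture",
        cross_listed=["Geography/Africa"],
    ),
    IndexEntry(
        title="Colonial Trade Routes",
        author="B. Example",
        folder="History/Economic",
    ),
]

matches = search(sample, "africa")
```

Here a search for "africa" finds the first record through its cross-listing even though it is archived under "Science/Agriculture".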
Some of the most important issues encountered during the design of
the project and the collection of materials were both technical and
ethical. Firstly, concerns that shaped the physical construction of
the virtual repository included portability, "searchability" and devising
a method for regular updates. Secondly, during the collection phase,
issues arose relating to authority for the selection of materials
as well as problems of intellectual property rights.
Portability and Searchability
While the Virtual Library Initiative is currently a model and has
not yet been implemented, the project's design has drawn on the progress,
but also on the shortcomings of parallel projects addressing similar
issues. One of the central inspirations for the Carnegie Corporation's
project--eGranary, pioneered by Cliff Missen at the University of
Iowa--does solve the issue of Internet connectivity by locally storing
academic materials on an external, reproducible hard drive. In an
early incarnation, eGranary was not an entirely searchable database;
building on this model, the Corporation's Virtual Library was created
on a server platform which did allow for faster access to more specific
information and improved searchability by subject, keyword and author.
eGranary has also gone to some lengths to improve its functionality,
including recruiting area editors from around the world (currently
it has six volunteers on three continents); it is also in the
process of creating a customized search engine.
The Virtual Library's storage space is constrained by what computer
technology currently offers. The hard drives that
will be used in Carnegie Corporation's project store approximately
300 GB of information. While this does constitute a considerable amount,
it is nowhere near enough to store all digitally available academic
research. Additionally, academic and educational materials are not
limited to scholarly research and should include maps, videos, music,
software and similar resources, all of which take up more digital storage space.
Update Methodology
While the potential beneficiary universities of Anglophone Africa
do not lack Internet connectivity entirely, it is often discontinuous
and expensive. Both the incomplete supply of materials and the need
to keep collections current are concerns the Virtual Library program
has anticipated. Several options exist for performing regular updates.
The two most efficient methods involve mirroring software. "Mirroring"
is a process by which client servers housing the virtual library in
Africa would connect to the host computer (containing the most up-to-date
version of the library) via the Internet. Mirroring software in client
computers would "look" into the host computer in order to detect any
new or updated material, then copy and transfer it to the client.
The first update method would involve programming the mirroring software
in client servers to run at regular intervals during hours of optimal
Internet connectivity. The second method would involve mailing updates
on CD-ROMs or DVD-ROMs to the beneficiary universities. Equipped with
mirroring software, the client servers would detect any differences
between the files housed on their hard drives and the files on the
CD or DVD-ROMs and update the server's collection.
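The comparison step common to both mirroring methods can be sketched as follows. This is an illustration under stated assumptions rather than the software the project would use: the function names are hypothetical, the "source" directory stands for either the host's copy of the library or a mounted update disc, and files are compared by SHA-256 digest, which is one possible way of detecting differences.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_updates(source_dir, library_dir):
    """List files (relative paths) that are new or changed in source_dir."""
    source_dir, library_dir = Path(source_dir), Path(library_dir)
    updates = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = library_dir / rel
        if not dst.exists() or file_digest(src) != file_digest(dst):
            updates.append(rel)
    return updates


def apply_updates(source_dir, library_dir):
    """Copy new or changed files into the local library; return what was copied."""
    source_dir, library_dir = Path(source_dir), Path(library_dir)
    copied = find_updates(source_dir, library_dir)
    for rel in copied:
        dst = library_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_dir / rel, dst)
    return copied
```

Only files that differ are copied, which matters where bandwidth is scarce. This sketch never deletes anything from the local library, so documents added locally by administrators would be left in place.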
Other update methods do not use mirroring software. The simplest is
to ship an updated version of the virtual library on a new hard drive
that replaces the one in the beneficiary university. This is
one of the current delivery methods for eGranary, which often relies
on volunteers traveling to Africa to deliver a new hard drive to a
university. Physically transporting hard drives is an inefficient
method and, in general, should be considered only if connectivity
is nearly nonexistent or if the client computers in the recipient universities
are not equipped to read CD or DVD-ROMs.
Additionally, individuals with more consistent access to the Internet,
such as professors or librarians, could download individual articles
as needed and store them in the Virtual Library. This does indeed
imply somewhat of an "elitist" project design in the sense that the
university community at large would not have unlimited access to all
documents available on the web.
Unfortunately, until Internet connectivity (either wired or wireless)
becomes more reliable and affordable, a majority of African university
communities may be forced to operate through intermediaries to access
the World Wide Web. Consequently, it is important that local administrators
be shown how to upload compact disks sent as updates, as well as archive
documents they download into the Virtual Library to ensure that the
rest of the university community has access to new resources.
This issue has implications even in an environment in which bandwidth
availability begins to increase and prices come down: it will still
be useful and economical to make materials available over a local
area network. For example, a network allows users to share one copy
of an archived document rather than have many users download many
copies and possibly incur multiple costs.
Selection
In both the eGranary and Carnegie Corporation initiatives, the attempt
to anticipate user choice and the collection of corresponding information
on the Internet creates a delicate ethical situation. The need to
store materials locally in a limited amount of digital storage space
renders the following questions inevitable: what materials are to
be included and who is empowered to make such choices? The person
collecting materials from the Internet effectively acts as a filter
between the users and what would potentially be available to them
should they have access to the Internet.
Because of its role as a model, Carnegie Corporation's Virtual Library
project has not established a method for regularly consulting professors
and students in the universities whose libraries it hopes
to enhance. If the model were to be further developed and eventually
implemented as a working project, a partnership between whatever organization
would be responsible for the library and the African scholarly community
the project is meant to serve would make the virtual library most
effective and efficient in meeting the needs of institutions of higher
learning. Further details on this topic will appear in the final section
of this report under "Recommendations."
Copyright and Intellectual Property
A wealth of information is readily available for download on the Internet.
In contrast, accompanying legal notices explaining rights to reproduce
materials for distribution, even for educational purposes, are not
as easily found. While some sites do provide links to copyright information
regarding the use of the materials posted, others simply provide an
electronic address to contact for questions regarding legal issues.
In developing this model library some confusion arose, especially
with regard to authors' retention of copyright on work published in
scholarly journals or in newspapers and subsequently posted on the
Internet. In fact, the 2001 Supreme Court ruling in New York Times Co.
v. Tasini [1] declared that publishers do not retain
rights to work they have printed once, and must reach a separate agreement
with authors should the publishers desire to archive their printed
material electronically.
Yet another element of uncertainty is raised by the implications of Title
17 of the United States Code, which sets out U.S. copyright law:
§ 107. Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the
fair use of a copyrighted work, including such use by reproduction
in copies or phonorecords or by any other means specified by that
section, for purposes such as criticism, comment, news reporting,
teaching (including multiple copies for classroom use), scholarship,
or research, is not an infringement of copyright. In determining
whether the use made of a work in any particular case is a fair
use the factors to be considered shall include-- (1) the purpose
and character of the use, including whether such use is of a commercial
nature or is for nonprofit educational purposes; (2) the nature
of the copyrighted work; (3) the amount and substantiality of the
portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value
of the copyrighted work.
It does not appear that documents stored in the virtual library
would exceed the fair use provisions stated above. Yet there is still
a question as to whether the virtual library ultimately amounts to
"multiple copies for classroom use," since use would not necessarily
be confined to a classroom. It is also important to note that no consistent
legal standard has been established for what constitutes "fair use."
Cases are litigated on an individual basis and courts continually
warn that their decisions should not be interpreted as developing
a standard. This legal text also fails to clearly specify official
provisions for digital, rather than print, copies. In addition, the
following section of the Code would only permit a limited number of
copies of copyrighted material to be archived in a library, virtual
or not:
§ 108. Limitations on exclusive rights: Reproduction by libraries and archives
(a) Except as otherwise provided in
this title and notwithstanding the provisions of section 106, it
is not an infringement of copyright for a library or archives, or
any of its employees acting within the scope of their employment,
to reproduce no more than one copy or phonorecord of a work,
except as provided in subsections (b) and (c), or to distribute
such copy or phonorecord, under the conditions specified by this
section, if-- (1) the reproduction or distribution is made without
any purpose of direct or indirect commercial advantage; (2) the
collections of the library or archives are (i) open to the public,
or (ii) available not only to researchers affiliated with the library
or archives or with the institution of which it is a part, but also
to other persons doing research in a specialized field; and (3)
the reproduction or distribution of the work includes a notice of
copyright that appears on the copy or phonorecord that is reproduced
under the provisions of this section, or includes a legend stating
that the work may be protected by copyright if no such notice can
be found on the copy or phonorecord that is reproduced under the
provisions of this section. (b) The rights of reproduction and
distribution under this section apply to three copies or phonorecords
of an unpublished work duplicated solely for purposes of preservation
and security or for deposit for research use in another library
or archives of the type described by clause (2) of subsection
(a), if-- (1) the copy or phonorecord reproduced is currently in
the collections of the library or archives; and (2) any such
copy or phonorecord that is reproduced in digital format is not
otherwise distributed in that format and is not made available to
the public in that format outside the premises of the library or
archives. [2]
While Carnegie Corporation's Virtual Library is still a model, if
it is implemented in some form, surely more copies of the hard drive
will be made than the above statement permits. For the purpose of
this project, when we were in doubt, letters of inquiry were sent
to copyright holders for clarification. Over the course of two months,
out of ten letters sent to request permission to redistribute articles
posted on websites or in web-based journals (for which copyright rules
were unclear), only three received a positive response, and only six
received any response at all. Only a few of these letters needed to
be sent, thanks to the existence of numerous initiatives offering
free access to published scholarly research.
Parallel Projects
Many of the sites used for the collection
of materials were those of projects undertaken specifically to share
literature, scholarly research and course materials free both of cost
and copyright restrictions. These projects include:
- Project Gutenberg was
begun in 1971 by Michael Hart at the University of Illinois. The
project makes electronically transcribed literary "classics" available
at no charge on the Internet in "Plain Vanilla ASCII" format [3].
This plain-text encoding makes Project Gutenberg E-texts readable
by the vast majority of computer operating systems, assuring largely
unfettered access to a majority of Internet users. The project
depends on volunteers who electronically transcribe documents
that have entered the public domain (that is, are no longer under copyright).
One drawback of relying on public domain materials is pointed
out by Hart, who notes, in his 1992 description of the project's
mission, that "the time before a copyrighted work entered the
public domain was extended from 28 [...] to 50 years more than
the life of the author," [4] making it impossible to keep
Project Gutenberg E-texts anywhere near current. The format of Project
Gutenberg E-texts allows for easy editing, which Hart encourages.
Although this diminishes any possibility of maintaining "authoritative"
versions of transcribed works, it is a way of incorporating valued
reader feedback.
- MIT Open Courseware
Initiative publishes MIT undergraduate and graduate course
materials electronically (lecture notes, graphs, links, problem
sets and listings for additional resources). The database is easily
searchable by academic department, topic, class name, term and
instructor name. All materials are free of cost and governed by
the Creative Commons License [5], which allows for the
open reproduction and redistribution of posted materials, provided
they are attributed to MIT OCW. While the use of the materials
on the Open Courseware site cannot be credited towards a class
or a degree, the initiative does offer access to academic materials
from one of the world's most prestigious universities. In order
to continuously improve their service to users worldwide, the
Open Courseware initiative enthusiastically encourages feedback:
a link to a feedback form appears on every webpage.
- The Avalon Project site at Yale makes texts of key historical
documents available on the Internet at no cost. Subjects covered
remain within the social sciences. The Avalon Project's archives
can be searched by event, century, alphabetical order or geographical
area. By promoting the reproduction of the posted documents and
dynamically linking its site to other, sister projects, the Avalon
project hopes to effectively and efficiently disseminate knowledge.
- The Directory of Open Access Journals was founded at the First Nordic
Conference on Scholarly Communication, held in Lund and Copenhagen
in October of 2002 [6], in response to the multiplication
of freely available scholarly e-journals. The Directory of Open
Access Journals is a comprehensive listing of all Open Access
journals and is searchable by journal title or subject.
The Open Access Initiative, funded in large part by the Open Society
Institute, was launched after a conference in Budapest aimed at
strengthening partnerships between various open access journals
worldwide. Open access journals are defined as "journals that
use a funding model that does not charge readers or their institutions
for access [...] we take the right of 'users to read, download,
copy, distribute, print, search or link to the full texts of these
articles' as mandatory." [7] In order to ensure the high
quality of materials posted on DOAJ, included journals must have
an editorial board and a peer review system.
The Open Society Institute also funds a number of smaller, related
projects which include advocacy initiatives, economic research,
the creation of new document repositories for academic institutions,
a series of guides for the conversion of subscription-based journals
to an open access model, and software for storing documents requested
from subscription-based journals [8]. For example,
LOCKSS (Lots Of Copies Keeps Stuff Safe), a "Permanent
Web Publishing And Access System," is a computer program designed
to store copies of documents requested by users on terminals of
institutions subscribed to private, electronic publishers [9].
The software permanently stores requested documents at the subscribing
institution, which means there is no need to repeatedly request
the same document, thus expanding library collections by saving
documents locally.
Subscription-based journal systems such as JSTOR have yielded interesting
information in the effort to develop the content of Carnegie Corporation's
Virtual Library. While subscriptions are typically very expensive,
JSTOR offers heavily discounted rates to institutions in the developing
world [10]. While the fee reduction is very generous, it is
of little help in countries where Internet connectivity is prohibitively
expensive and intermittent, or bandwidth is low. A JSTOR [11]
representative contacted during the course of creating the Corporation's
Virtual Library model pointed out the difficulty of locally storing
JSTOR articles. A nonprofit organization addressing similar connectivity
issues in Central Asia had previously approached him with the same
request. The problem lay not in JSTOR's willingness to facilitate
access to its information by reducing its subscription prices, but
in the physical storage capacity of currently available technology.
JSTOR archives are ".tif" files, meaning that each page of an article
is saved as an image rather than as text. Because image files are significantly
bigger than text files, it might be difficult to store a meaningful
portion of a JSTOR collection in an average local repository. The
JSTOR representative also pointed out that there is an even more critical
problem facing projects such as JSTOR, which involves ensuring the
security of the publications they make available in a way that satisfies
the publishers and/or other copyright holders that their intellectual
property will be protected from illegal distribution, duplication,
and data corruption.
- PLoS (Public Library of Science) is governed by the Creative Commons Attribution
License, as is MIT Open Courseware. Its mission is to make high
quality scientific research available freely to the international
life sciences community.
- BioMed Central functions essentially as does the Directory of Open
Access Journals (DOAJ) described above. BioMed Central provides
a consolidated listing of all open access journals in medicine
and biology while DOAJ lists all Open Access Journals irrespective
of their subject.
- www.Marxist.org
offers full digital transcriptions of the works of political and
economic theorists, sociologists and historians. Despite its name,
the site does not limit itself exclusively to the writings of
the extreme left wing.
These websites were of inestimable value in acquiring material for
the virtual library. Most were very easy to use and provided high quality
documentation free of cost and copyright as their project mission
stated. Yet even if the materials are free and, as such, facilitate
access to knowledge, a significant limitation of all the aforementioned
initiatives is that they rely entirely on Internet connectivity, effectively
excluding large portions of the developing world's university communities.
Additional Sources
Other sources for copyright-free documentation included many intergovernmental
and non-governmental organizations: the United Nations and its subsidiary
agencies and conventions--the UNDP (United Nations Development Programme),
WHO (World Health Organization), FAO (Food and Agriculture Organization) and
UNFCCC (United Nations Framework Convention on Climate Change)--as well as the
United States Institute of Peace and the World
Policy Institute.
U.S. federal government offices, foundations and agencies also provided
valuable materials. These included the Government Printing Office (GPO),
the National Archives and Records Administration, the National
Science Foundation (NSF) and the Centers
for Disease Control and Prevention, among others. (The digital national defense archive
is, surprisingly, privately owned by ProQuest Information and Learning
Company, and a subscription to the archive is needed for viewing.) Carnegie Corporation
was also invited to include material from the Open Mind Television
Project, which has digitized many programs from the series.
Recommendations
As noted earlier, Carnegie Corporation's Virtual Library Initiative
is intended as a model. Following are some suggestions for the improvement
of the project design, should the initiative be adopted by others
in the future.
Dynamic Dialogue with the Designated Beneficiaries
In order to effectively and efficiently enhance the capacity of the
African universities that would access the Virtual Library, it is
necessary to incorporate a method for the systematic consultation
of scholars, librarians and students in the beneficiary universities.
While the Virtual Library project design does anticipate that a "select
few" would need to use the Internet to download any omitted articles
(and it would be naive to assume the initiative could allow beneficiaries
to avoid Internet use altogether), the project's aim is to remedy
reliance on defective Internet connectivity, whether in a wired or
wireless environment. To minimize the occurrence of missing or redundant
documents, potential beneficiaries should be given the means to make
requests for the initial collection phase, or to inform subsequent
updates. This feedback mechanism could simply take the form of a paper
list mailed on a regular basis to those maintaining the library, or
a "Requests" folder could be incorporated into the library itself
to be filled in by users and later sent to the library managers via
e-mail or postal mail.
As only one researcher collected documents for this model, it was
impossible to read through all the documents collected to ensure high
quality and accuracy. A regular feedback mechanism would also serve
as a peer review, especially in the initial stages of implementation.
Recording Use
The virtual library could record the number of "visitors" to specific
folders and subfolders. A log of user activity could be collected
at regular intervals and could be utilized to focus document collection
in given subjects.
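As a rough sketch of how such a log might be tallied, the example below counts visits per folder and credits each visit to every parent folder as well, so that general subjects and specific subfolders can both be ranked. The log format (a timestamp, a tab, then the folder path) and the function name are assumptions made for illustration; the report does not specify how activity would be recorded.

```python
from collections import Counter


def count_folder_visits(log_lines):
    """Tally visits per folder, also crediting each parent folder in the tree."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed line format: "<timestamp>\t<folder path>"
        _timestamp, folder = line.split("\t", 1)
        parts = folder.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts


# Hypothetical log lines, for illustration only.
example_log = [
    "2004-07-01T09:15\tScience/Biology",
    "2004-07-01T10:02\tScience/Physics",
    "2004-07-02T14:30\tHistory",
]
visits = count_folder_visits(example_log)
most_visited = visits.most_common(1)
```

A collector could use the ranking from `most_common` to decide which subject folders to expand in the next update.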
Tutorials for Use
Tutorials for best use of the library would ensure that beneficiaries
get the most out of the library. The tutorial could also include instructions
for African scholars to upload their own work and share it with larger
audiences.
Building Partnerships
The list of parallel projects above is encouraging not only in its
length, but also in the variety of initiatives it contains. The state
of Open Access publishing, along with projects such as eGranary and
Gutenberg, reflects that of humanitarian initiatives worldwide. While
their numbers are multiplying, their potential for positive impact
is diminished by the lack of coordination between them. The weakness
inherent in fragmentation is easily overcome by encouraging partnerships
among the projects listed above and others that may be underway or
under consideration, creating links to each other's materials (a link
to the eGranary hard drive will feature in the Virtual Library) and
sharing information about new initiatives and ideas.
Conclusion
Should a virtual library project be undertaken by an organization
with the staff, financial resources and ability to make a long-term
investment in the project's management and continual updating, it
would nearly fully recreate a web experience for hundreds, or hopefully,
thousands of students and professors in Anglophone Africa. With scholarly
research increasingly published digitally, limited Internet connectivity
necessarily restricts access to current research and compounds the
situation of marginality Africa faces today. The virtual library
would, however incompletely, bridge important gaps in access to web-based
academic materials, and would also give increased exposure to the
work of African scholars themselves. Crucial to this undertaking,
however, would be a source of funding and staff to pay for and manage
the licensing that would be required to include the ever-increasing
wealth of materials protected by international copyright laws. Also
crucial would be instituting a mechanism to consult the various scholarly
communities that the project aims to help. A relationship built on
trust and dialogue with beneficiaries as well as other humanitarian
organizations is central to achieving the project's praiseworthy mission.
About the author: Tamara Kummer was born in New York City to a
Franco-Tunisian mother and an American father. In New York City, she
attended both the United Nations International School and the Lycee
Francais, and later spent two years at the Lycee Montaigne in Paris.
Tamara graduated magna cum laude from Boston University in May of
2004 as a political science major with minors in Spanish and African
studies. In her last year of undergraduate study, she became a member
of Pi Sigma Alpha, the national political science honor society. During
the summer of 2003, Tamara interned at the World Food Programme in
Rome with the Vulnerability Analysis and Mapping (VAM) unit. The following
summer, she interned at the Carnegie Corporation of New York working
on a model "Virtual Library" project designed to enhance the library
collections of African universities where Internet connectivity is
deficient. Tamara will spend the 2004-2005 school year at the London
School of Economics and Political Science pursuing a Master's degree
in Comparative Politics.
Footnotes
1. Supreme Court of the United States, New York Times Co., Inc., et al. v.
Tasini et al., No. 00-201, June 25, 2001.
2. Circular 92, Copyright Law of the United States of America and Related
Laws Contained in Title 17 of the United States Code, pp. 26-29,
June 2003.
3. Hart, Michael S. "The History and Philosophy Behind the Project Gutenberg,"
August 1992.
4. Hart, Michael S. "The History and Philosophy Behind the Project Gutenberg,"
August 1992.
5. Massachusetts Institute of Technology, 2003.
6. Lund University Libraries, First Nordic Conference on Scholarly Communications,
Lund 22-23 October 2002, Copenhagen 24 October 2002.
7. Lund University Libraries, Directory of Open Access Journals, Definitions,
2004.
8. For a complete listing of Open Access related projects funded
by the Open Society Institute, please visit http://www.soros.org/openaccess/grants-awarded.shtml
9. Reich, Vicky and Rosenthal, David S. H. "LOCKSS: A Permanent Web Publishing
and Access System," D-Lib Magazine, Vol. 7, no. 6, June 2001.
10. Similarly, the eGranary project has obtained a license to copy
the World Book Encyclopedia onto its hard drives. However, World
Book only allows for the free use of its encyclopedia by eGranary
users in the very poorest nations of the world. Other, wealthier
regions must pay a modest fee.
11. JSTOR, which was created and funded by the Andrew W. Mellon
Foundation, also receives support from Carnegie Corporation of
New York.