Digital archiving and publication
Introduction and rationale
Publishing scholarly material digitally
While there has always been a considerable gap between the
high-ranking educational institutions and less privileged ones both in
access to scientific information and in their ability to publish
research, the gap has been growing drastically in recent
years.
In a recent article in the Scientific American, the predicament of
imbalance in publication is discussed, detailing the enormous
disproportion in published material in the peer reviewed scientific
press between various countries. The article cites examples of
material by the same author being readily accepted when submitted from
a continental US institution and refused when it originated from the
home university, which was located in an emerging nation. It is also
well documented that the overwhelming majority of published material
originates and is published in North America and Europe. Several
explanations are offered, among them the obvious one of density of
institutions, publications and the amount of research being done. An
interesting reason, however, is the ease with which material is now
transmitted for review in digital form and increasingly published in
final and permanent form on the Internet itself. In fact, publication
of scientific material on the Internet's World-Wide Web is fast
becoming the medium of choice. Edited and reviewed sites are
proliferating and the web is set to surpass the printed media volume
of scholarly publication if it has not already done so.
Availability of scientific literature
Even more important than the possibility to publish is the
availability and accessibility of scientific journals and
literature. In the increasingly unfriendly economic climate in which
vast numbers of educational institutions all over the world find
themselves, libraries are invariably subject to cost cuts, both in
terms of books and periodicals. There are innumerable examples of not
only research but vital teaching being carried out on the basis of
literature that is both out of date and factually wrong. With
educational financing deteriorating relative to growing demands for
quality education in all but the most privileged academic
environments, the future for substantial improvements in access and
availability of scientific publications in traditional form looks
increasingly dismal. To compound the problem, printed materials that
are acquired are often not procured in sufficient numbers and are, for
all practical purposes, unavailable to all but the few individuals who
have secured the copies that exist.
Digital archiving and preservation
An increasing number of institutions view intranet and Internet
communication technologies, supplemented by other forms of digital
documentation, as a viable way out of their information access
predicament. In this context "digital documentation" is perceived as
encompassing methods for acquiring, archiving and publishing scientific
and administrative information. It thus involves recording analog
information in formats that are as close as possible to optimal for
future retrieval, reuse and publishing. While online publishing is
important, popular and cost-effective both in terms of information
acquisition and publishing, simple archiving to tapes and CDs, both
for conservation and for duplication and backup, is a major part of
efforts in methodical documentation.
Sharing knowledge
Of all characteristics that can be attributed to the Internet, open
sharing of information between institutions and individuals is
definitely the most important. Exchange of knowledge and information
was at the very roots of the development of the Internet and in spite
of the enormous growth of commercialization on the net in recent
years, it continues to be the most enduring advantage of the
net. While many companies have invested in the net in
expectation of huge profits in information trade, difficulties in
implementing methods of payment continue to be a problem. Even when
practical methods of remuneration are in common use, it is doubtful whether
cultural and scientific information of serious educational use will
remain anything other than the responsibility of the academic
community.
Similar projects
Practical considerations
The following is an overview of some characteristics of digital
documentation:
- Advantages of digital archiving
- Size
As an example, 100 images of a large poster in 4 resolutions, the
largest of print quality, can be stored on a standard digital
compact disk.
- Retrieval and reuse
Converting information to digital form ensures that each copy
retrieved is exactly identical to the original recording. Thus
material properly cataloged and stored on read-only digital media
ensures unchanged copies.
Cataloging of the material is also digital, providing immediate
access.
It can also be sent digitally as a copy to any location on the
Internet or on hard media.
- Ease of reproduction
Apart from reproduction in an unlimited number of digital copies,
digitized material would be available for use in print or
other media that use digital production techniques.
- Permanence and safety in duplication
Digital archiving provides the advantage of making duplicates on
various media and storing them in separate locations, ensuring
permanent, destruction-free storage.
- Digital publication
- Universal availability
Digitized information can be made universally available on the
Internet (with limited personal access if necessary) and on various
other readily available and common media such as CDs, digital
tapes and disks.
- Low costs
Publication costs are low, usually involving only the authoring
or editorial processes and formatting of master documents.
- Ease of publication
With the emergence of standard cross-platform formats of common
data types such as HTML for digital publication and SGML for
print, material can often be prepared for final publication
by archivists and authors themselves. Additional work is
usually limited to editorial and design modification for
adaptation into larger collections such as web sites or
for marketing and packaging purposes.
- Searchability
Digital formats provide excellent search functionality, especially
in text, but increasingly in other data where pattern recognition
is applicable. Provided common, cross-platform standards are used,
searches need not be limited to isolated collections of information.
- Collaboration
By digitizing information and making it available universally or
selectively, conditions for collaboration are greatly enhanced
and time/space limitations (and costs) proportionately reduced.
- Immediacy of intellectual property rights
Intellectual property rights (commonly understood as "copyright")
are not limited to particular media. Consequently, the limitations
previously imposed by print publication, including loss and theft
between submission and publication, are alleviated.
Digital publication, for example on the Internet, ensures the
immediate and permanent intellectual property rights of the
owner of the material involved, provided it otherwise complies
with international copyright regulations.
- Prerequisites
- Infrastructure
Archiving, conversion, transmission and especially publishing
require sufficient infrastructure in the form of basic
network facilities. The situation in 1997 is characterized
by fair to excellent internal infrastructure at many
educational institutions. External interconnectivity remains
a severe limitation on collaboration and publication with
all but the most privileged. The emergence of common cross-
platform standards in both archiving and publication has
seen an explosion of both innovative software and methods
of presentation and transmission, all putting severe demands
and strains on available bandwidth and the dynamics of
academic inter-networks. Unfortunately, there does not
seem to be any reason to expect effective improvements other
than those that can be provided by government and academic
networking organizations that see clearly the benefits for
improvement in education and research.
- Technical resources
Important resources to note are primarily equipment for recording
in standardized formats that can easily be read and transferred
to new standards as these are developed. Also important
is the consideration of standardized, permanent recording media
such as CDs and magnetic tapes.
New file formats are developed continually, especially for
displaying over the Internet. Original recordings should be
made in formats that preserve complete ranges of sound and
color at as high a resolution as possible. Adaptations to
various schemes of compression should be made only where the
resulting loss of information is of no importance to
either storage or display for end users.
The success of any archival and publishing venture is dependent
on the skills of the people involved. For non-skilled
administrative personnel, the simplest means of judging
the skills and competence of the staff involved is the measure
of their reliance on proprietary commercial software and
standards. The widest possible access and best conformity for
information sharing with other institutions and low-cost
software is usually achieved by using standards that are
freely available in the public domain. Public domain
software, usually developed from research at educational
institutions, is freely available on the Internet. Standards
developed along the same lines are open and thus readily
adaptable for use in both public and commercial software, ensuring
much wider compatibility and use at reasonable costs.
- Editorial competence
To ensure that material be digitized in formats and ways that
are compatible with future use, it is important that the venture
has a certain amount of knowledge of formatting and editing
data for publication on various digital media. "Publication"
in this context does not necessarily mean making the digitized
information globally available on the Internet, but simply assembling
the archived material in such a way that it is retrievable in
reasonably ordered form at any time by those who are responsible
for the original analog information.
In addition, digital archiving, and especially publication,
demands knowledge and skills in organizing material, editing for
reader interest and some basic knowledge of data communication.
This includes the technical skills necessary to run the software
needed for publication both at the client and server ends. Also
important is sufficient experience and knowledge of Internet
communications to be able to judge delivery bandwidths and the
constraints these impose on document formats and sizes with respect
to online publication.
- Equity
Given the various prerequisites above, which are mostly of a
technical nature, the most important principle for success
of any digital archival and publication venture is the doctrine
that information retained by educational institutions is in the
public domain. Knowledge retained by educational institutions
is developed through research and collected by acquisition first
for the benefit of students, teachers and researchers and second,
for the public at large including the use of other institutions
with similar aims.
Attitudes of possessiveness other than those required to protect
personal integrity and intellectual property rights simply
negate the purpose and aims of archival for knowledge retrieval
and dissemination.
Digitizing for archiving and digitizing for publication
There is an important point to be made about the difference between
digitizing for archiving purposes alone and subsequent publication
of the same material. Storing information on a computer which is attached
to a network does not mean that it is immediately available on the
global Internet any more than putting money in a bank makes it accessible
to people on the street outside.
Digitized material can be stored on CDs, for example, for safe physical
storage. While this ensures permanence and safety, it severely restricts
retrievability. The World-Wide Web client-server software system in
combination with configuration techniques in network routing allows for
all the necessary variations in access control familiar to physical
libraries and archives, with some fairly sophisticated additions. Access
to archived digitized information can thus be controlled with respect to
conditions such as location, area, personal identification, authentication,
passwords, and so on.
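As a rough illustration of the kinds of access conditions just
described, the following Python sketch combines a location check with
a personal password. It is not tied to any particular web server; the
AccessPolicy class, its permits method, the network range and the user
name are all hypothetical and chosen only for the example.

```python
import hmac
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical access policy for one archived collection."""
    allowed_networks: list = field(default_factory=list)  # trusted locations
    passwords: dict = field(default_factory=dict)         # user name -> password

    def permits(self, client_ip, user=None, password=None):
        """Grant access by location first, then by personal credentials."""
        address = ipaddress.ip_address(client_ip)
        if any(address in network for network in self.allowed_networks):
            return True
        expected = self.passwords.get(user)
        if expected is None or password is None:
            return False
        # Constant-time comparison; a real system would store salted
        # password hashes rather than plain passwords.
        return hmac.compare_digest(expected.encode(), password.encode())


# Assumed example values: 192.0.2.0/24 is a range reserved for documentation.
policy = AccessPolicy(
    allowed_networks=[ipaddress.ip_network("192.0.2.0/24")],
    passwords={"archivist": "example-only"},
)
```

With this policy, a request from 192.0.2.17 would be admitted on
location alone, while a request from any other address would need the
archivist's password.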
Media shelf-life
An often voiced concern with respect to archiving digitized
material is the potential deterioration of the "permanent" media on
which it is stored or the rapid outdating of the retrieval
technology, both hardware and software. Media shelf-life is
dependent on several factors, not least of which is the market
penetration of common technologies. The more common a certain
piece of technology, the longer it will remain in
use. Ultimately, the availability of digitized information will
simply depend on its perceived value on the part of its
custodians. In a well-managed archival institute, material will
simply be moved from one form of media about to become outdated
to a more modern version as a matter of routine. The fact that
the archived material is digital will ensure that it can be moved
preserving both accuracy and integrity irrespective of the
storage medium. In the case of analog information such as print
and film, such movement between storage media is prohibitively
expensive and causes serious deterioration to the quality of the
information itself.
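A routine move of this kind might be sketched in Python as follows.
The sketch assumes that the old and new media are both mounted as
ordinary directories, and it uses SHA-256 checksums to confirm that
each copy is identical to its original. The function names migrate and
sha256_of are illustrative assumptions, not part of any existing
archival system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_dir, target_dir):
    """Copy every file to the new medium and verify each copy.

    Returns a manifest mapping relative paths to digests and raises
    IOError if any copy differs from its original.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    manifest = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        original = sha256_of(source)
        if sha256_of(target) != original:
            raise IOError(f"copy of {relative} differs from the original")
        manifest[str(relative)] = original
    return manifest
```

The returned manifest can be stored with the archive so that later
moves can be checked against the same digests.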
Børre Ludvigsen, professor of information architecture - 970805/970910
Created by the Documentation Center at AUB in collaboration with Al Mashriq of Høgskolen i Østfold,
Norway.
Email: ddc-info@aub.edu.lb