Each year the British Academy funds a significant
number of projects which result in the creation of arts or humanities
data sets. Since 1992 it has required applicants for grants to provide
evidence that they have made adequate provision for the long-term access
and preservation of the resources generated by the project.
The British Library Research and Development Department
funds several projects every year, some of which result in the creation
of relevant data sets, though these often are small and with limited
scope for re-use (some are specifically precluded from widespread use
due to copyright restrictions). It typically has ``encouraged''
creators of data sets to deposit them at the ESRC Data Archive
(see 8.1.3.4 Source of funding:), without making this a condition of grant;
not all data sets so created have been deposited. The Library is
considering a policy which requires deposit as a condition of grant.
The Getty Art History Information Program has funded
two projects in the UK explicitly concerned with the creation of data
sets. [See note 5] In both cases, the
projects were required to deposit copies with the Institute (in
California), but not elsewhere.
The Leverhulme Trust funds about 50 humanities
projects each year, many of which include data set creation. It does
not impose conditions concerning access, deposit or preservation.
The Economic and Social Sciences Research Council (
There is an expectation that data set provision can be made
economically viable and self-sustaining in the long term. This may be
true for some kinds of resource where the demand is clear and
sustained; it is less self-evident in the general case, particularly
for the wide range of materials likely to be of interest to the
Humanities.
The provision of services is varied and uneven. There are two
formally constituted data archives --- the History Data Unit
of the
Recently the British Academy carried out a Networked
Data Information Survey of all Arts and Humanities departments
in the UK. This suggested that there is a demand for commercial data
sets, many of which are too expensive for smaller institutions to
obtain individually. [See note 8]
While it is in any single publisher's interest to maintain archival
copies of its materials, there is no commercial imperative to do so.
On the contrary, it is easily possible to imagine commercial pressures
which would result in data sets for which there is little demand being
discarded. We are not aware of any unified source or catalogue for
these materials; nor are we aware of any widely-used standard for works
published in electronic form.
Within the UK the academic communications infrastructure ---
JANET and SuperJANET --- has largely, up to now, been centrally
funded by the HEFCs. The funding councils are advised by an
Advisory Council on Networking (
There have also been a number of recent policy reviews, including
the SuperJANET Project on Information Resources (
A number of projects and groups are active in developing network
provision and access. Notable among these are the UK Office for
Library and Information Networking (
Alongside this the rapid and chaotic growth of the Internet has to
be taken into account, comprising as it does a certain number of stable
well-ordered data sets and services and a very large volume of
unregulated activity and information. This presents problems of its
own; for it has become difficult to distinguish useful high-quality
information from the low-grade material available side by side. (``On
the one hand everything is available. But on the other hand,
everything is available.'' - Alfred Glossbrenner). The
difficulty of locating appropriate resources has become a common
complaint.
Following on from the above, an increasingly important issue is
whether the resource is to be accessed and manipulated on-line. If so,
consideration has to be given to which access tools users will be able
to employ. Among the most common network access mechanisms are
Gopher, World Wide Web (
Another approach has been to develop interfaces between gateways of
different types. The Common Gateway Interface (
2.1 Background
2.1.1 Data sets and services today
In some disciplines, and in some countries, deposit of digital data
sets in ``approved'' archives is a common and accepted condition of
grant. This is frequently not the case in humanities disciplines in
the United Kingdom, as a brief informal survey of four funding
institutions has shown.
2.1.2 Commercial data sets
In addition to the academic projects and institutions, several
commercial organizations are now becoming involved in producing and
publishing digital documents for scholarly use. Frequently, these are
traditional paper-based publishers which are diversifying into
electronic media. Their products range from novels sold on diskette
for a few pounds to corpora sold on CD-ROM for tens of thousands of
pounds. [See note 7]
2.1.3 Resources for teaching
There have been three major Research Council-funded UK initiatives
in the past few years, all aimed at promoting the use of electronic
resources in teaching, and at creating a range of transferable
materials. First was the Computers in Teaching Initiative
(
2.1.4 Networked Information in the UK
The creation and management of networked electronic resources is
currently a major area of activity within the Higher Education
community, with a number of bodies and institutions involved, and a
number of projects and initiatives.
2.1.5 The international dimension
Although, for practical reasons, the focus in our discussion is
largely within the UK, there are many relevant institutions and
projects outside the UK. There has always been a strong international
tradition among scholars, which has been greatly facilitated by the
ease with which students and researchers can now cross international
frontiers electronically in their pursuit of digital resources. Indeed,
there is a clearly emerging tendency for a small number of academic
experts working within a comparatively narrow specialization but spread
across several countries, to gain
``critical mass'' by working together using the Internet, thereby
achieving results greater than the sum of the results they might
achieve individually.
2.2 Needs
The following sections present a number of specific issues in the
provision of networked information. They formed a major component of
the
Within this framework, we identify the specific
needs of the Arts and Humanities community, and try to
summarize the current situation in each case.
2.3 Resource Creation
The resources covered here include:
All these resource types are covered within the
2.3.1 Academic quality
2.3.1.1 Need
Peer review procedures are needed which provide external validation
for
2.3.1.2 Current Situation
Funding bodies such as the British Academy and the
Leverhulme Trust carry out refereeing procedures when bids
are assessed for projects which will create electronic data sets, as do
a number of HE institutions when allocating funds internally for such
projects. However, there are large numbers of data sets which are
conceived and initiated without any such procedure. Even more
important, perhaps, peer review of completed projects is
relatively rare. It may be that such procedures should be external to
the
2.3.2 Copyright
2.3.2.1 Need
First, it is important that copyright issues are addressed at a very
early stage in the planning of individual projects. More generally,
agreements and guidelines are needed at HE community level to provide a
coherent and well-understood framework within which resource creation
projects can operate.
2.3.2.2 Current Situation
There is discussion going on amongst publishers, and between
publishers and HE authors and institutions. The British Library
has a working party on Legal Deposit of electronic material which is of
relevance here. The development of ``a model
agreement with publishers'' is being addressed by FIGIT. [See note 14] The
2.3.3 Design standards
2.3.3.1 Need
There is a need for more widespread awareness and use of good
design practice, both for research data sets and for teaching modules.
2.3.3.2 Current Situation
A number of data analysis and design methodologies are available in
the commercial world which can be brought to bear on the development of
research data sets. [See note 17]
There is also now an extensive literature concerning the design of
Computer-Based Learning (
2.3.4 Re-usability and access
2.3.4.1 Need
Platform-independent standards for data encoding are essential to
ensure transferability and re-usability of data. Such re-usability has
academic benefits in terms of increasing the resources available for
research and teaching, of allowing arguments and conclusions to be
challenged by manipulation of a particular data set, and also, where
copyright conditions of use permit it, of enabling resources to be ``re-purposed''
--- i.e. modified or adapted to serve a new teaching or research
purpose. Re-usability also has economic importance: high-quality
electronic resources are expensive to produce, and it is important to
ensure that as far as possible their re-use is a matter of academic
judgment rather than of technical or financial constraint.
2.3.4.2 Current Situation
Currently platform-independence remains a goal rather than a
reality for the most part. However, data encoding is an area of great
activity and promise for all data types, with the work of the
Text Encoding Initiative (
2.3.5 Resource identification
This is the first in a range of documentation which is required if
electronic resources are to be both accessible and usable, and which is
best produced by the resource creator.
2.3.5.1 Need
Standards are urgently required which enable resources in the
rapidly growing Internet world to be uniquely identified and located.
The need is far from simple, as for example in identifying resources
which are replicated in a number of places.
2.3.5.2 Current Situation
The Internet Engineering Task Force (
2.3.6 Resource description
2.3.6.1 Need
Standards are needed for the electronic equivalent of full
catalogue entries, to enable prospective users to make a judgment on
the suitability of the resource to meet their needs without having to
retrieve the whole data set. A considerable amount of technical
information is needed, beyond the standard catalogue entries, from the
details of the encoding scheme used to the software and hardware
required in order to manipulate the resource.
2.3.6.2 Current Situation
Within the Library community there has been a great deal of work on
resource description, mostly involving variations on the MARC standard,
e.g. USMARC, UKMARC, UNIMARC. The ``TEI Header'' provides another
possible approach, and the TEI Guidelines describe how
the two approaches may be linked. [See note 21]
2.3.7 Technical description
2.3.7.1 Need
Standards are required so that all the technical information needed
to understand and manipulate the data set is available. Documentation
of this type is particularly important for certain types of data set,
e.g. for complex relational databases where schemas showing the tables
and their relationships are essential. It is also essential whenever
proprietary software or hardware is used.
2.3.7.2 Current Situation
It is far from clear that technical descriptions are produced as a
matter of course along with every new data set. A number of system
design methodologies, e.g.
2.3.8 User Guidelines
2.3.8.1 Need
Guidelines are needed to ensure first that users have available the
necessary information on how to use the resource, and second that the
way in which this information is presented is consistent from one data
set to another.
2.3.8.2 Current Situation
It is widely accepted that while a paper version may be appropriate
in some circumstances, help for users of on-line resources must be
provided on-line. There are no standards as yet for the structure and
presentation of such material, but hypertext approaches - often using
2.4 Resource Management
2.4.1 Basic procedures
2.4.1.1 Need
Policies, standards and procedures are needed for accessioning data
sets, documenting them where necessary, adding entries to the local
catalogue(s) and indexes, and where appropriate to central catalogues.
2.4.1.2 Current Situation
All the established data archives have such procedures as a matter
of course (as do all libraries).
2.4.2 Information content
2.4.2.1 Need
Procedures are needed to assure the integrity of the resource
against accidental or deliberate corruption. Where a resource is
subject to amendment, version control procedures are
required to ensure that users know which version of the resource they
are using, and which versions are available for use.
2.4.2.2 Current Situation
Version control procedures are well established in existing data
archives and university computing services. The
TEI Guidelines include detailed proposals for embedding
version control information within an electronic resource.
2.4.3 Intellectual history
2.4.3.1 Need
Documentation procedures are required which will allow the usage of
the resource to be recorded, along with comments by the users, in
particular any reports of ``errors'' in the data. Clearly it would
be improper for an archive to ``correct'' such errors silently, but
equally it has a duty to report them to future prospective users. Over
time, networked resources will inevitably accrue critical commentary
(often in electronic form) in exactly the same way as printed
resources. It would be desirable for at least the bibliography for
this commentary to be included with the documentation for the data set.
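The kind of record-keeping described above might be sketched as follows in Python. This is an illustration only, not a description of any existing archive's system: the class names, field names and the example identifier are all hypothetical. The point it shows is that reported ``errors'' and the bibliography of commentary are held alongside the data set, while the deposited data itself is never altered.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ErrorReport:
    """A user's report of a suspected error (all field names are illustrative)."""
    reporter: str
    location: str        # where in the data set the suspected error occurs
    description: str
    reported_on: date


@dataclass
class DataSetRecord:
    """Documentation kept with a data set; the data itself is stored elsewhere."""
    identifier: str
    commentary: list = field(default_factory=list)      # bibliography of critical commentary
    error_reports: list = field(default_factory=list)   # reports are appended, never applied

    def report_error(self, report: ErrorReport) -> None:
        # Record the report only; the archive does not silently "correct" the data.
        self.error_reports.append(report)

    def notices_for_users(self) -> list:
        # What a prospective user would be shown before using the data set.
        return [
            f"{r.reported_on.isoformat()} {r.location}: {r.description} "
            f"(reported by {r.reporter})"
            for r in self.error_reports
        ]


record = DataSetRecord(identifier="example-text-001")  # hypothetical identifier
record.report_error(
    ErrorReport("A. Reader", "line 120", "possible transcription error", date(1994, 5, 2))
)
for notice in record.notices_for_users():
    print(notice)
```

Keeping reports as a separate, append-only list is one possible design; it reflects the requirement that errors be reported to future users rather than corrected in place.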
2.4.3.2 Current Situation
It seems that most data archives, e.g. the ESRC Data Archive
(
2.4.4 Physical access
2.4.4.1 Need
Procedures are needed which ensure that there is no unauthorized
access to networked resources.
2.4.4.2 Current Situation
For some data sets and services there is no restriction on access.
Where access has to be restricted, electronic
``registration'' is widely used, typically involving the allocation
to a user of a ``user identifier'' and
``password''. Where necessary, passwords --- and even complete
data sets --- may be encrypted.
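A registration scheme of this kind might be sketched as follows in Python. This is a minimal illustration under stated assumptions, not a description of how any particular archive works: the function names, the in-memory `users` table and the iteration count are all hypothetical. Rather than encrypting passwords, the sketch stores a salted one-way hash (PBKDF2 from the Python standard library), which is a common present-day substitute.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration only


def _hash_password(password: str, salt: bytes) -> bytes:
    # Derive a one-way digest from the password and a per-user salt.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(users: dict, user_id: str, password: str) -> None:
    """Allocate a user identifier and store a salted hash, never the password."""
    if user_id in users:
        raise ValueError("user identifier already registered")
    salt = os.urandom(16)
    users[user_id] = (salt, _hash_password(password, salt))


def authenticate(users: dict, user_id: str, password: str) -> bool:
    """Return True only if the identifier exists and the password matches."""
    record = users.get(user_id)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_hash_password(password, salt), expected)


users = {}  # hypothetical in-memory registry; a real service would persist this
register(users, "reader01", "correct horse")
print(authenticate(users, "reader01", "correct horse"))
print(authenticate(users, "reader01", "wrong guess"))
```

Any additional conditions attached to specific data sets would need to be checked separately from this basic registration step.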
Usually, registration will entitle
the user to the basic services offered by the institution. However,
additional procedures are required where particular conditions are
attached to the use of specific data sets, as described below.
2.4.5 Conditions of use
2.4.5.1 Need
Procedures are needed which ensure that conditions of use imposed
by creators, copyright holders, and the institution managing the
resource are agreed and respected by users.
2.4.5.2 Current Situation
Typically, as part of the registration process, the prospective
user signs a ``Conditions of Use'' form in relation to the archive
generally. However, further conditions may be imposed in relation to
specific data sets.
2.4.6 Intellectual Property Rights (IPR)
2.4.6.1 Need
Procedures are needed which ensure that the rights of authors and
other rights-holders are protected, as far as is within the power of
the service. (See 4.4.2 Intellectual Property Rights.)
2.4.6.2 Current Situation
This can be a complex area. At present it is covered typically
within the registration and ``conditions of use'' procedures, with
the user undertaking to cite all materials used for further publication
in the appropriate manner.
2.4.7 Charging
2.4.7.1 Need
Procedures are needed which can cope, potentially, with a variety
of charging mechanisms. (See 4.4.1 Charging and licensing.)
2.4.7.2 Current Situation
There are few electronic charging mechanisms in current use.
Typically charges are levied as part of the registration or
``conditions of use'' procedures.
2.4.8 Platform-specific data
2.4.8.1 Need
Until such time as platform independence is uniformly enforceable,
individual Service Providers will need to be able to manage data sets
produced in a number of formats, and to handle both the variety of
software common among their constituency of creators and users, and the
hardware on which it runs. It is desirable that conversion between
formats can be carried out where necessary. It would also be desirable
for old non-conformant data sets to be
``upgraded'' to comply with defined standards, particularly where
they are likely to be widely used.
2.4.8.2 Current Situation
Typically, existing data archives accept and manage data sets in a
variety of formats. In certain circumstances data conversion may be
carried out at the request of a user. The Oxford Text Archive
has undertaken a phased programme of marking up its non-TEI conformant
texts in
2.5 Resource Preservation
This is an area of particular importance to the funding bodies, who
are anxious to ensure that expensively created resources will remain
available for use after the project funding runs out.
2.5.1 Preservation procedures
2.5.1.1 Need
Procedures are needed which can ensure the preservation of
electronic resources indefinitely, until a decision is taken to destroy
them or to allow them to lapse. Associated documentation procedures
are also needed, e.g. to record the versions and formats in which a
resource is held.
2.5.1.2 Current Situation
For all too many data sets there are no measures to ensure their
preservation. For others, advantage is taken of the routine back-up
and archive facilities offered by the local computing service. Where
data is deposited with one of the established archives, there are
usually more elaborate procedures in operation, with data sets held in
several formats, and new copies made periodically.
2.5.2 Preservation media
2.5.2.1 Need
Effective preservation procedures need to take account of the
likely longevity of the media on which the data sets are to be stored,
both in terms of physical degeneration and of the availability of the
necessary hardware and software to decode the data.
2.5.2.2 Current Situation
There is considerable activity in this area among the Archive
community. Until recently, the major preservation medium was magnetic
tape, with new copies made periodically to protect against physical
deterioration. Optical disc technologies are now increasingly widely
used, but there are no universally accepted standards. The Council
of European Social Sciences Data Archives (
2.6 Resource Dissemination and Discovery
``Discovery'' and ``Dissemination'' are very closely
related. The emphasis in ``Dissemination'' is on the steps taken
by resource publishers to make materials available and usable; in
``Discovery'' it is on the range of tools needed by users to access
and manipulate the resources. However, the latter depends very
directly on the former.
2.6.1 Delivery platforms
2.6.1.1 Need
For the foreseeable future it is likely that a variety of delivery
methods will be required. Guidelines are needed covering dissemination
of resources in parallel: over the networks, on other electronic media
such as CD-ROM, and on paper. Authors and funders will be reluctant to
limit themselves to any single delivery platform while the capabilities
of users and their equipment are so varied.
2.6.1.2 Current Situation
Mixed platform delivery is well established in existing archives,
where data sets are available over the Internet, on CD-ROM, on floppy
discs, and even on magnetic tape if required. Commercial publishers
are increasingly involved, at least in parallel publishing on paper and
CD-ROM; a number are also very interested in publication over the
Internet. [See note 23]
FIGIT Programme Areas 1 - Electronic Document and Article Delivery and
4 - On-Demand Publishing are particularly significant in this context. [See note 24]
2.6.2 Communications infrastructure
2.6.2.1 Need
The networking of resources is possible only if a suitable
communications infrastructure is in place, and valuable only if
institutional infrastructures provide access to the resources for a
critical mass of researchers, teachers, and students.
2.6.2.2 Current Situation
SuperJANET will provide a nation-wide high-speed,
high-bandwidth platform, with links to other national and international
networks, and capable of carrying the whole range of arts and
humanities data types, including high resolution still and moving
colour images. There are a number of SuperJANET demonstrator projects,
including some from the Arts and Humanities, such as the
Remote Access to Museums and Archives (
2.6.3 Catalogues and Indexes
2.6.3.1 Need
The creation of comprehensive metadata resources is
crucial if the networks are to be a successful method of dissemination.
This involves not only on-line catalogues of resources, and catalogues
of catalogues, but also searchable indexes of various types, including
by subject, by author and by data type.
2.6.3.2 Current Situation
This has long been an area of considerable activity in the library
community, and On-line Public Access Catalogues are well
established. Many existing data archives have on-line catalogues and
indexes. On a larger scale, in the US, the Research Libraries
Group's Research Libraries Information Network (
2.6.4 Gateways
2.6.4.1 Need
Gateways are needed to provide simplified access to resources,
saving users a great deal of time and effort. Busy academics need
``one stop shops'' which will provide access to all the resources
they require, using a single interface or at worst a small number of
consistent interfaces.
2.6.4.2 Current Situation
There is a great deal of activity in this area.
One approach, originally intended for Gopher-based services
only, has been to develop a National Entry Point (
2.6.5 Service reliability
2.6.5.1 Need
If networked resources are to be utilized to their full potential,
it is important that they form part of a well-supported service rather
than being just a data set. Further, unless users can rely on the
resources to be available, the services will soon lose their
credibility.
2.6.5.2 Current Situation
A great many of the resources currently available on the Internet
cannot be relied upon. Many exist only on an individual's personal
computer, and are therefore unavailable if that machine is switched off
or if it has a hardware, software or communications fault. Among the
established data archives, the situation is a great deal better, of
course. In many cases the file servers are mirrored, so that a fault
with one does not bring the service down. It is also common to design
communications systems with a degree of redundancy. Another benefit of
redundancy can be performance. The DANTE InfoFlow
project is based on a design of interlinked (national) data servers,
with data replicated on two or more of these, to avoid access
bottlenecks as well as to ensure a stable and reliable service (see
Section 8.2.5.4 Source of funding).
2.6.6 Documentation, training and support for users
2.6.6.1 Need
These are crucial if a service is to be provided rather than just a
data set. Standards are needed to define the form which the
documentation, training and support are to take.
2.6.6.2 Current Situation
Without exception, the Data Archives we contacted all stressed
documentation, training and user support as being a particularly
important aspect of the service they provide. Publicity material
includes leaflets sent to Higher Education institutions or distributed
at conferences. Other documentation may include bulletins and
newsletters, made available on paper and on-line. Training includes
self-help guides and scheduled courses and workshops, offered in-house,
and at conferences and in HE institutions. Other aspects of user
support are typically centred on a help-desk, but increasingly include
electronic activities, such as an email query service, an electronic
bulletin board, and electronic discussion groups open to all registered
users. This group of activities is included in FIGIT Programme Area 5
- Training and Awareness. [See note 28]
2.6.7 Resource discovery tools
2.6.7.1 Need
Standard tools are needed for:
2.6.7.2 Current Situation
The current work on resource location (
2.6.8 Documentation and training in resource discovery
2.6.8.1 Need
Resource packs are needed which give new users basic guidance on
what tools are available, and how they can be used to explore the world
of electronic resources. On-line documentation also has an important
role to play, to provide help once the user has succeeded in making a
connection to the network.
2.6.8.2 Current Situation
There are a large number of books about the Internet, how to get
started, and how to navigate around Gopher-space and the World
Wide Web. The