In each case, a balance must be found between the need to ensure that resources remain usable over long periods of time and in many different contexts, and the need to adapt rapidly to changing technology (and fashions) in a rapidly developing field. Fortunately, this is a problem by no means unique to arts and humanities data users or providers.
Some basic principles underlying the
The existence of the
Although under development since 1989, the
We anticipate that there will be a considerable start-up cost
associated with conversion of ``legacy data'', but that this cost
will rapidly decrease.
The
The commercial protocol Musical Instrument Digital Interface
(
Standards for data compression are relevant both to image (e.g.
Conventions for the documentation of networked electronic resources
are still in a state of ferment, despite a number of years of
discussion both within the library and information science community
and outside it. It is only a slight exaggeration to say that at present
the quality of information available in ``gopher space'' is
inversely proportional to its quantity: never before have so many
known so little about so much.
The most promising basis for an improvement in this situation will
be a careful examination of existing standards for bibliographic
description, and their relevance to the description of non-book
materials. Relevant standards include
The identification and approval of any pre-existing descriptive
thesauri appropriate within given subject areas will be of considerable
assistance to subject-specific resource providers. The high cost of
building and maintaining indexes based on such thesauri should be set
against the enormous increase in data accessibility which they
typically facilitate.
This is particularly true for the description of visual images in
verbal or encoded form, without which large scale image databases are
likely to be entirely unmanageable. Natural language descriptive
thesauri (similar to those used for other data sets) are widely used,
for example by the Getty Foundation's projects. For
representational art in the Western European tradition there also exist
abstract descriptive schemes such as Iconclass.
Wide consultation involving creators, users and subject specialists
will be necessary to ensure that data documentation supports both
informed and responsible data usage. The
Most of the areas of concern common to all
It would be premature for the
Service providers may in any case need to adopt different
cost-recovery procedures. Some may need to recover only media costs;
others may need to recover a proportion of the cost of creating and
maintaining the resources within a defined period. In some cases, costs
may be beyond the control of service providers, being determined by
external agencies who make resources available to service providers on
certain terms only.
In the case of commercially-funded resource provision, the existing
expertise of bodies such as
Where resources are prepared specifically for the
Where charges are made, for whatever reason, it is likely that
differential rates may be applied to different users. One common model
is to provide commercial access at a premium rate, royalties from which
can then be used to subsidize academic access. It is not always easy,
however, to distinguish ``academic'' and ``commercial'' institutions or
usage. Equally, if institutions not funded by the Funding
Councils wish to access the service, it is likely that differential
charging rates would be applicable.
Cost recovery for the service as a whole is greatly complicated by
the fact that very large economies of scale may be obtained by
localization of some key strategic functions such as data preservation
and data description.
We recommend that, as a first step, existing charging policies of
all
It is clearly essential that the
From the end-user's point of view, what is important is a clear
statement of the level of service a given provider can reasonably be
expected to provide. This might include simple ``performance
indicators'' such as time taken to turn round a request for a
resource, or to answer a query, or more imponderable qualitative matters
such as the extent to which the service offered actually improved the
effectiveness of a researcher's work or a teacher's productivity. In
attempting to provide such information, the
It is the users of data sets who are most likely to identify
discrepancies or inconsistencies and they should be encouraged to
report these for the benefit of all. Different service providers may
have different policies as to whose responsibility it should be to act
on such error reports, but we recommend that redistribution of such
information should be a pre-requisite for
Thought should be given to ways of providing
qualitative information about data resources (that is,
some indication of the accuracy or reliability of particular resources)
in ways compatible with the general principles of detailed data
description discussed in section 4.3 Data Documentation. The definition
of acceptable levels of accuracy is an important area for
In general the term archive implies ``for ever'';
all archivists know, however, that policies allowing for some culling
(or removal) of unused or duplicated resources at some time are
necessary. There is no reason to believe that electronic archives will
be any different from others in this respect, except perhaps
quantitatively. To some extent, the existence of properly defined and
agreed quality assurance procedures for the accession of
Unlike the archiving of documents or other artefacts, the archiving
of electronically held information resources requires a separation
between the medium and its content which may not always be easy to
carry out. Hardware changes probably make impossible the indefinite
preservation of software. However, there is ample evidence that data
sets can be preserved for almost indefinite lengths of time simply by
establishing routines for the migration of the data from one carrier
medium to another, without loss of information.
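A migration routine of this kind might be sketched as follows. This is a minimal illustration only, not a prescribed procedure: the function names `file_digest` and `migrate` are hypothetical, and the choice of SHA-256 as the checksum is an assumption. The copy is accepted only if the checksum of the data on the new carrier matches that of the data on the old one.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target: Path) -> str:
    """Copy source to target and verify that no information was lost.

    Returns the verified checksum; raises OSError and removes the
    target if the two checksums differ.
    """
    before = file_digest(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = file_digest(target)
    if before != after:
        target.unlink()
        raise OSError(f"checksum mismatch migrating {source} to {target}")
    return after
```

The returned checksum could be recorded alongside the resource so that later migrations and integrity checks have a reference value to compare against.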
Because this task of archival storage is likely to be delegated in
many cases to specialist agencies, it is correspondingly important for
the
Some aspects are unlikely to change, however: these include
considerations such as the number of copies held, their locations, and
the nature and degree of integrity checking carried out, both across
the resources synchronically and periodically over time, to check for degradation.
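The kind of integrity checking described here might be sketched as below. This is an illustrative assumption rather than an agreed procedure: the function names `copy_digest` and `check_copies` and the status labels are hypothetical, and SHA-256 is assumed as the checksum. Each held copy is compared against a checksum recorded at accession; running the same check at intervals would reveal degradation over time.

```python
import hashlib
from pathlib import Path


def copy_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_copies(copies: list[Path], recorded: str) -> dict[str, str]:
    """Classify each copy as 'ok', 'degraded' or 'missing'.

    `recorded` is the checksum noted when the resource was accessioned.
    """
    report = {}
    for copy in copies:
        if not copy.is_file():
            report[str(copy)] = "missing"
        elif copy_digest(copy) == recorded:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "degraded"
    return report
```

A copy reported as degraded or missing could then be restored from one reported as ok, which is one reason for holding several copies in different locations.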
In general, economies of scale argue strongly for provision of
archival facilities as a strategic
4.1 Standards Reference Guide
The
4.2 Data Formats
Standardization is currently well advanced in this domain, with
many competing or overlapping standards in a number of areas.
Fortunately, considerable expertise exists within the academic and
industrial data processing communities with respect to data conversion
and inter-operability of systems. Service providers and other
appropriate groups of experts will be asked to provide short lists of
recommendations, perhaps with differing emphases in different subject
areas, as to the commonly observed standards for data format.
4.2.1 Textual data
For the
4.2.2 Tabular data
For the rectangular data sets useful in certain types of research
in the social and historical sciences, data in delimited or fixed
format columns is widely accepted. Where information has been stored in
databases based on the relational model, the export and transfer of
data is also generally easy, although the transfer of associated
descriptive meta-information may be more difficult. [See note 39] For data sets stored in databases using other
models, for example with multi-valued fields, data export and transfer
generally pose serious problems, because each such database system will
have its own proprietary internal format, and there may be no
conversion or import/export facility.
4.2.3 Graphical data
Many different proprietary formats are used, each of which has its
advocates. There is considerable inter-operability, because of a number
of proprietary converters and because it is common for software vendors
to support more than one format. This is, however, a rapidly changing
and developing field.
4.2.4 Audio and other time-based data
Time-based data sets, both in the form of compressed audio or video
and in symbolic form (such as an encoded representation of a musical
score) will be of increasing importance as equipment capable of managing
them becomes more readily available. At present, handling such data sets
appears to be a highly specialized activity, although audio and video
``clips'' are becoming a normal part of the material presented in
on-line services such as the World Wide Web, and have
obvious academic uses.
4.3 Data Documentation
We use this term to refer to the general topic of standards and
codes of practice relating to meta-information, that is (in the most
mature case) bibliographic standardization such as that embodied by the
International Standard Bibliographic Description, well
known in library circles, or the Standard Study Description,
equally well known in social science. We also include under this heading
consideration of the indexing techniques used to facilitate access to
image and sound data.
4.4 Data Provision
There are few pre-existing formal standards of relevance in this
area. We expect therefore that the
Each of these is discussed in more detail below; others
may be added as the need emerges.
4.4.1 Charging and licensing
It has to be recognized that there are always costs associated with
the creation and maintenance of electronic resources. It is not
unreasonable to seek to relate the value of such resources for
subsequent users directly to these costs. However, some ambivalence
exists within the academic community on this point. There is a
widely-held view amongst information consumers that all services should
be free, just as there is an equally widely-held expectation (often
located in the same individuals) that information provision should be
remunerated in some way. A variety of mechanisms have been developed to
resolve this contradiction, of which the three most popular appear to
be: top-slicing -- where access to a resource is
centrally-funded once for all, but appears to be free to the
individual; differential charging -- where free access
to a resource by one group (academics) is effectively subsidized by the
high charges made for access to the same resource by another group
(commercial users); and
subscription -- where payment of a single fee provides
unlimited access to members only. In the network environment other
modes of charging have also been proposed, such as the notion of
``pay by use'' -- where an automatic charge is levied for each
access to a given network resource or group of resources.
4.4.2 Intellectual Property Rights
4.4.3 User Support and Consultancy
It is anticipated that the level of advisory support available for
different data sets may vary considerably. At one extreme, it may be
impossible to do more than supply documented data resources on a
``caveat emptor'' basis; others may readily provide detailed
instruction in their use, complete with tutorials, self-help systems or
casebook studies, perhaps even without actually supplying the data
resources themselves. Most will fall between these two extremes. In
line with other
4.4.4 Maintenance, Error Correction, Quality Assessment
Electronic data sets are not necessarily static objects. A few data
sets are highly dynamic, changing from day to day, but even those which
are supposedly authoritative may need maintenance or error correction.
In the software context, users are accustomed to the notion of regular
updates -- whereby manufacturers bring out new versions in which
existing errors are removed (and replaced by new ones). Similar
considerations apply to data sets.
4.5 Data Preservation
All