Refdbms 3 -- a distributed bibliographic database system
The 15th revision of the alpha release of Refdbms version 3 is now
available. It can be obtained by anonymous FTP from
ftp.cse.ucsc.edu, in the pub/refdbms/ directory. The system has been
tested on the following systems:
- Sun 4 systems running SunOS 4.1.x
- Decstations running Ultrix 4.1
- DEC Alphas running OSF/1
- HP-PA workstations running HP-UX 9.x
It also includes code intended to make porting to other systems easy,
but I have not tested this. Ports to a few other systems have been
contributed, and these will be integrated into the release soon.
The reference database project is a joint project between John
Wilkes of the Concurrent Systems Project at Hewlett-Packard
Laboratories and Richard Golding, formerly of the Concurrent Systems
Laboratory at UC Santa Cruz (now also at HP Labs). The Refdbms version 1 system has
been in use at HPL for several years, and consists of a database of
references for approximately 3500 academic papers in computer graphics
and computer systems, with a smattering of other topics. The system
includes tools to query the database, to produce bibliographies for
LaTeX documents, and to enter new references into the database.
Version 3 extends this system to significantly improve performance and
to support distributed databases. A reference database can be
replicated at several sites, using TCP to communicate updates between
servers. Refdbms is part of ongoing research into wide-area distributed
information systems on the Internet. At last count, about 30 databases
were available, containing more than 21,000 references.
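To illustrate how replicas of a database can accept updates at any site
and still converge, here is a minimal Python sketch of a pairwise update
exchange. It is not the Refdbms protocol, which this announcement does not
describe in detail: the `Replica` class, the `exchange` function, and the
(site, sequence number) update identifiers are all assumptions made for
illustration, and the exchange runs in memory rather than over TCP.

```python
class Replica:
    """One site's copy of a database, kept as a log of updates (hypothetical)."""

    def __init__(self, site):
        self.site = site
        self.next_seq = 1
        self.log = {}  # (origin site, sequence number) -> (tag, text)

    def enter(self, tag, text):
        """Record an update entered locally and return its identifier."""
        uid = (self.site, self.next_seq)
        self.next_seq += 1
        self.log[uid] = (tag, text)
        return uid

    def summary(self):
        """Identifiers of every update this replica has seen."""
        return set(self.log)

    def missing_for(self, peer_summary):
        """Updates held here that the peer has not yet seen."""
        return {uid: rec for uid, rec in self.log.items()
                if uid not in peer_summary}

    def apply(self, updates):
        """Merge updates received from a peer."""
        self.log.update(updates)


def exchange(a, b):
    """Pairwise exchange: each side sends the other what it is missing."""
    to_b = a.missing_for(b.summary())
    to_a = b.missing_for(a.summary())
    b.apply(to_b)
    a.apply(to_a)
```

Between exchanges the replicas may disagree, but once every pair has
exchanged (directly or through intermediaries) they hold the same set of
updates. A real server would carry the same information over a TCP
connection instead of calling methods directly.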
An overview of the system's architecture was presented at the Winter
1994 Usenix conference.
The beginnings of the Web-based user's manual are available.
Features of this system include:
- Distributed databases: a reference database can be shared among
multiple sites. Updates can be entered at any site, and will be
propagated to the other sites holding a replica of the database.
Updates are propagated using a weak consistency replication protocol.
- Multiple databases: every database has a name, and users specify
the order in which databases will be searched. References are named
using a compound `tag' consisting of the database name and a unique
identifier for the reference, such as `usenix:Comer84'. (A list of
the currently available databases is also provided.)
- Private databases: databases can be private, available site-wide, or
they can be made available to other sites. Within one site, databases
are protected using normal Unix permissions.
- Database query by keyword, author, and title word: an inverted
index of search keys is maintained for each database. The index uses
a word-stemming algorithm (derived from the Unix ispell package). The
index is incrementally updated as the database is updated.
- Translators for refer- and BibTeX-format databases: simple
translators are included that do a reasonable job of converting
existing databases in refer or BibTeX formats.
- Usable with LaTeX documents: the internal Refdbms format can be
translated into a special BibTeX format. Tools are provided to scan
a LaTeX document and build a BibTeX file.
- Shell, Emacs, and X-based user interfaces.
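As an illustration of the compound tags, ordered multi-database search,
and incrementally updated inverted index described in the feature list
above, here is a minimal Python sketch. It is not Refdbms code: the
`Database` class and the `crude_stem`, `split_tag`, and `search` functions
are hypothetical names, and the suffix-stripping stemmer is only a crude
stand-in for the ispell-derived algorithm.

```python
from collections import defaultdict


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemming algorithm."""
    word = word.lower()
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class Database:
    """A named database with an inverted index over title words (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refs = {}                  # identifier -> title
        self.index = defaultdict(set)   # stemmed word -> identifiers

    def add(self, ident, title):
        """Store a reference and update the index incrementally."""
        self.refs[ident] = title
        for word in title.split():
            self.index[crude_stem(word)].add(ident)

    def lookup(self, word):
        """Identifiers whose title contains a word with the same stem."""
        return sorted(self.index.get(crude_stem(word), ()))


def split_tag(tag):
    """Split a compound tag such as 'usenix:Comer84' into its two parts."""
    name, sep, ident = tag.partition(":")
    if not sep or not name or not ident:
        raise ValueError("not a compound tag: %r" % tag)
    return name, ident


def search(databases, order, word):
    """Search the named databases in the user's order; return compound tags."""
    hits = []
    for name in order:
        for ident in databases[name].lookup(word):
            hits.append("%s:%s" % (name, ident))
    return hits
```

For example, `split_tag("usenix:Comer84")` returns `("usenix", "Comer84")`,
and `search` lists hits from earlier databases in `order` before hits from
later ones.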
An experimental WWW-to-refdbms gateway has been put together by
Peter Bosch, of the Pegasus project at Universiteit Twente.
Another is used in the Cognitive Science Library at Georgia Tech.
For more information, please send mail to Richard Golding
(golding@hpl.hp.com). Information on the basic architecture and on
the research this system contributes to can be found in a number of
technical reports available from the CIS Board at UC Santa Cruz. They
can be FTP'd from ftp.cse.ucsc.edu, in pub/tr, or you can send me
mail.
There is also a mailing list for people to discuss Refdbms. I will be
posting announcements of Refdbms bug fixes to that mailing list. You
can subscribe by sending mail to refdbms-info-request@cse.ucsc.edu.
We intend to freeze the alpha.x version as refdbms 3.1 soon, once
the manual is complete and the alpha.x version has seen at
least one month's production use.
We strongly suggest that all sites upgrade to the alpha.13 (or
later) version now, since it fixes several important bugs.
For version 3.2 we intend to make a few major changes to the storage
and replication mechanisms.
Differences between alpha.15 and previous versions:
- Several bugs in the refer translator have been exorcised.
- Possible improvements in the way libraries are built. (This is experimental and may be withdrawn.)
- Introduced experimental pieces of a new formatter that may be interesting to those who need to reformat refdbms output.
- Improved error messages in a few places.
- Fixed some file and directory permission bugs.
- Fixed a few portability problems, mostly compiler warning messages.
- Progress is now reported while importing a large database.
Version alpha.14 was not released.
Differences between alpha.13 and previous versions:
- The system has been ported to a number of new systems.
- A substantial part of the manual has been written, and is
included on a draft basis. A paper on the internals is included in the
documentation.
- The installation and configuration procedure has been completely
reworked. It is now reminiscent of the X11 imake system, but not
identical.
- CPP is now almost never used for processing configuration
information. Instead, the m4 preprocessor is generally used.
- Several fixes to group membership code.
- The refstatus program has been added for the system
administrator.
- Some errors in the syntax checker have been fixed.
- I've given up on fancy file or record locking. Now the whole file is
locked upon opening, and unlocked upon close, and nothing else.
- Several bugs in the BibTeX backend have been fixed.
- The BibTeX translator is much improved.
- The interest facility has been improved.
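The whole-file locking policy mentioned in the list above (lock on open,
unlock on close, nothing finer-grained) can be sketched as follows. This
is an illustration in Python, not the Refdbms implementation: the
announcement does not say which locking primitive is used, so the choice
of flock(2) through Python's `fcntl.flock`, and the helper name
`locked_open`, are assumptions. The sketch is Unix-only.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_open(path, mode="r+"):
    """Open a file and hold an exclusive lock on all of it until it is closed."""
    f = open(path, mode)
    try:
        # Blocks until no other open file holds a lock on this file.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield f
        finally:
            f.flush()                       # push buffered writes out first
            fcntl.flock(f, fcntl.LOCK_UN)   # unlock just before closing
    finally:
        f.close()
```

A caller would write `with locked_open(path) as f:` and do all reading and
writing inside the block. The design trades concurrency for simplicity:
only one program at a time can work on a given file, but there are no
record-level lock states to get wrong.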
The alpha.12 version experimentally introduces the interest facility.
Differences between alpha.11 and previous versions:
- The refer-to-Refdbms translator has been rewritten, and now produces
a much more accurate translation. The rewritten version is considered
experimental.
Differences between alpha.10 and previous versions:
- The system should now be more tolerant of differences in cpp.
In particular, it (probably) never assumes that cpp will not put a
space after a macro expansion.
- Some improvements in file locking. Alpha.9 had a bug where many
programs would erroneously fail to get a lock (due to a typo in the
locking routine).
- When the Log and Pending files have no records, they are truncated
to zero length. This releases disk storage after a large batch of
updates has been processed.
- Many other minor fixes.