Digital libraries must use URLs (Uniform Resource Locators) to provide access to online materials; as we move into sharing our materials, we must find ways to support Persistent Identifiers so that the same item can still be reached from old links even after the software has changed, and so that items can be uniquely identified regardless of their current location.
As digital libraries begin to expand to include scientific papers and materials in languages beyond standard English, there is a growing recognition of the need for Unicode support. Current software often requires Unicode for non-ASCII text, and browsers as yet only support this emerging standard with marked limitations.
All the forms of metadata described in these pages are encoded in XML, a basic method of machine encoding of text.
Each of these is briefly introduced here, with links to further resources.
Persistent Identifiers for digital objects can help web users avoid "broken links", and they enable the management and identification of materials over time.
In 1992, the IETF (Internet Engineering Task Force) created a standard for addressing objects with the institutional commitment to support them as persistent, location-independent resource identifiers. This standard is the Uniform Resource Name (URN) (namespace specifications here), and the common structure is: URN:NID:NISS, where NID is the Namespace IDentifier, and NISS is the Namespace Identifying Specific String.
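The URN:NID:NISS structure can be illustrated with a minimal Python sketch. The function name parse_urn and the sample ISBN value are assumptions made for illustration, and the check is deliberately loose: it only separates the three parts and does not enforce the full character rules of the URN specifications.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str   # Namespace IDentifier
    niss: str  # the namespace-specific part (called NISS in the text above)


def parse_urn(text: str) -> Urn:
    """Split a string of the form URN:NID:NISS into its NID and NISS parts."""
    # Split on the first two colons only; the last part may itself contain colons.
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not of the form URN:NID:NISS: {text!r}")
    return Urn(nid=parts[1], niss=parts[2])


# Illustrative value only; any string of the same shape would do.
print(parse_urn("urn:isbn:0451450523"))
```

The leading "urn" is compared without regard to case in this sketch, so "URN:..." and "urn:..." are treated alike.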
In 1995, OCLC (Online Computer Library Center) introduced the Persistent Uniform Resource Locator (PURL), a naming and resolution service that uses URLs (Uniform Resource Locators) but supports the URN technology.
In 2001, ANSI/NISO (the American National Standards Institute in conjunction with the National Information Standards Organization) released Syntax for the Digital Object Identifier (DOI), which was developed in 2000 by the International DOI Foundation (IDF), a non-profit organization that coordinates applications. Unique identifiers must be registered centrally to ensure global uniqueness and to provide for resolution, since an institution may fail and leave its resources to another. CrossRef is one example of a registration service that manages the resolution of DOIs.
The Handle System was developed in 2003 by the
Corporation for National Research
Initiatives (CNRI)
to provide a method for redirection to the object; a handle can also resolve to multiple locations or
multiple versions of the object. A handle consists of a numerical prefix
identifying the institution, followed by a character-string suffix that identifies the object:
Handle ::= Handle Naming Authority "/" Handle Local Name
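The grammar above can be read as "everything before the first slash is the naming authority, everything after it is the local name". A small Python sketch under that reading follows; the function name split_handle and the sample handle are hypothetical, and no attempt is made to check that the prefix is actually registered.

```python
def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle into (naming authority, local name) at the first "/"."""
    authority, separator, local_name = handle.partition("/")
    if not separator or not authority or not local_name:
        raise ValueError(f"not of the form authority/local-name: {handle!r}")
    return authority, local_name


# Hypothetical handle, used only to show the shape of the result.
print(split_handle("12345/example-object"))
```

Splitting at the first slash only means that a local name may itself contain further slashes.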
The International Telecommunication Union put out a recommendation for the generation, registration, and use of Universally Unique Identifiers (UUIDs) in 2004, which has become an international standard.
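As a brief illustration, Python's standard uuid module can generate such identifiers locally, without contacting a central registry. The sketch below uses a random (version 4) UUID; which UUID version suits a given digital library is a design decision this example does not settle.

```python
import uuid

# A random (version 4) UUID; each run produces a different value.
identifier = uuid.uuid4()

print(identifier)      # canonical form: 32 hexadecimal digits in five groups
print(identifier.urn)  # the same value written as a URN in the "uuid" namespace

# The text form round-trips back to an equal UUID object.
assert uuid.UUID(str(identifier)) == identifier
```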
John Kunze of California Digital Library has proposed the Archival Resource Key (ARK) identifier for digital objects. The latest (2006) specifications are here, in an Internet-Draft proposal to the IETF (Internet Engineering Task Force).
In addition, there are efforts to piggyback the globally unique identifier resolution on that of
the Domain Name System (DNS)
that is used to resolve all URLs. The most recent is the Dynamic Delegation Discovery System
(DDDS); the Requests For Comments for the specifications are
RFC 3401,
RFC 3402,
RFC 3403,
RFC 3404, and
RFC 3405.
Unicode
was developed to provide a single encoding for each and
every character in all the major languages
written today. Before Unicode was developed, there were many different encodings in use, and
they often encoded the same thing in different ways, creating chaos. In order for computers to process
text, or even display it properly, they must have a set method of rendering or interpreting each
character. There are three character encoding forms, corresponding to the number of bits
used per code unit: UTF-8, UTF-16, and UTF-32.
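The three encoding forms are UTF-8, UTF-16, and UTF-32, with code units of 8, 16, and 32 bits respectively. The following Python sketch counts how many code units each form needs for a few sample characters; the sample characters and the helper name code_units are choices made for illustration.

```python
SAMPLES = ["3", "\u20ac", "\U0001d11e"]  # digit three, euro sign, musical G clef

# Big-endian codecs are used so that no byte-order mark is added to the output.
# Each entry maps a form name to (Python codec name, bytes per code unit).
ENCODING_FORMS = {
    "UTF-8": ("utf-8", 1),
    "UTF-16": ("utf-16-be", 2),
    "UTF-32": ("utf-32-be", 4),
}


def code_units(char: str, form: str) -> int:
    """Number of code units needed to store one character in the given form."""
    codec, bytes_per_unit = ENCODING_FORMS[form]
    return len(char.encode(codec)) // bytes_per_unit


for char in SAMPLES:
    counts = {form: code_units(char, form) for form in ENCODING_FORMS}
    print(f"U+{ord(char):04X}", counts)
```

UTF-32 always needs one code unit per character, while UTF-8 and UTF-16 use more units for characters with higher code points.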
The latest version is 5.0.0, and the
code
charts are available here.
These charts give the hexadecimal encoding for characters;
for example, the Unicode hexadecimal encoding for the number "3" is 0033, which you could
write into your HTML as &#x0033;
(Without the "x" the reference is read as decimal rather than hexadecimal, and &#0033; is then likely to display an
exclamation point instead!) Actual machine-level encoding will not leave this hexadecimal
version viewable to users. Levels of browser support for Unicode vary; here are some
instructions on enabling Unicode
in your browser.
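The difference between a hexadecimal reference such as &#x0033; and its decimal look-alike &#0033; can be checked with Python's standard html module, which decodes character references much as a browser would. This is only an illustrative sketch; the variable names are arbitrary.

```python
import html

digit_three = "3"

# The code chart value for "3": its code point as four hexadecimal digits.
code_point = format(ord(digit_three), "04X")

hex_reference = f"&#x{code_point};"     # hexadecimal reference, with the "x"
decimal_lookalike = f"&#{code_point};"  # same digits without the "x": decimal 33

print(hex_reference, "->", html.unescape(hex_reference))
print(decimal_lookalike, "->", html.unescape(decimal_lookalike))
```

Decimal 33 is the code point of the exclamation point, which is why dropping the "x" changes the character that is displayed.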
For a further discussion of Unicode and how to use
it, see Alan Wood's Unicode Resources.
XML
(Extensible Markup Language)
is a simple, flexible method of encoding text for machine
interoperability, while retaining human readability, and offering ease of creation. The
fourth edition of the XML 1.0
specification
was released in August 2006. XML was derived from
SGML (Standard Generalized
Markup Language), of which
HTML (HyperText Markup Language)
is an application. A discussion of their relationship is
available here.
An XML tutorial is here.
XML documents use a simple syntax. The first line is normally the XML declaration,
which gives the version and the character encoding, such as:
<?xml version="1.0" encoding="ISO-8859-1"?>
After this comes the root element (here named <root>), which contains all other elements. Any enclosed
element can have sub-elements, as long as they are correctly nested within their parent elements.
<root>
  <parent>
    <child>
      <subchild> . . . </subchild>
    </child>
  </parent>
</root>
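To see the declaration and the nesting at work, the skeleton can be fed to a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the text "café" inside <subchild> and the helper name outline are assumptions added for illustration. The document is given as bytes so that the parser reads the encoding from the declaration.

```python
import xml.etree.ElementTree as ET

# Bytes, not str: the byte 0xE9 is "é" in ISO-8859-1, as the declaration states.
DOCUMENT = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b"<root><parent><child><subchild>caf\xe9</subchild></child></parent></root>"
)


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOCUMENT)
for depth, tag in outline(root):
    print("  " * depth + tag)

# The path mirrors the nesting of the elements.
print(root.find("parent/child/subchild").text)
```

Each element appears one level deeper than its parent, which is the hierarchy that the indented listing expresses visually.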
Thus, XML is well suited to use in organizing hierarchical objects such as books:
<book>
  <information>
    <title>A Book on XML</title>
    <author>Someone Besides Me</author>
  </information>
  <contents>
    <chapter number="1">What's XML all about?</chapter>
    <chapter number="2">Why do we need another standard?</chapter>
    <chapter number="3">Simple Syntax</chapter>
  </contents>
</book>
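Because the element names describe their own content, a program can pull out the title and the chapter list by name. A minimal sketch with Python's standard xml.etree.ElementTree module, using the book listing as input:

```python
import xml.etree.ElementTree as ET

BOOK_XML = """\
<book>
  <information>
    <title>A Book on XML</title>
    <author>Someone Besides Me</author>
  </information>
  <contents>
    <chapter number="1">What's XML all about?</chapter>
    <chapter number="2">Why do we need another standard?</chapter>
    <chapter number="3">Simple Syntax</chapter>
  </contents>
</book>
"""

book = ET.fromstring(BOOK_XML)

# Elements are addressed by name; attributes are read with .get().
title = book.findtext("information/title")
chapters = [(chapter.get("number"), chapter.text) for chapter in book.iter("chapter")]

print(title)
for number, heading in chapters:
    print(number, heading)
```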
Obviously, a real book would contain much more information and many more chapters, but this serves to illustrate the nesting of elements and the self-descriptive nature of those elements. The "/" inside an element tag indicates that this is the end of the element whose name follows; so the element <chapter number="1"> begins with that tag and ends with the </chapter> tag that follows. Any elements that begin between those two tags must also end between those two tags. That is what is meant by "nesting". All attribute values must be quoted (with single or double quotes), and all tag names are case-sensitive. An online syntax-checking service is available.
Using a DTD (Document Type Definition) or an XML Schema, you can specify the legal elements and attributes, as well as their usage, within your document. Validating against the DTD or schema ensures that the document meets the specified requirements. Using the same DTD or schema across institutions supports machine interoperability, as all the documents will use the elements and attributes in the same way.
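The syntax rules on nesting, quoting, and case can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser as a simple syntax checker; the helper name is_well_formed and the sample fragments are illustrative assumptions. It tests only whether a document is well formed, and does not validate against a DTD or schema, which needs additional tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


CHECKS = {
    "properly nested": '<chapter number="1"><title>Simple Syntax</title></chapter>',
    "single-quoted attribute": "<chapter number='1'>Simple Syntax</chapter>",
    "overlapping elements": "<chapter><title>Simple Syntax</chapter></title>",
    "unquoted attribute": "<chapter number=1>Simple Syntax</chapter>",
    "mismatched case": "<Chapter>Simple Syntax</chapter>",
}

for label, document in CHECKS.items():
    print(f"{label}: {is_well_formed(document)}")
```

The first two fragments are accepted and the last three are rejected, since XML allows either single or double quotes around attribute values but does not allow overlapping elements, bare attribute values, or start and end tags that differ in case.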
The Organization for the Advancement of Structured Information Standards (OASIS), a non-profit international organization that produces many web services standards, offers an informative website on the uses and applications of XML, as well as an "Online resource for markup language technologies" in general.
This information is provided without guarantees as to validity or completeness, particularly in light of the fact that the world of metadata in digital libraries is a world of shifting sands, constantly changing.