Can you see me??!?!??!

Wednesday, 20 May 2009

Library Metadata: A Potted History.

Karen Coyle helps the Open Library team navigate the twisty paths of the forest that is library metadata. A little while ago, we were lucky enough to have her present her view of its origins at the Internet Archive. I thoroughly enjoyed her presentation and thought others might too, so I've rewritten my notes from the session as an exciting, love-and-death-style account of librarianship over the years.

Coyle describes library metadata as "diabolically rational," with its roots way back in Babylonian times, when scholars would put (physical) tags on the end of scrolls.

Medieval books changed things. Now there were covers and pages, catalogues were often arranged by author, and there were tens of thousands of books in the world. Books lived in physical spaces: "a table for the alchemy books."

Moveable type in the 1400s was the next wave, making it possible to produce lots of copies of the same book, which ended up on open shelves in libraries. Before then, of course, the fact that books had to be copied by hand limited how easily copies could be made. The more books and copies you have, the harder they are to find. Many aristocrats began personal collections.

As Barbara Krasner-Khait tells us in her article on the history of the library, written for History Magazine, over the next three hundred years or so libraries grew more popular as universities and state-sponsored national collections flourished across Europe. John Harvard, a clergyman in Massachusetts, donated his four-hundred-volume collection to the oldest library in the USA, at the university that ended up adopting his name.

In 1830, Antonio Panizzi burst onto the librarianship scene. Previously involved in efforts to unite the city states into what we now know as Italy, he got into a spot of bother and escaped to England, where he found a job at the British Museum as Keeper of Printed Books. There were around two hundred thousand books in the Museum's library at the time, far more than Panizzi could hope to organize on his own, so in 1841 he created his Ninety-One Cataloguing Rules so that many hands could help organize the collection. On Panizzi's entry, Wikipedians note that "these rules served as the basis for all subsequent catalogue rules of the 19th and 20th centuries, and are at the origins of the ISBD of the 21st century and of digital cataloging elements such as Dublin Core." Why ninety-one rules?

A gem I plucked at random may give you some hint of the document's complexity:
"23 Works in more languages than one accompanied by the original to be entered in the original only unless the title be accompanied by a translation or translations in which case such translation also to be given. If no original text occur the first language used in the title to be preferred. In all cases the several languages used in the book to be indicated at the end of the title in italics."
- Cataloguing rules: 1. of the British Museum, 2. of the Bodleian Library, 3. of the Library Association, by the British Museum Dept. of Printed Books, the Bodleian Library and the Library Association. [Courtesy of Google Books. Somewhat ironic that I couldn't find the document on OL!]

At the same time as Panizzi was making his rules, Charles Cutter and Charles Jewett were working in the USA, developing the idea of democratic access to information and, with it, the concept of a user. Apparently, though, "Jewett was hugely concerned with uniformity, and believed that [it was] the only way to avoid confusion, no matter how difficult it made things for users."

At the Smithsonian in the early 1850s, work was directed towards building catalogues of libraries, and printed copies of those catalogues were exchanged to impress both other libraries and their patrons. An important technological advance also popped out: the card. Cards were amazing, because they could be slotted into a catalogue, in the spaces between existing cards. There was also something new called a Dictionary Catalogue, in which Authors, Titles and Subjects could all be mixed together. (Before, each would have been a separate catalogue.)

Cutter began defining his "Rules for a Dictionary Catalog," but died before he could finish them himself. Apparently, he wrote about librarianship with a particular eye to the future, imagining what libraries might look like a hundred years on. A key assumption Cutter made on behalf of the public was that the reader already knows the Author, Title or Subject, and that the librarian may assist in the choice by edition or character:

"The preparation of a catalog as it is to be manuscript or printed, and, if the latter, as it is to be merely an index to the library, giving in the shortest possible compass clues by which the public can find books, or is to attempt to furnish more information on various points, or finally is to be made with a certain regard to what might be called style."
- Page 11

In 1853, the germ of an American Library Association (ALA) was formed by Cutter. Some twenty-three years later, in Philadelphia in 1876, 103 librarians (90 men and 13 women) signed up as charter members of the new ALA. These included one Mr. Melville Dewey, who of course had been working away at his classification system. As it happened, Dewey also formed a company called the Library Bureau, "a major supplier of specialized library furniture, equipment, and services." It was also under Dewey's watchful eye that the standard for the index card was produced, with the cards themselves handily mass-produced by Dewey's company. The Library of Congress began to print sets of index cards for sale in the late 1890s.

There's an interesting implication of card catalogues, and that's the Heading. Written literally across the top of each card, headings taken in sort order form a huge alphabetical list. Suddenly, thanks to headings, loads of glorious difficulties sprout up all over the place. Isn't "Twain, Mark" the same as "Clemens, Samuel Langhorne" and "Mark Twain"? Where museums have the luxury of cataloguing unique objects, libraries must be able to catalogue copies. Two different cataloguers, two different copies, two different stories just begging to be joined.
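The heading problem can be sketched in a few lines of Python. This is a toy model, not how any real catalogue system works: the cards, the `SEE_REFERENCES` table and both function names are hypothetical. Each variant form of a name points to one chosen heading, a bit like a "see" reference card, and the cards are then filed alphabetically under that heading.

```python
# Toy card catalogue: each card is (heading, title). All data is hypothetical.
# Variant headings point to one chosen form, like a "see" reference card.
SEE_REFERENCES = {
    "Clemens, Samuel Langhorne": "Twain, Mark",
    "Mark Twain": "Twain, Mark",
}


def normalise_heading(heading: str) -> str:
    """Return the chosen form of a heading, or the heading unchanged."""
    return SEE_REFERENCES.get(heading, heading)


def file_cards(cards: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """File cards under their chosen heading, sorted case-insensitively
    by heading and then by title."""
    normalised = [(normalise_heading(h), t) for h, t in cards]
    return sorted(normalised, key=lambda card: (card[0].casefold(), card[1].casefold()))


cards = [
    ("Clemens, Samuel Langhorne", "Life on the Mississippi"),
    ("Twain, Mark", "Adventures of Huckleberry Finn"),
    ("Mark Twain", "The Innocents Abroad"),
    ("Austen, Jane", "Emma"),
]

for heading, title in file_cards(cards):
    print(f"{heading} -- {title}")
# Austen, Jane -- Emma
# Twain, Mark -- Adventures of Huckleberry Finn
# Twain, Mark -- Life on the Mississippi
# Twain, Mark -- The Innocents Abroad
```

Real filing rules are far fussier than a case-insensitive sort; for instance, they would usually ignore a leading "The" in a title, which this sketch does not.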

In the 1930s, in Madras, there lived a mathematician called Shiyali Ramamrita Ranganathan who was helping to create the emerging field of information science; his Five Laws of Library Science were published in 1931. OMG, Wikipedia is awesome! Here they are:

  1. Books are for use.
  2. Every reader his [or her] book.
  3. Every book its reader.
  4. Save the time of the User.
  5. The library is a growing organism.

Interestingly, there are at least two variants on the semantics of these laws. (And yet another written in the first person, by a student at the University of British Columbia: "Books are to be taken from locked back rooms and brought out to welcoming rooms with open shelves. Shelves need to be accessible to more than one user at a time. Libraries are to be located in the midst of their communities.")

Here come the 60s, when computing took hold in the cataloguing world. Henriette Avram, a programmer, invented MARC, the standard for MAchine-Readable Cataloguing. Innovations included the use of upper and lower case and variable-length strings. In Coyle's words, it was a markup language, nothing to do with processing. There were no links between records; it was one-dimensional and saturated by numbering systems. These days, there are at least 13 MARC variants, and apparently the core is outdated. One handy numbering field, though, is pagination: comparing "xii, 356p" with "iv, 357p" can tell you buckets about how two editions of the same book differ in format.
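To make the pagination point concrete, here is a small Python sketch. It assumes only the simplest form of pagination statement, Roman numerals for the preliminary pages followed by the main page count (as in "xii, 356p"); real records hold far messier statements, and the function names here are hypothetical.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xii' to an integer."""
    values = [ROMAN[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtract when a smaller numeral precedes a larger one (e.g. 'iv').
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def parse_pagination(statement: str) -> tuple[int, int]:
    """Split a simple statement like 'xii, 356p' into (preliminary pages, main pages).
    Assumes the hypothetical simplified form 'roman, arabic p'."""
    match = re.fullmatch(r"\s*([ivxlcdm]+),\s*(\d+)\s*p\.?\s*", statement, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised pagination: {statement!r}")
    return roman_to_int(match.group(1)), int(match.group(2))


a = parse_pagination("xii, 356p")
b = parse_pagination("iv, 357p")
print(a, b)                      # (12, 356) (4, 357)
print(a[0] - b[0], a[1] - b[1])  # 8 -1
```

Eight extra preliminary pages but one fewer page of main text hints that the front matter (a new preface, say) changed between the two editions while the body stayed much the same.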

In Recoding the Museum, Ross Parry described how individual expression in the daybooks of museum registrars shifted towards standardisation. A similar, almost brutal specificity, the kind that only computing allows, has also been injected into librarianship. There is hope, however, as Coyle noted in her talk. "You have to do some things intellectually," she said.

We hurtle from the stunning Dewey gestalt to Ranganathan's cyclical information system to MARC, and now to the Functional Requirements for Bibliographic Records, or FRBR (created 1998), where seemingly "everything is in the right place, yet totally un-findable," says Coyle. Perhaps that's because you didn't put it away.

In any standardised classification system, this quest to abstract and index all but crushes the delicate, organic relationships between stories or editions of stories, like that of Seven Samurai to The Magnificent Seven, or of The Travels of Marco Polo to its "million lies" and many versions.

While a catalogue is, certainly, simply a tool for locating books, its myriad structures are, like us, full of eccentricities: saints must file before popes; there are Spirit Communications (rule 21.26), where the medium isn't always the messenger; and there's a bit for other compound surnames, except those of [certain] married women (rule 22.5C4). To hope to agree upon and standardise the description of everything humanity has ever written seems insurmountably tough to me. I wonder whether it could be made slightly easier if the eccentricities of all of us were welcomed, rather than stamped out because they don't fit in a field in a database. From the late, great, poet-y Pope, in a line that is apparently a variant on a much older one by Seneca the Younger:

Ah ne'er so dire a Thirst of Glory boast,
Nor in the Critick let the Man be lost!
Good-Nature and Good-Sense must ever join;
To err is human, to forgive divine.

If there was one main point I drew from Coyle's presentation, it's that there will always be exception nooks and opinion crannies. It's librarians who can help you see the ones other people make.

Posted at 12:06 am
