Monday, September 9, 2013

Comp G: Cataloging

Demonstrate understanding of basic principles and standards involved in organizing information, including classification, cataloging, metadata, or other systems.

This competency delves deeper into the short section on organization from competency F. In order for users to find information in a large collection, the items must be organized and entered in a systematic, standardized way. Making it easy for users to find an item, though, takes quite a bit of behind-the-scenes work by LIS professionals who organize, classify, and catalog physical and digital materials. Each item in a collection is given a surrogate, or record, that lists information about it (for a book, this would be the title, author, year published, etc.). These records are then kept in one place: in the “old days” this was the card catalog, but now it is most likely a computerized database. There, a user can reach the records through access points, which point the way to the actual item. For example, in a public library, a user can search the OPAC (Online Public Access Catalog) using a book's title as an access point, and the record gives the call number showing where the item sits on the shelves. If the user is searching for a digital item, the record may include a link to the full item.
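To make the idea of surrogate records and access points concrete, here is a minimal Python sketch. It is not how any real OPAC is built: the `Record` and `Catalog` classes, the choice of title and author as the only access points, and the sample data are all hypothetical. Each record stands in for an item, and each access point is an index that leads from a search value to the record, and from there to the call number or link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """A surrogate record: a short description standing in for the item itself."""
    title: str
    author: str
    year: int
    call_number: str           # where the physical item sits on the shelf
    url: Optional[str] = None  # link to the full item, for digital resources


class Catalog:
    """Keeps all records in one place and indexes them by access point."""

    ACCESS_POINTS = ("title", "author")

    def __init__(self) -> None:
        self.records: List[Record] = []
        # access point -> normalized value -> positions of matching records
        self.indexes: Dict[str, Dict[str, List[int]]] = {p: {} for p in self.ACCESS_POINTS}

    def add(self, record: Record) -> None:
        position = len(self.records)
        self.records.append(record)
        for point in self.ACCESS_POINTS:
            key = getattr(record, point).casefold()  # case-insensitive matching
            self.indexes[point].setdefault(key, []).append(position)

    def lookup(self, point: str, value: str) -> List[Record]:
        """Follow an access point (e.g. a title) to the matching records."""
        positions = self.indexes[point].get(value.casefold(), [])
        return [self.records[i] for i in positions]


# Hypothetical example data.
catalog = Catalog()
catalog.add(Record("An Example Book", "Doe, Jane", 2010, "Z693 .D64 2010"))
catalog.add(Record("An Example E-book", "Roe, Richard", 2012, "Online",
                   url="https://example.org/ebook"))

for hit in catalog.lookup("title", "an example book"):
    print(hit.call_number)  # Z693 .D64 2010
```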

·         Subject analysis
Cataloging is not just a matter of copying the information from a book's title page. Each work must also be analyzed for its subject and assigned subject headings. A cataloger must determine the “aboutness” of an item, and it is highly subjective work. In formal systems such as the Library of Congress Subject Headings (LCSH), a controlled vocabulary reduces some of the ambiguity of natural language. A controlled vocabulary contains precisely defined terms for the cataloger to choose from, with each concept represented by a single term. All the terms in a controlled vocabulary, along with their relationships to broader, narrower, and related terms, are listed in a thesaurus. Pre-coordination and post-coordination are important concepts in subject analysis. Pre-coordination is the combining of subject terms into one heading string by a cataloger (e.g., Law--Political aspects--United States--History--20th century). Post-coordination is the combining of keywords by a user at the time of a search. Both have their place: pre-coordinated headings are valuable for browsing, while post-coordinated terms are more flexible.
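The difference between pre- and post-coordination can be sketched in a few lines of Python. The records and keyword sets below are hypothetical; the headings imitate the LCSH display style, with double hyphens between subdivisions. Pre-coordinated strings are simply browsed in alphabetical order, while post-coordinated keywords are combined by the searcher with a Boolean AND.

```python
# Hypothetical records described two ways.

# Pre-coordination: the cataloger has already joined the terms into heading strings.
PRE_COORDINATED = {
    "rec1": "Law--Political aspects--United States--History--20th century",
    "rec2": "Law--United States--History",
    "rec3": "Political science--United States--History",
}

# Post-coordination: each record just carries separate keywords.
POST_COORDINATED = {
    "rec1": {"law", "political aspects", "united states", "history", "20th century"},
    "rec2": {"law", "united states", "history"},
    "rec3": {"political science", "united states", "history"},
}


def browse(headings, prefix):
    """Alphabetical browse list of the headings that start with `prefix`."""
    return sorted(h for h in headings.values() if h.startswith(prefix))


def search(keywords, *terms):
    """Records carrying every requested term (the user's Boolean AND)."""
    wanted = {t.lower() for t in terms}
    return sorted(rid for rid, kw in keywords.items() if wanted <= kw)


print(browse(PRE_COORDINATED, "Law--"))
print(search(POST_COORDINATED, "law", "history"))  # ['rec1', 'rec2']
```

Because the cataloger fixed the order of the pre-coordinated strings, related headings file next to each other in a browse list. The keyword approach lets the user combine any terms, but it can also match records in which the terms appear together only by chance.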

Another way to organize digital items online by subject is tagging. This is usually a very informal method in which users create subject labels for items based on their own impressions, and it can make natural-language searching easier. Maness (2006) uses the example of the LCSH term “cookery” for cookbooks. Cookery is a word that virtually no one uses anymore; users could tag the items “cooking” or “cookbooks” instead, making them much easier for a patron to find. Because of this benefit, some libraries have added a tagging feature to their OPACs. The user-generated vocabulary that results from this kind of collaborative tagging is called a folksonomy, a portmanteau of “folk” and “taxonomy.”
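Here is a minimal sketch, assuming a hypothetical `TaggedCatalog` class, of how user tags can sit alongside a controlled heading. Tags are normalized to lowercase and counted so that the most popular ones can be displayed, and a search matches either the controlled heading or any tag. That is why a search for "cooking" can find an item whose LCSH heading is "Cookery".

```python
from collections import Counter, defaultdict


class TaggedCatalog:
    """Items carry one controlled heading plus free-form user tags (a folksonomy)."""

    def __init__(self):
        self.headings = {}                # item id -> controlled subject heading
        self.tags = defaultdict(Counter)  # item id -> how often each tag was applied

    def add_item(self, item_id, heading):
        self.headings[item_id] = heading

    def tag(self, item_id, tag):
        # Normalize case and whitespace so "Cooking" and "cooking " count as one tag.
        self.tags[item_id][tag.strip().lower()] += 1

    def search(self, term):
        """Match the controlled heading or any user tag."""
        term = term.strip().lower()
        return sorted(
            item_id for item_id, heading in self.headings.items()
            if heading.lower() == term or term in self.tags.get(item_id, ())
        )

    def top_tags(self, item_id, n=3):
        return [t for t, _ in self.tags[item_id].most_common(n)]


# Hypothetical example: the controlled heading is "Cookery", users say otherwise.
tagged = TaggedCatalog()
tagged.add_item("book1", "Cookery")
for user_tag in ["cooking", "Cookbooks", "cooking "]:
    tagged.tag("book1", user_tag)

print(tagged.search("cooking"))   # ['book1']
print(tagged.top_tags("book1"))   # ['cooking', 'cookbooks']
```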

·         Classification
Most libraries arrange their collections according to a hierarchical classification scheme. The subjects, or classes, are arranged in a tree-like structure, with more specific subclasses branching off each one. The Dewey Decimal Classification (DDC) and the Library of Congress Classification (LCC) are the most common hierarchical classification schemes. In DDC, all knowledge is divided into ten main classes, each with ten divisions, and each of those with ten sections. This results in 1,000 sections, each represented by a three-digit number. A number can become even more specific with additional digits after a decimal point. These extra digits can bring out particular aspects of a topic, such as the time period the work covers or the form of the document. LCC divides knowledge into broad classes designated by letters of the alphabet. The notation is one to three letters followed by a number of one to four digits, and further codes may be added after a decimal point. LCC also differs from DDC in that each class has its own internal structure, which is not consistent with the other classes. DDC notation reflects the hierarchy, so each added digit means a narrower subject, whereas LCC numbers within a class are largely assigned in sequence and do not show the hierarchy.
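Because DDC notation is hierarchical, the chain of broader classes can be read directly from a number's digits. The hypothetical helper below does just that. It is only a sketch: it ignores standard subdivisions, the DDC tables, and the book numbers that follow a class number on a real spine label.

```python
from typing import List


def ddc_hierarchy(number: str) -> List[str]:
    """Return the chain of broader DDC classes implied by a number's digits."""
    whole, _, fraction = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError(f"not a DDC number: {number!r}")
    # Main class, division, section...
    candidates = [whole[0] + "00", whole[:2] + "0", whole]
    # ...then one narrower level for each digit after the decimal point.
    candidates += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    chain: List[str] = []
    for c in candidates:
        if not chain or chain[-1] != c:  # e.g. "500" is both a main class and a division
            chain.append(c)
    return chain


print(ddc_hierarchy("641.5"))  # ['600', '640', '641', '641.5']
```

The same trick does not work for LCC: since numbers within an LCC class are largely enumerated in sequence, the broader topic of a call number such as QA76 cannot be found by trimming digits; beyond its letters, it has to be looked up in the schedules.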

Library catalog records are generally created in MARC (Machine-Readable Cataloging) format. A MARC record is divided into fields identified by three-digit tags, such as 100 for the author's name, 245 for the title, 264 for publication information including the date, and 041 for language codes. To decide what to record and how to record it, catalogers used AACR2 (Anglo-American Cataloguing Rules, 2nd edition) from 1978 onward. RDA (Resource Description and Access), published in 2010, has been replacing AACR2 as the cataloging standard in English-language libraries; the Library of Congress adopted it in 2013.
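To show what "three-digit tags" means in practice, here is a simplified in-memory model of a MARC record in Python. The `Field` class, the `get_subfield` helper, and the sample book are hypothetical. Real records are exchanged in the MARC 21 binary (ISO 2709) or MARCXML formats, usually through an existing library such as pymarc rather than hand-written code. The tags themselves are real MARC 21 tags: 041 (language code), 100 (personal-name main entry, i.e. the author), 245 (title statement), and 264 (publication statement, used in RDA records; AACR2 records generally use 260).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Field:
    tag: str                # three-digit tag, e.g. "245" = title statement
    indicators: str = "  "  # two one-character indicators (blank = space)
    subfields: List[Tuple[str, str]] = field(default_factory=list)  # (code, value)

    def display(self) -> str:
        """Render the field as tag, indicators, then $-prefixed subfields."""
        body = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {body}"


def get_subfield(record: List[Field], tag: str, code: str) -> Optional[str]:
    """First value of subfield `code` in the first field with this tag, or None."""
    for fld in record:
        if fld.tag == tag:
            for sub_code, value in fld.subfields:
                if sub_code == code:
                    return value
    return None


# Hypothetical record for a hypothetical book.
record = [
    Field("041", "0 ", [("a", "eng")]),                                    # language code
    Field("100", "1 ", [("a", "Doe, Jane.")]),                             # author
    Field("245", "10", [("a", "An example book /"), ("c", "Jane Doe.")]),  # title
    Field("264", " 1", [("a", "New York :"), ("b", "Example Press,"), ("c", "2013.")]),
]

for fld in record:
    print(fld.display())
print(get_subfield(record, "264", "c"))  # 2013.
```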

·         Metadata
Digital items have associated metadata. Metadata is structured information that describes a resource. Many people use the term only for electronic resources, but the records in old-fashioned card catalogs were a type of metadata, too. In fact, metadata serves many of the same functions a card catalog did, such as identifying resources that meet certain criteria, bringing similar resources together or distinguishing dissimilar ones, and giving resource locations. But metadata goes beyond basic bibliographic information; it can also record technical, structural, preservation, and rights-management information. Most metadata standards are designed with interoperability in mind, so records can be shared and used across platforms.

I took LIBR 281 – Seminar in Contemporary Issues: Metadata specifically to learn more about this important and wide-ranging topic. We studied some of the most common metadata schemes used by LIS professionals. Dublin Core is a widely used scheme that was originally intended to describe web documents but has been expanded to describe many kinds of items. For example, the EPUB e-book format uses Dublin Core. It is made up of 15 core elements (e.g., Creator, Publisher, Format, and Language); every element is optional and repeatable, which makes the scheme very flexible. Dublin Core can be encoded in XML (Extensible Markup Language) or RDF (Resource Description Framework), two very common ways of representing metadata. In LIBR 281, we also learned about more specialized metadata schemes; there seems to be a scheme for nearly every type of item that needs to be organized. Here are just a few examples:
o   TEI (Text Encoding Initiative) is used for marking up texts.
o   METS (Metadata Encoding and Transmission Standard) is used for digital library objects.
o   MODS (Metadata Object Description Schema) is an XML-based bibliographic scheme developed by the Library of Congress.
o   EAD (Encoded Archival Description) is for archival finding aids.
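As a small illustration of Dublin Core in XML, the Python sketch below writes a record using the standard Dublin Core element namespace, `http://purl.org/dc/elements/1.1/`. The wrapper element `metadata` is an assumption, loosely echoing the `metadata` section of an EPUB package document but without its namespace, and the e-book described is hypothetical. Since all 15 elements are optional and repeatable, the function simply writes one XML element per value and skips anything not supplied.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_xml(record: Dict[str, List[str]]) -> str:
    """Serialize {element: [values]} as simple Dublin Core XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        # Optional: absent elements are simply not written.
        # Repeatable: one XML element per value.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({
    "title": ["An example e-book"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "format": ["application/epub+zip"],
    "language": ["en"],
}))
```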

Evidence 1 is an exercise called Metadata Crosswalk from my metadata seminar, LIBR 281. In this assignment, we were to create a record in one metadata scheme and transfer the information to another, a process known as crosswalking. I wrote the record first in plain text, then in Dublin Core, and then crosswalked it to MODS. In doing so, I practiced entering records in both schemes and learned about the variations in their terminology and how each is encoded. Learning the vocabulary of each scheme was the most difficult part of the assignment; the documentation is not always clear or straightforward when defining terms. But after several metadata exercises culminating in this crosswalk assignment, I began to see patterns in the language of the schemes. For example, both Dublin Core and MODS use the term “extent” for the file size of a digital book, while using the “format” element for the dimensions of a physical book. This evidence shows that I understand how to compare different metadata schemes and how they can be used to catalog digital items.
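The crosswalking process can be sketched as a lookup table from Dublin Core elements to MODS element paths. The mapping below is a small, simplified subset, modeled loosely on the kind of Dublin Core to MODS mapping the Library of Congress publishes; it is not a complete or authoritative crosswalk, and the function name and sample record are hypothetical. Elements without a mapping are returned separately rather than guessed, since deciding where something like `format` belongs (for example, under MODS `physicalDescription`) is exactly the kind of judgment a crosswalk requires.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Partial, simplified mapping: Dublin Core element -> nested MODS element path.
DC_TO_MODS = {
    "title": ["titleInfo", "title"],
    "creator": ["name", "namePart"],
    "subject": ["subject", "topic"],
    "description": ["abstract"],
    "publisher": ["originInfo", "publisher"],
    "date": ["originInfo", "dateIssued"],
    "language": ["language", "languageTerm"],
    "identifier": ["identifier"],
}
# Wrapper elements that several DC elements should share instead of repeating.
SHARED_WRAPPERS = {"originInfo"}


def m(tag):
    """Qualify a tag with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def crosswalk_dc_to_mods(dc_record):
    """Map {dc_element: [values]} to a MODS element; report unmapped elements."""
    mods = ET.Element(m("mods"))
    unmapped = []
    for element, values in dc_record.items():
        path = DC_TO_MODS.get(element)
        if path is None:
            unmapped.append(element)  # left for a human cataloger to decide
            continue
        for value in values:
            parent = mods
            for wrapper in path[:-1]:
                existing = parent.find(m(wrapper)) if wrapper in SHARED_WRAPPERS else None
                parent = existing if existing is not None else ET.SubElement(parent, m(wrapper))
            ET.SubElement(parent, m(path[-1])).text = value
    return mods, unmapped


mods, unmapped = crosswalk_dc_to_mods({
    "title": ["An example e-book"],
    "creator": ["Doe, Jane"],
    "publisher": ["Example Press"],
    "date": ["2013"],
    "format": ["application/epub+zip"],
})
print(ET.tostring(mods, encoding="unicode"))
print(unmapped)  # ['format']
```

Note that `publisher` and `date` end up inside a single `originInfo` element, because the sketch reuses that wrapper instead of creating a new one for each value.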

Evidence 2 is a research paper called Metadata Scheme Darwin Core, also from LIBR 281. We were to choose one scheme and research its history, documentation, terminology, and uses. I chose Darwin Core because I was very interested in how metadata is being used to catalog biological materials rather than bibliographic items. This scheme is used by many natural history museums and biological research institutes. It can be encoded in RDF, XML, JSON (JavaScript Object Notation), and more, making it interoperable and fairly simple to use. Through this research, I learned how one metadata scheme can be applied in many different ways to organize collections of information. Darwin Core is being used to make interactive maps of local wildlife, to digitize old museum specimen labels, and to help marine biologists who are unfamiliar with data management enter and maintain their specimen data. With this research paper, I show that I understand the importance of metadata to the organization of information and how useful it can be even outside the information science field.
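A rough sketch of the specimen-label use case: the Python below records one hypothetical label using real Darwin Core term names as CSV column headers (Darwin Core supports this kind of plain-text form) and runs a few simple checks. Which terms count as required, and the validation rules themselves, are assumptions for illustration rather than part of the standard.

```python
import csv
import io
from typing import Dict, List

# A few real Darwin Core term names, used as CSV column headers.
DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "recordedBy", "catalogNumber",
]
# Assumption for this sketch: treat these three terms as mandatory.
REQUIRED = ("occurrenceID", "basisOfRecord", "scientificName")


def validate(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one occurrence record."""
    problems = [f"missing {term}" for term in REQUIRED if not row.get(term)]
    for term, limit in (("decimalLatitude", 90), ("decimalLongitude", 180)):
        value = row.get(term)
        if not value:
            continue
        try:
            number = float(value)
        except ValueError:
            problems.append(f"{term} is not a number: {value!r}")
            continue
        if not -limit <= number <= limit:
            problems.append(f"{term} out of range: {number}")
    return problems


def to_dwc_csv(rows: List[Dict[str, str]]) -> str:
    """Write records as CSV whose header row uses Darwin Core term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_TERMS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical transcription of an old museum specimen label.
label = {
    "occurrenceID": "urn:catalog:EXMUS:Herb:12345",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Quercus agrifolia",
    "eventDate": "1923-04-17",
    "decimalLatitude": "37.33",
    "decimalLongitude": "-121.88",
    "recordedBy": "A. Collector",
    "catalogNumber": "12345",
}
print(validate(label))  # []
print(to_dwc_csv([label]))
```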

Evidence 3 consists of two pieces from LIBR 202: Assignment 3 Datastructure Entries and Assignment 3 User Guide. They were part of a project in which I took 15 journal articles from the semester’s reading list and indexed them in an InMagic database. Part of the assignment was learning about subject analysis, so I used controlled vocabulary where necessary and came up with pre- and post-coordinate terms for each article. This assignment required me to think critically about classification and subject analysis. I had to carefully consider each term I used to make sure it was the correct one for the situation. For instance, should I use “folk taxonomy” or “folksonomy”? I started with some broad topics, then broke each topic down into subclasses until I felt I had captured the “aboutness” of each article. Post-coordinate terms were easier to come up with; each article had one or more keywords that seemed to fit the subject matter. The assignments show these pre- and post-coordinate terms, as well as the other rules I set for entering records in my database. They show that I understand the principles of subject analysis, as well as the challenges of using pre-coordination to make the articles accessible to anyone who may search for them.

Knowledge of metadata, as well as other information organization principles, will be incredibly helpful to me in my career. It is one of my favorite topics, and it is becoming ever more important to the future of information science as more and more data becomes available online. When data is properly organized and classified, users have a much easier time finding what they are looking for. I have already applied what I learned in these classes at home: I cataloged my entire home library using LCC and “cleaned up” all the metadata in my digital music collection. My hope is to employ these skills and knowledge beyond my home, in a library setting.

Maness, J. M. (2006). Library 2.0 theory: Web 2.0 and its implications for libraries. Webology, 3(2). Retrieved from
