Friday, August 23, 2013

Comp E: Databases

Design, query and evaluate information retrieval systems.

Introduction

Databases are an extremely useful way to organize information. In LIS, we use databases and other information retrieval (IR) systems such as online search engines all the time to seek out information for ourselves and our patrons or clients. Coronel, Morris, and Rob (2011) explain, “Virtually all modern business systems rely on databases; therefore, a good understanding of how these structures are created and their proper use is vital for any information systems professional” (p. 5). Consequently, it is important that we know how a good database is designed. We should also know the various methods for querying any kind of information retrieval system. And finally, we should be able to evaluate the information we retrieve from the database, as well as the effectiveness and usability of the database itself.

  • Design


A database must be designed well to be effective for the users. The designer must take the eventual function by end-users as the highest consideration. For example, if a database at a university contains information about classes, will the users be professors, students, or staff? These groups all require different information. When the designer knows who is going to be using the database and why, he or she can begin to map out the data and relationships. This is done by organizing the data into tables, then linking the tables together. There are many models for data relationships, but the most common currently is the relational database model. In a relational model, tables of data are linked together with keys. This model can make a very large, complex database appear simple to most end-users.  

How records will be input into the database is another consideration of the designer. Depending on the needs of the database users, the designer may choose to use a controlled vocabulary for the record information. This means having a set list of words or phrases to describe the contents of a record; it is used to reduce ambiguity in the language and make searching easier. The data within each table may be indexed by certain attributes (or characteristics), which also assist in searching. When picturing a table, the attributes would be the titles of the columns. In the university database example,a table of students would have attributes such as name, birthdate, and home address. A database designer might also use pre-coordination in the database. This is when the designer thinks of the search terms a user might use to find each particular record, and uses those terms to index the records. For example, a designer might index this essay under “information retrieval systems” in a database. But more common now – because users can often search the full text of a document – is post-coordination indexing, where the searcher chooses the terms when looking for records. This leads to the next topic, the need for good searching skills.
  •  Query


Querying a database can be much more intricate than a simple, natural-language online search engine search. Although in many situations, a quick Google search is enough for our information needs, sometimes more powerful means are called for. As LIS professionals, we must know the techniques that make searching more efficient and effective for ourselves and our patrons.

In LIBR 244, Online Searching, I spent an entire semester with the concept of online database querying. We learned advanced search techniques and practiced with the “Big 3” databases: Factiva, LexisNexis, and ProQuest Dialog, as well as popular search engines. Using Boolean (or logical) search operators are fairly common in searching these databases. Putting AND, OR, and NOT between search terms can increase or decrease the number of records retrieved, depending on what results the searcher is looking for. Some databases also support truncation, wildcards, and proximity searching with terms such as NEAR or WITHIN.  Through many exercises we practiced search planning, learning how to use concept analysis, synonyms, concept blocks, and nesting to make database searching faster and resulting in better record retrieval.

 In LIBR 242, Database Management, we learned how to use SQL (Structured Query Language) in conjunction with Oracle to create databases, edit database information, and finally query databases. This is a very simple but powerful language that can manipulate the database records to form any report or view the user may want to see. SQL commands range from as simple as CREATE TABLE, to very complex and requiring many lines of text.
  •   Evaluation

Database evaluation begins as soon as the database is implemented and is a continual process to ensure the database is sustainable for the length of its life cycle. According to Coronel, et al. (2011), this is a holistic process: “Evaluation of the individual components should culminate in a variety of broader system tests to ensure that all of the components interact properly to meet the needs of the users,” (p. 387). The usability of the database can be evaluated and changes can be made to the end-user interface, or training and documentation can be added. Perhaps the users find they require more information that is not included in the database, or maybe the relationship between tables is not defined correctly. In a database management system like Oracle, records, tables, and relationships can be changed as needed without affecting the database as a whole.

When evaluating the records retrieved by a database, two measures are used: precision and recall. The precision of an information retrieval system – or the relevance of the records retrieved – is measured by dividing the number of correct results by the number of total results. Recall is the number of relevant records the IR system actually retrieved out of all relevant records in the database. They are inversely related; as precision goes up, recall goes down, and vice versa. Both are important, as the searcher may sometimes be looking for one or two very specific records on a topic (in which case, more precision would be helpful) or as much information about a topic as possible (when excellent recall would be the best option).

Evidence

My first piece of evidence is an analysis of an online database for LIBR 242 called Drewieske Database Evaluation. We were allowed to choose any large online database and analyze it by database type, ease of use, and results. I chose Petfinder.com. I found that the database was a multi-user, operational, relational, and distributed in nature, with many different people inputting information about pets from animal shelters across the country. This database is indexed by location, type of animal, age, gender, and breed. There is other information such as medical status, but that is all optional. It is an easy database to use for novice searchers, with clear navigation and error messaging. Petfinder.com does not support Boolean search techniques for advanced searchers. There seemed to be high recall on the searches I performed, but some of the information retrieved was out of date.

This evaluation of Petfinder.com shows I understand what components are necessary in the design of a database, as well as the terminology for different types and components of databases. I also demonstrate that I know how to use simple and Boolean search techniques, and also how to assess the data my search retrieves.

The second artifact I am presenting is a set of search problems from LIBR 244 (Drewieske Exercise 5). In this exercise, I research a set of five problems from a hypothetical client using Factiva, reporting the search results and explaining my process along the way. The searches I used are copied and pasted into the exercise. We did many of these exercises for this class, but I included this one particularly because the last problem asks us to compare results from Factiva to those of an online search engine, in this case Google and Yahoo. This exercise includes the best articles found in my searches, with relevant parts highlighted. In some of the problems, such as number 3 (“search for articles written by Barbara Quint”), precision is the more important measure. In other problems, like number 5 (“find articles about agriculture in Hawaii”), higher recall is needed.

With this evidence, I prove that I can successfully use diverse querying techniques like Boolean, wildcard, search sets, concept analysis, filtering, and natural language to find the information requested by a patron or client. The comparison of Factiva, Yahoo, and Google shows that I can evaluate the positive and negative aspects of searching different kinds of information retrieval systems. I can also evaluate the relevance of the records retrieved in each search, determining which search sets were successful and which were unsuccessful.

My final evidence for this competency is two parts: Group 5 Part 3 and Group 5 Part 5. These are two pieces of a semester-long group project in LIBR 242, in which we built a database from scratch, populated it with records, made it into a website, performed searches on it in SQL, and added and deleted information from it. The database we built was to be for a child’s extensive baseball card collection, where he would keep statistics on all the players whose cards he owned. There were many parts to this project, but the parts I have included best show the design and querying of our database. Part 3 shows the database schema, the data dictionary (or documentation that goes with the database), normalization of the tables, and table dependency. I contributed to writing the data dictionary and schema and made the table dependency chart. Part 5 shows various queries run on our database, along with screenshots of the resulting records. I contributed several of the SQL queries.

This was a very contentious and complicated group project. It really showed me how complex even a simple database can be. Users have many information needs, and all must be taken into consideration. No database will be perfect, but they must be optimized as much as the designers are able on the time frame and budget given. In the end, I do not think we all agreed on every aspect of this database, but it worked well for its given purpose. This evidence shows I can design a relational database according to users’ needs. The second part shows I can do complex queries of a database using SQL.

Conclusion

IR systems are the bridge connecting users with the information they need. Whether it is an online search engine like Google, or a document database through the SJSU library like ABI/Inform, I use IR systems every day. Knowing how they are designed and knowing how to properly query and evaluate the records, certainly makes me more efficient in my searches. Although I hope to make database and searching part of my career when I graduate, I do not know if I will use these skills professionally. But even if I do not, what I have learned is useful every single day.

References


Coronel, C., Morris, S., & Rob, P. (2011). Database systems: Design, implementation, and management (9th ed.). Boston, MA: Cengage Learning.

No comments:

Post a Comment