Competency E – Databases

Design, query, and evaluate information retrieval systems.

“…unretrieved information is the same as nonexistent information.”

-Donald and Ana Cleveland

 

EXPLICATION

It is human nature to categorize and organize objects and information.  Librarianship has always required such actions in order to facilitate effective retrieval of information for library patrons.  Once upon a time, the information retrieval (IR) system used in libraries was the card catalog.  In the current era of technology, however, more advanced IR systems offer a quicker and more efficient, though also more complex, means of accessing information.  Designing, querying, and evaluating IR systems are important technical skills for twenty-first century librarians.

Design

An information retrieval system is “a mechanism for obtaining, analyzing, categorizing, and disseminating information” (Cleveland and Cleveland, 2001, p.15).  Effective IR system design requires consideration of users and their information needs on an individual level as well as the needs of the organization as a whole.  Organization of information is critical to its effective retrieval.

Effective organization begins with assigning metadata to the items included in the IR system.  A surrogate, which Cleveland and Cleveland (2001) define as “an abbreviated and orderly image of the knowledge record” (p. 2), contains attributes relevant to the information item.  Attributes may be objective and describe the item itself, such as title and author, or subjective and describe content (often referred to as aboutness).  Assignment of attributes depends on the judgment of an individual; thus, problems with subjective attributes can occur when the person assigning them is unfamiliar with the subject area and its technical jargon, or when a key concept is overlooked.

The next step in IR system design is creating field names and subject headings.  Field names represent the attributes that will make up the surrogate record in the IR system.  Subject headings are essentially the filing system for items.  Subject headings are assigned based on a controlled vocabulary, which refers to the specific words chosen to represent concepts and the rules for their usage (Cleveland and Cleveland, 2001, p. 35).  Controlled vocabularies are used to promote consistency in indexing and to decrease or remove the ambiguity inherent in natural language.  Assigned terms should reflect the vocabulary of the subject area addressed by a particular document.  For example, a designer may assign the term “youth” to represent underage individuals between the ages of twelve and eighteen.  In assigning this subject term, the designer also indicates a list of synonyms, such as teen, juvenile, or young adult, that searchers may use in a query.  Any time a user enters one of the synonyms, the IR system will redirect the query to the subject term (or preferred term) of “youth” and return relevant items.  Use of controlled vocabulary increases the precision of the IR system by supplying results that are most relevant and excluding irrelevant information.
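
To make the redirection idea above concrete, here is a minimal Python sketch of how an IR system might map a searcher's entry term to the preferred term of a controlled vocabulary.  The term list and function name are hypothetical, not drawn from any actual system.

    # Minimal sketch: redirect synonyms entered by a searcher to the preferred
    # term of a controlled vocabulary.  Terms are illustrative only.
    PREFERRED_TERM = {
        "teen": "youth",
        "teenager": "youth",
        "juvenile": "youth",
        "young adult": "youth",
    }

    def normalize_query_term(term: str) -> str:
        """Return the preferred term for a query word, or the word itself
        if the controlled vocabulary does not list it as a synonym."""
        key = term.strip().lower()
        return PREFERRED_TERM.get(key, key)

    print(normalize_query_term("Juvenile"))    # -> youth
    print(normalize_query_term("censorship"))  # -> censorship (uncontrolled term)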

An effective IR system will include a searchable thesaurus or list of subject terms that aids in the general understanding of a subject area by allowing users to determine the assigned preferred terms of the controlled vocabulary.  The list should allow users to discern hierarchical (broader and narrower terms), equivalent (variant forms/synonyms of a preferred term), associative (terms related in concept but neither hierarchical nor equivalent), and homographic relationships between terms.  Ambiguity may also come into play for homographs: words that are spelled the same but have different meanings, such as bear (the animal) and bear (to carry).  An effectively designed IR system should delineate the difference in terms and lead users to the preferred term for each meaning.  Scope notes (SN) provide users with guidelines for a term’s scope and limitations with regard to usage.
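
The relationships just described can be pictured as fields on a thesaurus record.  The following is a rough Python sketch of one such entry; the field names and sample terms are hypothetical, not taken from any actual thesaurus.

    # Rough sketch of a single thesaurus entry recording the relationship
    # types described above.  Sample terms are invented.
    from dataclasses import dataclass, field

    @dataclass
    class ThesaurusEntry:
        preferred_term: str
        scope_note: str = ""                                 # SN: usage and limitations
        broader: list[str] = field(default_factory=list)     # hierarchical (broader terms)
        narrower: list[str] = field(default_factory=list)    # hierarchical (narrower terms)
        used_for: list[str] = field(default_factory=list)    # equivalent terms (synonyms/variants)
        related: list[str] = field(default_factory=list)     # associative relationships

    youth = ThesaurusEntry(
        preferred_term="youth",
        scope_note="Use for persons between the ages of twelve and eighteen.",
        broader=["people"],
        narrower=["high school students"],
        used_for=["teen", "juvenile", "young adult"],
        related=["adolescence"],
    )
    print(youth.used_for)   # -> ['teen', 'juvenile', 'young adult']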

The most relevant information organized in the best possible manner means nothing if users cannot access it.  Interface design is an important consideration because it must accommodate a wide variety of user skills and abilities.  Rose (2006) points to three principles of interface design that take information-seeking behavior into account:

  • Matching the interface to the user’s search goal (e.g., basic versus advanced search),
  • Facilitating selection of the appropriate context for the search, and
  • Supporting the interactive nature of search tasks (e.g., refinement and exploration).

Other considerations for effective interface design include providing shortcuts for experienced users, ease of maneuverability (e.g., reversal of actions, movement between pages), providing a search history, suggesting related terms along with search results, displaying and highlighting document metadata and query terms, and the ability to sort results by specific criteria (e.g., relevance, currency) (Shneiderman et al., 1997; Hearst, 2009).

Query

Queries may be simple or complex; provisions for each are evident in the basic and advanced search pages offered by most databases.  Users may input their own terms or, for more effective and efficient querying, use controlled language terms by locating them in the thesaurus or subject heading list.  Different types of IR systems may require different types of queries or query strategies.  For example, querying a search engine such as Google can be accomplished using natural, everyday language.  A question can be typed directly into the search box, and Google will more than likely deliver a large number of results.  While many people think that any information need they have can be fulfilled by “Googling” it, it is important that users understand that finding information on Google that is relevant and appropriate to a query may be akin to finding a needle in a haystack.  Search engines such as Google retrieve results based on algorithms that frequently change and that often have less to do with returning relevant results than with returning results based on website rankings.  While databases seem daunting to many people, they are a more efficient and effective means of locating relevant, appropriate information.  However, querying a database may require users to create queries using more specific terms, such as key words representing major concepts or preferred terms designated in a controlled vocabulary.

Drilling down through a large number of items in a database may be more readily accomplished by employing specific search tactics.  Some recognized search tactics, as noted by Booth (2008), include:

  • Building blocks – dividing a query into facets, including variants and synonyms, and adding the concepts together using the Boolean operator AND (a sketch of this tactic appears after the list)
  • Citation pearl growing – beginning with a very precise search to locate one highly relevant citation, then using index and text terms to broaden the search, repeating this until all relevant terms have been identified
  • Successive fractions – the first facet in a query represents a major topic. Other identified facets are added to each results set using AND, with the intention of narrowing the result set to a manageable number of relevant items
  • Drop a concept – if a query becomes too specific and the result set falls to an unacceptable level (or zero), the least relevant facet is dropped from the query
  • Interactive scanning – users unfamiliar with a topic use a broad concept for the initial search and scan the large set of results to become more familiar with concepts within that topic
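
As an illustration of the building blocks tactic, here is a minimal Python sketch that assembles a Boolean query string from a set of facets, joining synonyms within a facet with OR and the facets themselves with AND.  The facets and terms are invented for the example.

    # Minimal sketch of the building blocks tactic: synonyms within each facet
    # are ORed together, and the facets are then ANDed.  Terms are invented.
    facets = [
        ["censorship", "banned books"],                  # facet 1 and a variant
        ["school libraries", "school media centers"],    # facet 2
        ["youth", "teens", "young adults"],              # facet 3
    ]

    def build_query(facets: list[list[str]]) -> str:
        """Join synonyms within a facet with OR, then join facets with AND."""
        groups = ["(" + " OR ".join(f'"{t}"' for t in facet) + ")" for facet in facets]
        return " AND ".join(groups)

    print(build_query(facets))
    # ("censorship" OR "banned books") AND ("school libraries" OR ...) AND ("youth" OR ...)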

According to Booth (2008), berry picking, a model proposed by Marcia Bates, is perhaps the most common strategy, particularly for users unfamiliar with a topic and those working towards developing a sound research question, as “it reflects the natural interactions of the end-user whose information needs constantly change as they examine and process the results of each search set” (p. 315).  Berry picking begins with a general query and evolves into more specific queries as users learn more about a topic and their interest develops in a certain direction that will eventually become the primary research question.

Indexing of controlled language terms may be pre-coordinated or post-coordinated.  Pre-coordinate indexing refers to the practice of combining terms prior to searching, i.e., the combination of terms is not controlled by the user.  Currently, most databases use post-coordinate indexing, which allows users to combine search terms using Boolean operators such as AND, OR, and NOT.  When explaining the concept of Boolean logic to students, I suggest they think of a query as a mathematical equation where the search terms are the “numbers,” Boolean operators describe the functions to be carried out, and the results list is the “answer,” i.e., what comes after the equals sign.

Ex.:  censorship AND school libraries NOT academic libraries AND children OR youth = results list
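
To make the equation analogy concrete, the sketch below treats each search term as the set of document numbers indexed under it, so that the Boolean operators become set operations.  The document numbers are made up, and the grouping shown is one reasonable reading of the example query above.

    # Sketch of the "query as equation" idea: each term is a set of document
    # IDs, and AND, OR, NOT become set intersection, union, and difference.
    censorship         = {1, 2, 3, 5, 8}
    school_libraries   = {2, 3, 5, 9}
    academic_libraries = {3, 7}
    children           = {2, 5}
    youth              = {3, 9}

    # censorship AND school libraries NOT academic libraries AND (children OR youth)
    results = ((censorship & school_libraries) - academic_libraries) & (children | youth)
    print(sorted(results))   # -> [2, 5]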

Queries may also employ quotation marks to enclose a search phrase, instructing the database to search for the exact phrase, and truncation, in which an asterisk denotes an unspecified ending to a word.  For example, libr* may return results for librarians, library, librarianship, etc.
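
As a rough sketch of how truncation might be resolved, the snippet below translates the asterisk into a regular-expression wildcard and matches it against a small, invented list of index terms.

    # Sketch of truncation: the asterisk becomes a regular-expression wildcard
    # matched against known index terms.  The term list is invented.
    import re

    index_terms = ["librarians", "library", "librarianship", "libretto", "books"]

    def expand_truncation(pattern: str, terms: list[str]) -> list[str]:
        """Expand a truncated query such as 'libr*' against known index terms."""
        regex = re.compile("^" + re.escape(pattern).replace(r"\*", ".*") + "$")
        return [t for t in terms if regex.match(t)]

    print(expand_truncation("libr*", index_terms))
    # -> ['librarians', 'library', 'librarianship', 'libretto']

Note that the match also returns libretto, a reminder that truncation can broaden a search more than intended.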

Queries can be refined and focused either pre- or post-query by use of filters (also called limiters).  Most advanced-search pages offer filters concerning document type, publication information (publisher, date of publication, etc.), language, and intended audience.  Post-query filters may include those already mentioned as well as filters for subject and source types.
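
Here is a minimal sketch of post-query filtering applied to made-up result records; the field names and limiter values are illustrative, not those of any particular database.

    # Sketch of post-query limiters applied to invented result records.
    results = [
        {"title": "Censorship in schools", "type": "journal article", "year": 2012, "language": "English"},
        {"title": "Banned books week",     "type": "news item",       "year": 2019, "language": "English"},
        {"title": "La censure scolaire",   "type": "journal article", "year": 2015, "language": "French"},
    ]

    def apply_limiters(records, doc_type=None, min_year=None, language=None):
        """Keep only records that satisfy every limiter that was supplied."""
        kept = []
        for r in records:
            if doc_type and r["type"] != doc_type:
                continue
            if min_year and r["year"] < min_year:
                continue
            if language and r["language"] != language:
                continue
            kept.append(r)
        return kept

    for r in apply_limiters(results, doc_type="journal article", language="English"):
        print(r["title"])   # -> Censorship in schools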

Evaluation

Evaluation of databases involves determination of effectiveness, usability, satisfaction, and cost (Rowley & Hartley, 2008).  Effectiveness is measured by examining an IR system’s recall—the ability to retrieve relevant information—and precision—the ability to suppress or exclude irrelevant information (a brief sketch of these two measures follows the list below).  Usability evaluates the IR system interface as well as the user experience.  Shneiderman and Plaisant (2004) note five key components of interface usability:

  • Learnability – ease of use for basic tasks upon first encountering the interface,
  • Efficiency – ease and efficiency of task accomplishment after learning how to use interface,
  • Memorability – ease of use after a period of non-use,
  • Errors – number and severity of errors made during use as well as ease of recovery from errors, and
  • Satisfaction – “How pleasant or satisfying is it to use the interface?” (Shiri, 2012, pp. 243-244).
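
Returning to recall and precision, the sketch below computes both measures from hypothetical sets of document IDs.

    # Sketch of recall and precision computed from hypothetical document sets.
    retrieved = {1, 2, 3, 4, 5}       # documents the IR system returned
    relevant  = {2, 3, 5, 8, 13}      # documents actually relevant to the need

    true_hits = retrieved & relevant

    recall    = len(true_hits) / len(relevant)    # share of relevant items retrieved
    precision = len(true_hits) / len(retrieved)   # share of retrieved items that are relevant

    print(f"recall = {recall:.2f}, precision = {precision:.2f}")
    # -> recall = 0.60, precision = 0.60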

Satisfaction is difficult to measure because individual satisfaction is subjective and may depend on such factors as user familiarity and skill with querying IR systems and the perceived relevance of results by a given user.  For any user, relevance of results depends on “a wide-ranging set of variables, including cognitive, psychological, educational, social, and cultural” (Cleveland and Cleveland, 2001, p. 27).  Cost is also difficult to measure.  Both satisfaction and cost may be evaluated based on the amount of use a database receives.  Data-gathering methods for IR system evaluation include think-aloud sessions (where interviewers record users talking through their thought processes), screen capture (allowing evaluators to analyze specific points during user sessions), pre- and post-search questionnaires, and post-session interviews.

COMPETENCY DEVELOPMENT

I was introduced to online databases during my first semester in college.  I was taking an expository writing class which required a fair amount of research.  An affinity for databases was born and continued to grow throughout my undergraduate years, so much so that several of my instructors suggested I consider getting an MLIS degree.  Needless to say, I took their advice.  During my time in the MLIS program, I have taken INFO 202, Information Retrieval System Design; INFO 244, Online Searching; INFO 247, Vocabulary Design; and INFO 248, Beginning Cataloging and Classification, all of which have enhanced my understanding of the classification of items for purposes of discovery as well as how IR systems work.  I have not only queried and evaluated a wide range of IR systems but also constructed a thesaurus of controlled terms.

Throughout my college career, I have tutored students in the use of databases.  For the past five years, I have worked with freshman English students at a local community college, instructing them in the use of databases.  For most students, it is their first time using databases for research.  I instruct them in how to compile a list of potential search terms, how to formulate simple and complex queries, and how to filter results using system-supplied limiters.  I also show them how to choose the right database(s) based on their research topic and how to use an IR system’s thesaurus to formulate the most effective queries.

EVIDENCE

1.  Design: Thesaurus Construction

The first item of evidence provided demonstrates my understanding and mastery of concepts related to the design of IR systems.

An important part of effective IR system design is constructing a thesaurus of subject terms.  As noted previously, the thesaurus aids in the general understanding of a subject area by allowing users to determine the assigned preferred terms of the controlled vocabulary.  For an assignment in LIBR-247, Vocabulary Design, I had to analyze fifteen subject statements provided by the instructor, extract the key concepts from those statements, compile a controlled vocabulary based on those concepts, and construct a thesaurus from the controlled-vocabulary terms.  This assignment required demonstrating an understanding of facet classification, term selection, and creating an index that shows the relationships between terms.  It brought home not only the amount of work and close attention to detail required to produce a useful thesaurus but also the fine line between an exhaustive thesaurus and one bloated with inappropriate or unnecessary terms.

2.  Query: Web of Science Exercise; Successive Fractions Query Technique

The next two items of evidence demonstrate my understanding and mastery of concepts related to querying IR systems.

Querying an IR system involves not only choosing appropriate search terms but also knowing how to use the features and functions provided by a database as a means of fulfilling an information need.  While many features and functions are common to most IR systems, such as basic and advanced search options as well as various filters for refining search results, some features are unique to specific IR systems.  Efficient and effective querying comes from successful employment of these features and functions.  One assignment for LIBR-244, Online Searching, required demonstrating an understanding of the features and functions provided by the Web of Science database in order to answer a series of information inquiries provided by the instructor.  This assignment demonstrates my ability to master the use of IR systems in order to locate information that fulfills a need or answers a question.

Librarians should be familiar with different search strategies for querying IR systems, not only for their own purposes of locating information for patrons but also to show patrons with different skill levels and information needs how to efficiently and effectively locate information.  A post for LIBR-244, Online Searching, required formulating a research question and employing one or more search techniques to retrieve appropriate information.  I wanted to try a search strategy that I had not used before, so I used successive fractions.  I put a slight twist on the method by choosing subject terms from both the ProQuest and ERIC thesauri to use as synonyms for my search terms.  While the method proved useful in this exercise, I note at the conclusion of the post that I agree with Booth’s (2008) assertion: “more important than placing an appropriate label on a specific search tactic is judicious selection and use of such techniques to match a corresponding information need” (p. 313).  There are tactics from each of the search styles that I find beneficial and use from time to time, which I believe is probably true for all IR system users.

3.  Evaluation: Poets.org; ERIC and MediaSleuth

The next two items of evidence demonstrate my understanding and mastery of concepts related to the evaluation of IR systems.

It is important to evaluate IR systems to determine their effectiveness and usability as well as to determine where improvements can be made.  One assignment for LIBR-202, Information Retrieval, entailed evaluating an IR system based on authority, timeliness, scope, intuitiveness, functionality, and precision using a specific user model.  I evaluated the IR system on the Poets.org website with two user models in mind:  1) greeting card designers (User Model A), and 2) students using the database to research information for high school-level English literature classes (User Model B).  Each evaluation criterion was rated as positive, negative, or neutral.  This was my first attempt at evaluating an IR system.  The IR system provided on the Poets.org website is relatively basic; however, it is still important to evaluate it in order to determine its usefulness to a wide variety of users.  One of the most important lessons gleaned from this assignment was an understanding of why consideration of the user model is important in determining the effectiveness and usability of an IR system.

An assignment for LIBR-247, Vocabulary Design, necessitated evaluating two IR systems, one a freely available web-based system and the other a subscription database.  Both IR systems are more sophisticated than the one provided on the Poets.org website.  Again, features and functions were evaluated with regard to effectiveness and usability.  The assignment also required making suggestions for improvements to each of the IR systems, which taught me how important it is to think about the features and functions that may provide the best possible user experience.

CONCLUSION

Information retrieval systems are almost ubiquitous: whether people are querying Google for restaurant suggestions, Amazon for a book or other product, or a subscription database for information items, they are using IR systems.  It is important that librarians, as gatekeepers of information as well as those most often tasked with organizing information, have a firm understanding of concepts related to the design, query, and evaluation of information systems.  The discussion and evidence presented demonstrate my mastery of these concepts and my ability to put them to use.  And while I feel I have mastered these concepts, I also understand that, as IR systems become more sophisticated, I will need to continue updating and refining my knowledge as a librarian.

References

Booth, A. (2008). Using evidence in practice. Health Information and Libraries Journal, 25, 313-317. http://dx.doi.org/10.1111/j.1471-1842.2008.00825.x

Cleveland, D. B., & Cleveland, A. D. (2001). Introduction to indexing and abstracting (3rd ed.). Englewood, CO: Libraries Unlimited.

Hearst, M.A. (2009). Search user interfaces. Cambridge, UK: Cambridge University Press.

Rowley, J. & Hartley, R. (2008). Organizing knowledge: An introduction to managing access to information. Surrey, England: Ashgate Publishing Limited.

Shiri, A. (2012). Powering search: The role of thesauri in new information environments. Medford, NJ: ASIS&T.

Shneiderman, B., Byrd, D., & Croft, W. B. (1997, January). Clarifying search: A user-interface framework for text searches. D-Lib Magazine. Retrieved from http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html

Shneiderman, B., & Plaisant, C. (2004). Designing the user interface: Strategies for effective human-computer interaction (4th ed.). Boston, MA: Addison Wesley.