Technology
September 28, 2009
After Losing Users in Catalogs, Libraries Find Better Search Software
Lisa Billings/Freelance
Jean A. Bauer, a graduate student in American history at the U. of Virginia, has been frustrated with the confusing search results from the university library's old online catalog. A new one is in the works.
By Marc Parry
Thomas Jefferson founded the University of Virginia. So you might think that typing his name into Virgo, Virginia's online library catalog, would start you off with a book about him.
Jean A. Bauer tried it the other night. At the top of the results list were papers from a physics conference in Brazil.
The problem is that traditional online library catalogs don't tend to order search results by ranked relevance, and they can befuddle users with clunky interfaces. Bauer, a graduate student specializing in early American history, once had such a hard time finding materials that she titled a bibliography "Meager Fruits of an Ongoing Fight With Virgo."
That's changing because of two technology trends. First, a growing number of universities are shelling out serious money for sophisticated software that makes exploring their collections more like the easy-to-filter experience you might find in an online Sears catalog.
Second, Virginia and several other colleges, including Villanova University and the University of Rochester, are producing free open-source programs that tackle the same problems with no licensing fees.
A key feature of this software genre is that it helps you make sense of data through "faceted" searching, common when you shop online for a new jacket or a stereo system. Say you type in "Susan B. Anthony." The new system will ask if you want books by her or about her, said Susan L. Gibbons, vice provost and dean of Rochester's River Campus Libraries. Users can also sort by media type, language, and date.
These products can also rank search results by relevance and use prompts of "Did you mean … ?"
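To make the idea of faceted searching concrete, here is a minimal sketch in Python. It is only an illustration, not how any of the products named in this article work: the records, field names, and the helper functions `role`, `search`, `facet_counts`, and `narrow` are all made up for the example.

```python
from collections import Counter

# Made-up catalog records; titles, authors, and field names are illustrative only.
RECORDS = [
    {"title": "Selected Letters", "author": "Susan B. Anthony",
     "subject": "Women's suffrage", "format": "book"},
    {"title": "A Life in Reform", "author": "Jane Doe",
     "subject": "Susan B. Anthony", "format": "book"},
    {"title": "Anthony on Film", "author": "John Roe",
     "subject": "Susan B. Anthony", "format": "video"},
    {"title": "Conference Papers", "author": "Various",
     "subject": "Physics", "format": "book"},
]


def role(record, query):
    """Return 'by' if the query names the author, 'about' if it names the subject."""
    q = query.lower()
    if q in record["author"].lower():
        return "by"
    if q in record["subject"].lower():
        return "about"
    return None


def search(records, query):
    """Keep only records where the query matches the author or the subject."""
    return [r for r in records if role(r, query) is not None]


def facet_counts(results, key):
    """Count how many results fall under each value of a facet."""
    return Counter(key(r) for r in results)


def narrow(results, key, value):
    """Filter the results down to a single facet value."""
    return [r for r in results if key(r) == value]


query = "Susan B. Anthony"
hits = search(RECORDS, query)
print(facet_counts(hits, lambda r: role(r, query)))   # counts for "by" vs. "about"
print(facet_counts(hits, lambda r: r["format"]))      # counts per media type
about_books = narrow(narrow(hits, lambda r: role(r, query), "about"),
                     lambda r: r["format"], "book")
print([r["title"] for r in about_books])
```

The counts are what a faceted interface would show beside the results, and each click on a facet corresponds to one call to `narrow`.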
"It's sort of our answer to, Why it is you need a library when you have Google?" said Ms. Gibbons. "What this is going to do is show how much you've been missing."
It's a pressing issue. Libraries once had a monopoly on organizing data about content. No longer. And today some users gripe about how libraries present materials online: how scattered they are, how sluggish searches can be, and how often those searches are useful only if you already know exactly what you want.
The worry for Jennifer Bowen, assistant dean of the River Campus Libraries, is that library catalogs could become "marginalized."
"There are people who just cannot find what they need," she said. "And they're just sort of giving up on libraries."
A Single Entry Point
The issue concerns professors, too. One software developer pointed to a 2006 study by Ithaka, a nonprofit group that promotes the use of information technology in higher education. It found that faculty members value the campus library but "perceive themselves to be decreasingly dependent on the library for their research and teaching." The report described what appeared to be "growing ambivalence about the campus library."
The buzzwords for the technology that librarians hope will allow users to rediscover their collections are "Web-scale index searching."
That, in Ms. Gibbons's translation, is a fancy way of saying that the system, like Google, works by searching against a vast index of information. It's a contrast with an earlier attempt to deal with the search problem through "federated searching," where there is no local index, and each query is taken from the user and sent individually to various databases.
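The contrast between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions: the two "databases" are in-memory dictionaries invented for the example, whereas a real federated search would send each query over the network to remote systems, and the function names are hypothetical.

```python
from collections import defaultdict

# Invented stand-ins for separate databases; a real library would query remote systems.
SOURCES = {
    "catalog": {"c1": "Thomas Jefferson a biography",
                "c2": "physics conference papers"},
    "articles": {"a1": "Jefferson and the Virginia statute",
                 "a2": "cell division review"},
}


def federated_search(sources, term):
    """No local index: pass the query to every source at search time."""
    term = term.lower()
    hits = []
    for name, docs in sources.items():          # one "request" per database
        for doc_id, text in docs.items():
            if term in text.lower().split():
                hits.append((name, doc_id))
    return hits


def build_index(sources):
    """Read every source once, ahead of time, into one combined index."""
    index = defaultdict(set)
    for name, docs in sources.items():
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add((name, doc_id))
    return index


def indexed_search(index, term):
    """Answer a query with a single lookup in the prebuilt index."""
    return sorted(index.get(term.lower(), set()))


index = build_index(SOURCES)
print(sorted(federated_search(SOURCES, "Jefferson")))
print(indexed_search(index, "Jefferson"))
```

Both calls find the same two records. The difference is where the work happens: the federated version visits every source for every query, while the indexed version does that work once and then answers each query from its own index.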
You expect a Google search to cast the broadest possible net. The same should apply to a library catalog, the thinking goes. That means a single entry point to the collection. The entire collection: books, articles, digital objects. Heck, why not even herbarium specimens?
Marshall Breeding, director of innovative technology and research at the Vanderbilt University library, calls the concept "an ambitious goal—and at this point I think it's more of a goal than reality."
But the move toward simplified, silo-busting, relevant-result-returning library searches may come with its own problems.
Mr. Breeding, who founded the Web site Library Technology Guides, has observed "pockets of resistance" in the library community. Some argue that new search products—sometimes called next-generation catalogs or discovery interfaces—amount to a dumbing-down of catalogs.
By contrast, traditional search tools reinforce the idea that library users need a clear understanding of the different materials involved in research, Mr. Breeding said, such as the difference between articles and monographs. New interfaces that mix many different information sources blur all that, he said.
And then there is the slew of devil-in-the-details questions that arise from the content convergence.
Will users understand it? Will they find what they want? Will books be properly represented among the flood of articles? What about image collections? Could the pile of stuff just get too big?
Libraries' online catalogs are typically one module of an integrated software system that runs library functions like the circulation desk, acquisitions, and cataloging. They are a window into what libraries manage inside their integrated systems, Mr. Breeding said, which tends to be mostly the print collections. But the problem is they lack a good way to include the growing electronic part of the library collection, he said.
What the new interfaces share is the ability to derive material from catalogs and combine it with other data in a modern package.
The commercial market for these interfaces has already produced Encore, from Innovative Interfaces, adopted by at least 44 academic libraries in the United States, according to Mr. Breeding's tally; AquaBrowser, from Medialab Solutions, used by 23 libraries; and Primo, from Ex Libris, adopted by 13 libraries.
How much institutions will have to pay for new commercial systems will vary depending on both what comes with the software and the size and complexity of the library. That could mean a price as low as $10,000 for a small academic library to one in the $100,000 range for a much larger one, Mr. Breeding said.
A 'Shift of Power'
In the open-source world, at least 10 academic libraries have turned to VuFind, which originated at Villanova. Virginia's Blacklight, with Stanford University as a development partner, is in a beta phase. And Rochester's eXtensible Catalog, or XC, backed by $1.2-million from the Andrew W. Mellon Foundation, will be rolled out in the spring.
The shift from commercial products to open-source ones is about more than money, though.
Bess Sadler, chief architect of the online library environment at the University of Virginia, sees the open-source Blacklight project as a "shift of power," as she wrote recently in the journal Library Hi Tech. The idea is that libraries, which know their local needs, should control the technology that patrons use to gain access to their collections. That's a change from the one-size-is-good-enough-for-everybody, commercially managed model that has prevailed in the industry.
The ability to customize is important when it comes to something like a music collection. A librarian might get this question: "I play the guitar. My boyfriend plays the flute. What duets can we play together?" In the past, even though Virginia had cataloged the instruments used in all of its sheet music, a search of that information was impossible because the fields that were indexed were maintained by a vendor, Ms. Sadler said.
"The problem with a vendor solution is that it's hard for vendors to tailor that solution for different collections, for different user populations, for different specializations," she said.
With an open-source system, a library can set its own relevance rankings and adjust them based on what users want. By maintaining the system itself, Virginia is now able to search by musical instrument.
The downside is libraries need someone on staff who can install and maintain the open-source program. So far, vendors aren't supporting products like VuFind the way they support established open-source products like Koha and Evergreen, both integrated library systems, said Mr. Breeding. Vendors will install software like Evergreen, host it on their own servers, and provide a help desk that you can call if something breaks. Not so for the newer software. Another barrier is going to be trusting that an open-source project is sustainable. There is always a concern that there will not be a community of users to keep developing it.
Also, the open-source systems have been slower to fold in article-level data, Mr. Breeding said. Most of that action is on the commercial side.
With Blacklight, you won't be able to get individual journal articles. If you're doing research on cell division, for example, a search will tell you that Virginia subscribes to the journal about cell division, but you'll have to go to a journal database for the article.
"That's going to be true for a very long time," Ms. Sadler said. "For the foreseeable future, you're going to need to go to separate interfaces in order to search licensed content."
But commercial vendors, smelling a new market, are stepping in. Serials Solutions, a subsidiary of ProQuest, released a software product in July called Summon. The company has been negotiating deals with publishers and content providers to create a searchable index of their content. It's like Google, except what Summon provides is an index of the "deep Web" of paid content. So now university libraries that pay for a subscription to Summon can let their users search their licensed content as well as locally owned stuff, together. Summon has 17 customers so far, including Arizona State University and Dartmouth College.
The catch? It can be expensive.
Andrew S. Nagy, senior discovery-services engineer at Serials Solutions, wouldn't say how expensive. But the cost of a subscription can run into the tens of thousands, said one university administrator who was not authorized to discuss price and thus wanted to remain anonymous. Summon also does not have permission to display the full text of articles.
At Virginia, the open-source Blacklight has paid off for Ms. Bauer.
"You know the feeling of when you go into the stacks, and you're usually looking for one book, but then it's almost always the book that's next to it that's the one you really need?" she asked. "It helps replicate a bit of that experience."
And if you search for Thomas Jefferson, it even starts you off with a book about him.
Comments
paievoli - September 28, 2009 at 09:18 am
Need to find a way to self-sustain these costs or they are going to become prohibitive in the future. Self-sustaining models are the future of all business and academia. Read Chris Anderson's "Free". It explains how to deal with this new economic model that will affect us all.
mitt4jp - September 28, 2009 at 03:47 pm
I found this article a little misleading. First of all, a library catalog is structured differently from a search engine. To find items about Thomas Jefferson, the correct way is to use "Thomas Jefferson" as a subject, not as a keywords-anywhere search.
Unfortunately, instead of teaching students how to conduct a precise search with few relevant results, faculty and librarians have found an easy way out -- googlize everything.
uvalibmobile - September 28, 2009 at 06:26 pm
The University of Virginia is using both Blacklight and Summon in its new mobile site (lib.virginia.edu/mobile). We created a web service called "Blacksummon" which merges results from the two indices and allows faceted browsing. A third API from Ebsco allows direct downloads of some PDFs.
bsparris - September 29, 2009 at 09:07 am
The problem is people are trying to use the catalog the wrong way. Instead of a keyword search like on the internet and online databases, the catalog offers something unique -- direct access to exactly what you want through a browse or exact search using subject headings, authors, titles. An old idea but it still works -- give it a try!
pucciot - September 29, 2009 at 09:58 am
The Library was once considered to be the center of the University. It is now treated the same as the food court in the student center. It seems that the University Libraries (and Librarians) are not being rightly considered as an important part of the educational process. Teaching students what to search, how to search, and how to choose good resources is an important part of the University education. Today it seems that just because our students come in knowing how to perform a Google search that that is all they need. Library databases are "tools". Knowing how to use a tool properly must be taught. To apply a simple metaphor would be to think that just because a student took _Shop_ in High School that they should be able to be brought into a factory to build a car.
The University Library and the use of its resources should be considered part of the University Education. Web level discovery layers are new useful tools - but they do nothing to educate a student to be more information literate.
ladykaty - September 29, 2009 at 10:50 am
If the graduate students don't know the difference between a keyword and a subject search, I think, perhaps, that the university would do better to invest in a comprehensive information literacy instruction program rather than expensive "improvements" to the catalog.
commentarius - September 29, 2009 at 03:50 pm
Much as I am also irritated by users who don't know a keyword from a hole in the ground, the tendency to blame the user for not knowing how to use a catalog is exactly the kind of thinking that got us into this mess to start with. Yes, users are idiots. But good systems are designed for idiots and help idiots be successful despite their idiocy. That's why Google is so popular, and why catalogs are not. Any tool that requires "instruction" to use is doomed.
11134078 - September 29, 2009 at 04:22 pm
There is a serious difficulty in all this. Faceted cataloging is inadequate. We have to start from this realization. Good old LC subject headings are still (SHOULD still be) the way to go. Learning to use them takes a few hours, but it is really not a big deal. (I taught this stuff until just a few years ago.) Once the concepts of the free-floating headings and the authority files are understood and there is also a basic knowledge of the material that used to be in the introductory section of the "big red books" and now should pop up online when needed, the system is at its base quite simple (despite its occasional bouts of illogic) and very effective. By the way, the current OCLC search engine is an unusable abomination.
rattebur - September 29, 2009 at 05:01 pm
Commenters who claim that students need to be taught the correct way to use existing catalogs need to come up with a comprehensive way to teach every student at a university this information. Librarians don't often have access to a wide swath of students for instructional purposes; at many institutions, they are dependent on teaching faculty and instructors to want to integrate library instruction. More user-friendly catalogs seem much more realistic at this point.
11134078 - September 29, 2009 at 05:52 pm
rattebur, my friend, there are lots of things students need to be taught. Many of them are now subjected to freshman seminars, how-to-study sessions, long harangues to the effect that credit card companies really do send bills and really do charge extortionate rates of interest if those bills are not paid promptly. Come on now, how about a session on how to use subject headings? And "user friendly catalogs" are in fact hostile to users who actually know how to use catalogs because they are so damnably primitive and therefore yield so many irrelevant hits (or, alternatively, none at all).
jhough1 - September 30, 2009 at 08:05 am
I teach at Duke and live in Washington, D.C. The LC catalog is wonderful. You can make a mistake in spelling, type in half a name, you name it, and you get something. Duke, I assume, has bought something, and you must have a perfectly spelled name, usually with first name and maybe the middle initial to get a reasonable response even on the author catalog. I just use LC and check the Duke stacks. Unfortunately, older books are off campus. Is it not possible to use LC technology?
erla32 - September 30, 2009 at 08:24 am
Duke uses an open-source solution developed by the NC State libraries and used to search all of the Triangle Research Network institutions (Duke, NCSU, UNC-CH, NCCU). Library of Congress has a purchased system -- Ex Libris.
zizzer - September 30, 2009 at 09:37 am
I guess I have finally reached the tipping point of the generational divide, maybe it's just my learning style, but I don't like getting a muddle of everything and the kitchen sink from search tools. I like knowing what media the tool I am searching indexes and where it will ultimately lead me.
Short of that I would want clear delineations in any results, and I see that frequently from students who didn't grow up digital. They don't want an eBook, they want a "real" book they can check out and take home. (We serve a rural area with spotty Internet access.) They don't want a citation, they want full text - right NOW - that they can print or save to a flash drive for later. We have a federated search tool to a set of consortium resources and many find it very confusing and it often yields inferior results because the searches have to be dumbed down to adapt to each individual database. In short, it stinks, and users often don't understand the results and miss great information. The smart ones ask for help, which gives me concern about the rest.
I would that we had more time to teach Information Literacy. When I was in elementary school our library visits had three components: Time that we learned about the library, story time, and time to find books to check out. In my freshman year of college I had to take a half-semester course called Bibliography where we learned to use the library and its resources. As it is now, we are lucky to get 50 minutes with the students who take Study Skills, but not all students are required to take it, and many consider the library day a day to blow off.
blackbart - September 30, 2009 at 09:38 am
I _think_ the issue that this article is trying to probe is the dichotomy between binary searching and search engines. Most well-established library catalogs use binary searching -- you type in a term, and the catalog returns only those records that contain the term you typed (in whatever fields you did or didn't specify, depending on the search and the catalog interface). The results are binary: either the record matches the search string and is retrieved, or it doesn't and isn't. Search engines like Google, by contrast, use complex algorithms to interpret the search string in an effort to show you what the software "thinks" you wanted based on that search string.
It takes all of five minutes to explain that difference to students. It might take as long as an hour to drill the difference into them by demonstrating identical searches on binary and search-engine interfaces. Each has tremendous strengths; each has weaknesses relative to the other model. But do we really need to spend a gajillion dollars in software development and retrain the entire university community just because students were using Google before they got to campus?
greebie - September 30, 2009 at 09:44 am
Library instruction is limited. To remember what special ritual dance you need to do in your specific discipline, you need to actually practice it. That means dancing with each and every student for quite a long time. Personally, I'd rather put the teaching resources into critical thinking skills, source evaluation, finding learning networks (the best way to get the 'classic' tomes of a field is still knowing a prof and then tracing the scholarly pedigree via the bibliography).
Open source models look promising and hold the best option for sustainability over time. These products are very expensive for what they do - they shouldn't have to be.