Wednesday, November 01, 2006

Recommendations refreshed, improved

I just completed a rather extensive regeneration of the book recommendations, both the "people who own X also own Y" recommendations and the "similarly tagged" ones. (The "Special Sauce" is next.) Recommendations, in turn, affect the "Pssst!" system (recommendations based on your whole library), and the work involved recreating all the work-to-tag clouds. Quality has, I think, risen significantly. We've improved the algorithms, and the non-stop growth of the LibraryThing data set, now at 6.8 million books, continually improves the results anyway.

Scope has also improved. The system only makes recommendations for works with more than ten copies. That comes to just 93,000 out of 1.3 million works. But these works account for over 60% of the books in LibraryThing. Before the current re-do, only 50,000 works were covered.
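To make the "people who own X also own Y" idea and the ten-copy cutoff concrete, here is a minimal sketch. It is not LibraryThing's actual code: the function name, the `libraries` input shape (user mapped to a set of works) and the ranking by raw co-ownership counts are all assumptions made for illustration. Raw counts also favor popular books, so a real system would need some further weighting.

```python
from collections import Counter, defaultdict
from itertools import combinations

MIN_COPIES = 10  # the post's rule: only works with more than ten copies


def co_ownership_recs(libraries, min_copies=MIN_COPIES, top_n=5):
    """Hypothetical helper: libraries maps user -> set of works they own."""
    # Count how many libraries hold each work.
    copies = Counter(work for works in libraries.values() for work in works)
    eligible = {work for work, n in copies.items() if n > min_copies}

    # Count how often each pair of eligible works shares a library.
    together = defaultdict(Counter)
    for works in libraries.values():
        for a, b in combinations(sorted(works & eligible), 2):
            together[a][b] += 1
            together[b][a] += 1

    # Rank by co-ownership count (an assumed, simplistic score),
    # breaking ties alphabetically so the output is deterministic.
    recs = {}
    for work, counts in together.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[work] = [other for other, _ in ranked[:top_n]]
    return recs


# Toy data, invented for the example; the cutoff is lowered so it fits.
toy = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"D"},
}
toy_recs = co_ownership_recs(toy, min_copies=1)
```

In the toy data, "D" has a single copy, so it falls below the cutoff and gets no recommendations at all, which is the same effect that limits coverage to 93,000 works.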

The recommendations are better, but hardly perfect! We've made progress toward better algorithms, and have big plans for future improvements. On the UI side, we're going to introduce user feedback on recommendations, including both thumbs-up and thumbs-down buttons, and an "obscurity knob" for Pssst! (People love that on Last.fm.)

The system works best in the "middle," on books with 25-500 copies, non-fiction, genre fiction, books with a well-defined readership, and books that are "about" something: books like Touching the Void, Prozac Nation, The Historical Figure of Jesus and An encyclopedia of fairies. It has the most trouble with bestsellers, "obligatory books" (think high-school classics), and literary fiction, books like the Da Vinci Code and Great Expectations. To some extent, the problem is almost philosophical. What would be good recommendations for the Da Vinci Code? Statistically speaking, not much stands out. At the high-obscurity end, there are of course books on the book and its themes (Cracking Da Vinci's code or Baigent's Holy blood, holy grail), but most Da Vinci Code readers aren't interested in that stuff. Ideally, we'd like a better mix of suggestions: some obscure stuff, and some of the quick, high-popularity reads it correlates with.

Burying the lead? We figure we're a month or so away from getting the algorithms where we want them. Once they are, we're going to start making them available to libraries, to spice up their online catalogs with top-notch readers' advisory, and for much less than they're currently paying for inferior services. We think it will make quite a splash!

NEWS: Tag-combining is back!

16 Comments:

Blogger Lithgow Public Library said...

Looking forward to the service. It will be interesting to see how it compares to our current rec system (syndetics).

11/02/2006 11:38 AM  
Blogger gabriel said...

Given that Da Vinci Code readers need something both similar and corrective, I'd say Umberto Eco's Foucault's Pendulum is probably the best choice.

11/02/2006 11:53 AM  
Blogger Weldon said...

I suppose it would be nice if you could identify tags like NewYorkTimesBestSeller or genre (mystery, etc.) and then recommend books that have similar tags. I suspect a lot of Da Vinci Code readers would be more interested in other bestsellers and other mystery books.

11/02/2006 12:23 PM  
Blogger Dystopos said...

Or, in the middle-ground between Eco and Brown, but with a similar tack, there's Wilton Barnhardt's "Gospel"

http://www.librarything.com/work/141062

11/02/2006 12:41 PM  
Anonymous Anonymous said...

Regarding Weldon's suggestion, I think it would be a good idea to allow users to indicate whether a tag is "personal" (in which case it is part of the tag cloud, but can be hidden via preferences, and is ignored for stuff like recommendations) or "general" (and thus applies outside one's catalog).

11/02/2006 2:07 PM  
Blogger Tim said...

Lithgow: Shoot me an email. If you're interested, you can be one of the roll-outs (i.e., free and publicized).

The similar-tags system does something like you imagine, comparing the tag statistics between works. Everything is done by "salience," a calculation that weighs the frequency of a tag on a work against its overall use. So, for example, "fiction" is commonly applied to the Da Vinci Code, but not more so than to many other works. By contrast "grail" is used less, but comes out to be much more salient. At the bottom end, tags used by only a few people drop out. This removes "personal" tags from the equation. Even if I didn't remove them, they wouldn't matter much. If someone tags the Da Vinci Code and 50 other books as "at the beach house," giving one "point" to each of those books would never be enough to push it into the top 10 or 20. There would have to be a *pattern* involved for it to matter, and if so, there's probably something "to" the tag, if you see what I mean.
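For anyone who wants to see the shape of a salience calculation, here is one way it might be sketched. The exact formula isn't given above, so this is an assumption: it divides a tag's share of the work's tag uses by its share of all tag uses, and drops tags applied fewer than `min_uses` times. The function name, the `min_uses` cutoff and all the counts are invented for illustration.

```python
from collections import Counter


def salience_order(work_tags, global_tags, min_uses=3):
    """Rank a work's tags by an assumed salience ratio.

    work_tags:   Counter of tag -> times applied to this one work
    global_tags: Counter of tag -> times applied across all works
    """
    work_total = sum(work_tags.values())
    global_total = sum(global_tags.values())
    scores = {}
    for tag, n in work_tags.items():
        # Tags used by only a few people drop out (cutoff is hypothetical).
        if n < min_uses or global_tags[tag] == 0:
            continue
        work_share = n / work_total
        global_share = global_tags[tag] / global_total
        # Above 1.0: the tag is more typical of this work than of works overall.
        scores[tag] = work_share / global_share
    return sorted(scores, key=scores.get, reverse=True)


# Invented counts, chosen only to mirror the "fiction" vs. "grail" example.
dvc_tags = Counter({"fiction": 400, "grail": 100, "at the beach house": 1})
all_tags = Counter({"fiction": 400_000, "grail": 500,
                    "at the beach house": 50, "history": 99_450})
dvc_order = salience_order(dvc_tags, all_tags)
```

With these made-up numbers "fiction" is applied four times as often as "grail" but still ranks below it, because "fiction" is about as common on this work as it is everywhere else, and the one-off "at the beach house" never enters the ranking.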

So, for example, for the Da Vinci Code salience order starts with: conspiracy, thriller, grail, suspense, catholic church, mystery, religion and on...

You can see, therefore, how Baigent's _Holy blood, Holy Grail_ rises to the top, and Eco's _Foucault's Pendulum_ rises to the top (but not so much _The Name of the Rose_).
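How two works' tag statistics get compared isn't spelled out here, so the following is only a guess at the general shape: treat each work as a set of salience-weighted tags and rank other works by cosine similarity. The function names, the choice of cosine similarity and the weights are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two tag -> weight dicts."""
    dot = sum(a[tag] * b[tag] for tag in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarly_tagged(target, profiles, top_n=10):
    """Rank every other work by how closely its tag profile matches target's."""
    scores = {work: cosine(profiles[target], profile)
              for work, profile in profiles.items() if work != target}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented salience weights, not real LibraryThing data.
profiles = {
    "The Da Vinci Code": {"conspiracy": 5, "grail": 4, "thriller": 3},
    "Holy Blood, Holy Grail": {"grail": 5, "conspiracy": 4, "history": 2},
    "Life of Pi": {"fiction": 5, "india": 3},
}
ranked = similarly_tagged("The Da Vinci Code", profiles)
```

In this toy set, the book that shares the salient "grail" and "conspiracy" tags comes first, and the one with no salient tags in common scores zero.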

I think the tag-based recs work okay for the Da Vinci Code, actually. It's "about" something. The holdings-based ones... not so much. _The Rule of Four_ is good. But _Life of Pi_ and _Harry Potter_. Blech.

11/02/2006 2:45 PM  
Blogger Tim said...

As with all LibraryThing data, the recommendations "keep going" way past where you can see them. (We're sitting on more data than Fort Knox. Wait, that makes no sense.)

Anyway, down at 20 on the similarly-tagged is the gem

"Pirates and the Lost Templar Fleet: The Secret Naval War Between the Knights Templar and the Vatican by David Hatcher Childress"

http://www.librarything.com/work/283033

As a Catholic with a pigmentation problem, I vigorously deny this rumor!

11/02/2006 2:49 PM  
Anonymous Anonymous said...

Yay, tag combining! thank you! :D

11/02/2006 2:56 PM  
Anonymous Anonymous said...

Not sure exactly what you changed, but my "Pssst" recommendations suddenly have gotten much more interesting.

11/02/2006 10:29 PM  
Blogger Tim said...

Kencf: Yes, I should look into that. Although probably if I do I trigger some license and they own our intellectual property.

L.R. Both are good points. I'll look into them.

dchaikin: GOOD! Thanks.

11/02/2006 10:54 PM  
Blogger Passionate Dilettante said...

I'm only beginning to explore the pleasures of LibraryThing. All I need is time. Love it! Thank you.

11/03/2006 3:16 AM  
Anonymous Anonymous said...

Small suggestion: when I change from one of the Pssst! lists to "omit authors already in your catalog", would it be possible to mark the titles that are different from the first list?

Thanks for bringing "Why?" back :-)

11/03/2006 6:32 AM  
Anonymous Anonymous said...

I must have wax in my eyeballs. I just thought you wrote that "The Da Vinci Code" is literary fiction.

11/03/2006 8:15 AM  
Blogger Tim said...

Sunny: Yes. Are you finding different? Incidentally, it would take some work. I'd LIKE to track suggestions over time—"10 new suggestions today!" but it doesn't remember them on any level yet.

Karen: I did not! Further, out of the 1940 books tagged "literary fiction," not one is the Da Vinci Code, so it's not even a little true ;)

11/03/2006 6:09 PM  
Anonymous Anonymous said...

okay, this just proves why librarything or amazon or any algorithmic program will NEVER be able to do as good a job as a well-trained reader's advisory librarian. or a knowledgeable bookseller. maybe if you had thingamabrarians who tagged by appeals, you would get some decent results for the da vinci code. but tags rarely reveal why a person read or liked a book, so why should they be able to tell us what you should read next?

i just looked at the special sauce recommendations librarything had for me on the psst! page and out of 1 hundred books, there were perhaps a dozen that i had never heard of before. maybe 2 that i would actually check out in the future. i know i have an unfair advantage as a bookseller and librarian -- but come on! one of the top five is a book that i already own! (32 stories by Adrian Tomine, which for some reason has two separate work entries -- one with an author and one without). telling me the reason why you're recommending "clumsy" by jeremy brown because i own "black hole" by charles burns is totally ridiculous -- clumsy is an angst-ridden auto-bio boy comic that's drawn in a naive style and is all about relationship woes, while black hole is a dark, horror comic about teens infected with a mysterious STD that creates horrible mutations that's drawn in a meticulous, traditional style -- there's very little in common between the two besides the fact that they're graphic novels. this is not a useful model for recommending books. just admit it, let it go, and focus on the other stuff you do so well.

i'm glad to see tag-combining is back, however.

11/04/2006 3:02 AM  
Anonymous Anonymous said...

> no other books should pop up between ones that were on your first list.

Some day I'll learn to look before I post.. Thank you :-)

11/04/2006 2:00 PM  
