Sunday, November 12, 2006

BookSuggester and UnSuggester

People do not generally like BOTH Shopaholic and Critique of Pure Reason.
The "real" news today is the debut of BookSuggester, a new feature designed to expose LibraryThing's excellent and varied recommendations to members and non-members alike. We put them alongside Amazon's, which are also quite good. We are proud of our recommendations, but we haven't found the perfect algorithm yet. When we've made things as good as we can, we're going to start offering recommended book data to libraries.

But to heck with that! Let's talk about bad recommendations. Today we introduce UnSuggester, "the worst recommendation system ever devised™."**

UnSuggester is a brand new idea in recommender technology. Recommender systems usually work by similarities. Amazon's "Customers who bought this item also bought" and LibraryThing's "People with this book also have" are typical of the type—What books do people buy together? What books occur often in the same member libraries?

UnSuggester flips this logic: What books DON'T occur in the same libraries? We took our "similars" algorithm and changed "sort ascending" to "sort descending" and—hey presto!—instead of similar books, we get opposite ones. You bet we're going to patent it!
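In code, the joke is nearly literal. Here is a minimal sketch, assuming every candidate book already has a precomputed "shelved together" score. The function name `top_k`, the book IDs and the scores are all hypothetical and are not LibraryThing's code.

```python
def top_k(scores: dict, k: int, opposite: bool = False) -> list:
    """Rank candidate books by a precomputed togetherness score.

    opposite=False: highest scores first (similars).
    opposite=True:  lowest scores first (unsuggestions).
    """
    # The only difference between the two lists is the sort direction.
    return sorted(scores, key=scores.get, reverse=not opposite)[:k]


# Invented scores for illustration only; higher = more often shelved together.
scores = {"A": 5.4, "B": 4.9, "C": 1.0, "D": 0.05, "E": 0.03}
print(top_k(scores, 2))                 # ['A', 'B']
print(top_k(scores, 2, opposite=True))  # ['E', 'D']
```

All of the real work lives in how the score is computed. Flipping the sort just reads the same ranking from the other end.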

How does it work?

UnSuggester starts by finding every copy of the book in question and all of its owners. So, taking Thucydides' History of the Peloponnesian War as an example, LibraryThing finds the 600-odd people who have entered this ancient classic in their accounts. Then it makes a big pile of all their other books, a pile of some 623,000 books in all. Then it does a little math. If LibraryThing has seven million books, then a pool of 623,000 books is about 9% of the total. If this pool were average, it would also contain about 9% of the Harry Potters, 9% of the Derridas and 9% of the Danielle Steels. But this isn't so. People who own Thucydides aren't a random cross-section of the book-loving public. For example, that 9% slice also contains almost half the Caesar and Plutarch in LibraryThing. At the other end of the scale, Thucydides-fanciers are particularly immune to the novels of Marian Keyes and Dean Koontz. The greatest disconnect occurs with Louise Rennison's popular, teeny-bopper chick-lit novel Angus, Thongs and Full-Frontal Snogging: Confessions of Georgia Nicolson, the top UnSuggestion.
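The arithmetic above can be sketched as an observed-over-expected ratio. This is a minimal illustration. It assumes the score is simply the number of copies found in the pool divided by the number an average pool would hold. The post doesn't spell out the exact formula, and the per-book counts below are invented.

```python
TOTAL_BOOKS = 7_000_000  # catalog size assumed in the example above
POOL_SIZE = 623_000      # all other books owned by the Thucydides owners


def overlap_ratio(copies_in_pool: int, copies_total: int,
                  pool_size: int = POOL_SIZE,
                  total_books: int = TOTAL_BOOKS) -> float:
    """Observed copies in the pool divided by the copies an average pool would hold.

    Above 1: over-represented (a suggestion).
    Below 1: under-represented (an unsuggestion).
    """
    pool_share = pool_size / total_books  # fraction of the catalog in the pool
    expected = copies_total * pool_share  # copies a random pool of this size would hold
    return copies_in_pool / expected


print(round(POOL_SIZE / TOTAL_BOOKS, 3))    # 0.089
# Made-up counts for two hypothetical books with 1,000 copies each:
print(round(overlap_ratio(480, 1_000), 2))  # 5.39
print(round(overlap_ratio(2, 1_000), 3))    # 0.022
```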

What patterns emerge?


The Mists of Avalon and Desiring God are very uncommon shelf-mates.
Play with it a few minutes, and patterns emerge. Philosophy and postmodern literary criticism oppose chick lit, popular thrillers and the young adult section. Programming does not truck with classic literature. Memoirs of depression, like Prozac Nation, meet their match in the cheery The Night Before Christmas. Ann Coulter and David Sedaris do not see eye-to-eye. There is a strong disconnect between readers of much recent Protestant, mostly evangelical, non-fiction, and large swaths of contemporary literary fiction. For example, LibraryThing includes 2,300 readers who've logged Jeffrey Eugenides' epic gender-bender novel Middlesex, and 222 readers of John Piper's The Passion of the Christ: 50 Reasons He Came to Die. But the groups don't overlap. No reader has both. Similar instances occur again and again.

These disconnects sadden me. Of course readers have tastes, and nearly everyone has books they'd never read. But for serious readers, books make our world. A shared book is a sort of shared space between two people. As far as I'm concerned, the more of these the better.

So, in the spirit of unity and understanding, why not enter your favorite book, then read its opposite?

By the way, how about putting this up on Del.icio.us or Digg? Wait, we have a Digg now.*

*LibraryThing has never been Dugg. But recently a pale imitation of LibraryThing was lofted to the heavens as the first social network for books. For one day Digg gave them twice our traffic. Fortunately, they fell like a stone after that and, this morning, Alexa has them at "too low to measure."
**My friend Ben correctly points out to me that he suggested "find your book nemesis" almost a year ago.

45 Comments:

Anonymous said...

Brilliant!!!!
I was looking forward to getting suggestions this way!!

Thanks for opening our minds and eyes!!!

Zuzka from Paris

11/13/2006 3:05 AM  
Anonymous said...

In playing a little, it appears I own a couple of my top unsuggestions, uncatalogued. Maybe I'll buy one or two, as well, and up that unlikely overlap just a little more. Thanks for enriching the suggestions, displaying them in a coherent way, and again offering us the unexpected. It's terrific.

11/13/2006 3:29 AM  
Anonymous said...

That's quite a gauntlet you have thrown down there, Tim.

11/13/2006 4:18 AM  
Anonymous said...

I 'unsuggested' one of my favorite books of the last 50 years, Hubert Selby Jr.'s Last Exit to Brooklyn, and the list generated was loaded with Terry Pratchett and Laurell K. Hamilton. I neither owned nor had read a single one of the 75 titles that were 'unsuggested'. With all due respect for the process and the effort involved, so what?

Then I unsuggested one of my favorite books of the last 2 years, Luis Alberto Urrea's The Hummingbird's Daughter, and got a list dominated (again) by Terry Pratchett and S. King, and I owned not a single title in this list either. Of course there are huge disconnects among readers, always have been, ever will be, and I'll gladly buy Terry Pratchett's and Laurell K. Hamilton's books on that day when the skies of Tennessee are filled with swine, all spreading their glorious wings.

11/13/2006 5:11 AM  
Anonymous said...

I'd say this calls for a new game:
Stump the Unsuggester

Can you get it to unsuggest, for a book you own, a book that you also own (but haven't catalogued) or want to own?

Will go start a topic in the Challenges Group.

Done:
http://www.librarything.com/talktopic.php?newpost=1&topic=3890#lastmsg
Come and play!

And I tried it on my favorite book, Left Hand of Darkness, and got 74 books about Christianity. The only one I might have been interested in reading was one on Koine grammar.

Aquila

11/13/2006 5:42 AM  
Anonymous said...

Bwahaha!

One of the unsuggestions for The Hobbit by J.R.R. Tolkien is The Hobbit by ________

I guess people really aren't likely to own both alright, though apparently 12 people do. Looking closer two of the copies are 'The Hobbit, (Animated', so I guess movie copies are being thrown in there if people put nothing in the author slot, and they would be owned by people who own the book.

11/13/2006 6:16 AM  
Blogger Tim said...

Aquila: Yeah, I think that might be screwy programming. The results are pre-generated, and to do it quickly, I essentially froze the works combination while I ran the script. This opened the door to problems if a book was combined or uncombined during that period. Maybe that's what happened here...

11/13/2006 6:48 AM  
Anonymous said...

This is a lot of fun... and it's leading me to find gaps in my catalogue, as I think 'ooh, I'm sure I have a copy of that...'

11/13/2006 7:21 AM  
Blogger V Smoothe said...

Funny! I actually own both Shopaholic and Critique of Pure Reason, though I believe I've cataloged neither.

So I took the unsuggester and put in my favorite book, Le Petit Prince. I am now going to read "Becoming conversant with the emerging church : understanding a movement and its implications," just because. I think this is a really neat idea.

11/13/2006 11:07 AM  
Blogger Tim said...

Thinking about the Hobbit thing, I'm sure that was the problem. The work was combined while it was running, so it expected a lot of copies of the Hobbit and there were literally zero under the work code it was expecting. It's fortunate Unsuggester isn't self-aware, or it might have had a nervous breakdown.

11/13/2006 11:13 AM  
Anonymous said...

What a great surprise! I just discovered the UnSuggester after purchasing Perez-Reverte's The Flanders Panel. I checked the unsuggestions, and realized that I have 7 of the top 10 unsuggestions.

As probably one of the few members with a fair amount of litfic and conservative/evangelical Christian theology, the UnSuggester is actually giving me some great suggestions. I've been meaning to add some Gaiman for a while. What a great reminder. Thanks!

11/13/2006 1:00 PM  
Blogger Tim said...

BookPress: well, try the straightforward BookSuggester too (http://www.librarything.com/suggester). Unless you're your own opposite, UnSuggester can't possibly be better! :)

11/13/2006 1:08 PM  
Blogger Michael said...

Of course, now that people are reading their unsuggestions, this changes what the next people see as their unsuggestions. An interesting evolution of the process. It would be interesting to cache the unsuggestions for a certain set of books and see how it changes over time, due in large part, probably, to the unsuggester itself.

11/13/2006 1:09 PM  
Blogger Tim said...

An interesting theory, but I don't think there'll be a lot of UnSuggester reading. If Match.com had a "disaster date" button, would their system lose its value as the wrong sort of people started dating?

11/13/2006 1:27 PM  
Anonymous said...

Tim - digg is horrid. It shows the quality of LibraryThing that it has not been d(r)ugged (opiate of the massed).

11/13/2006 1:45 PM  
Anonymous said...

An interesting theory, but I don't think there'll be a lot of UnSuggester reading.

You underestimate Thingamabrarians. Some of us are just looking for new ways to branch out from our old ruts... and something flagged as 'opposite' to our normal reading might just be what we're all looking for. (Besides, a lot of the 'niche' books are throwing up classics in the unowned lists, and many people like to improve their lit-cred.)

11/13/2006 1:58 PM  
Blogger JLH said...

Wow, there's a lot here to play with! But for now I just want to add a comment about Amazon's suggestions. It got a bit inane when they told me last year that "Users who have enjoyed {Title xyz} also purchased {Brand W} socks." Huh??? Socks??! What does that have to do with the price of eggs?

11/13/2006 3:02 PM  
Blogger Tim said...

Well, what if the book were "Scotland by Foot in the Spring" and the socks were really thick?

I've never seen it go that far. DVDs, I've seen. Socks? Crazy.

Fortunately, we screen out non-ISBN stuff. This is partly because we don't want socks and partly because we don't want non-ISBN identifiers polluting our database; once a company's proprietary ids get in, you're stuck with them forever...

Did you know you can buy bananas on Amazon? According to the page, people who buy bananas also buy tomatoes, grapes and lettuce. What, no Corn Flakes?

http://www.amazon.com/Chiquita-Bananas-4-5-lbs/dp/B000328OH6/sr=8-2/qid=1163448041/ref=pd_bbs_2/002-6429519-7855230?ie=UTF8&s=gourmet-food

11/13/2006 3:09 PM  
Anonymous said...

I would be interested to see the result of unsuggestions restricted to works that share at least one tag with the searched for book. Of course, you would have to exclude overly generic tags like fiction, non-fiction etc.

Currently Hayek's The Road to Serfdom and Marx's Capital both result in a lot of light fiction as unsuggestions. This isn't really too interesting. It would be more interesting to see what kind of political/economic/philosophy books people with a particular view are not reading.
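The tag restriction suggested above can be sketched as a filter that runs before the usual ascending sort. This is a rough sketch with several assumptions: tags are available as sets, and a hand-made stoplist (the hypothetical `GENERIC_TAGS`) is enough to drop overly generic tags like fiction and non-fiction. All titles, scores and tags are invented.

```python
# Hypothetical stoplist of tags too broad to say anything about subject.
GENERIC_TAGS = {"fiction", "non-fiction", "nonfiction", "to read", "read"}


def shares_specific_tag(tags_a: set, tags_b: set) -> bool:
    """True if the two books share at least one non-generic tag."""
    return bool((tags_a & tags_b) - GENERIC_TAGS)


def tag_filtered_unsuggestions(query_tags: set, candidates: list, k: int) -> list:
    """candidates: (title, score, tags) tuples; the lowest score is the strongest unsuggestion."""
    kept = [c for c in candidates if shares_specific_tag(query_tags, c[2])]
    kept.sort(key=lambda c: c[1])  # ascending: most "opposite" first
    return [title for title, _score, _tags in kept[:k]]


# Invented books, scores, and tags for illustration.
query = {"economics", "politics", "non-fiction"}
candidates = [
    ("Light Novel X", 0.01, {"fiction", "romance"}),
    ("Economics Text Y", 0.20, {"economics", "non-fiction"}),
    ("Political Tract Z", 0.10, {"politics"}),
]
print(tag_filtered_unsuggestions(query, candidates, 2))
# ['Political Tract Z', 'Economics Text Y']
```

The light novel has the lowest score, but it is filtered out because it shares no specific tag with the query. Only same-subject books remain to be ranked as opposites.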

11/13/2006 3:09 PM  
Blogger Tim said...

Eric: I thought about something like that. But that's a nice way to think about it. I'll give it some thought.

11/13/2006 3:10 PM  
Blogger Tim said...

At the very least, it suggests that the similarly-tagged list should be sorted by similarly-owned, right?

11/13/2006 3:11 PM  
Blogger Linda said...

I applaud your efforts to encourage readers to expand their genres. Well done!

11/13/2006 5:45 PM  
Anonymous said...

In response to Tim's last comment,

Blending shared tags and shared works might be interesting, but I've been on the whole impressed by the relevance of the similarly tagged recommendations. It's also interesting how often the similarly owned books are narrowly refined by subject/genre without any assistance from tags.

My point was that in looking at the unsuggested results the big divide seems to be between heavy and light reading. This is not surprising, and so it's also not very interesting. You might get far better results with a little bit of sorting by tags.

11/13/2006 8:35 PM  
Anonymous said...

In response to Eric and Tim: one (easy?) step would be to have the possibility to split the unsuggestions into fiction / non-fiction, wouldn't it?

11/14/2006 4:20 AM  
Blogger JLH said...

Isn't it funny how perverse we Thingamabrarians are, that we have so much fun with the Unsuggester! The Suggester is, of course, good too, but we like to play. Otherwise, why would we be cataloging our own books? Oops -- I'm at work and should be working on my school catalogue.... Anyway, the reason for this message is to say that a Powell's Books bookmark has a picture of Salman Rushdie with a pertinent quote: "If I like 'The Simpsons' and I like 'The Iliad,' why shouldn't we talk about them in the same sentence?" -- from the Powells.com interview with S.R. Obviously, he'd be all over the Unsuggester were he on LT!

11/14/2006 8:59 AM  
Blogger mkhall said...

I'm afraid I don't have much to add, other than to point out that "The Miscegenated Library" would be a great title for a blog. It also gave me a laugh.

11/14/2006 10:10 AM  
Blogger Tim said...

Particularly when both fall under the tag "Homer"!

11/14/2006 10:50 AM  
Anonymous said...

How about some group-based unsuggestions? There would have to be limits on the minimum size of the group and/or its collected library.

What don't Librarians read? etc.

11/14/2006 1:49 PM  
Anonymous said...

The fact that people don't own books that aren't to their personal taste doesn't mean they haven't read them! Most people have a limited budget to spend on books. They buy only those they like, or expect to like, well enough to reread. They usually read many other books they get from the library.

Also, if they do happen to buy books they find they don't especially care for, they often give them away or resell them -- for lack of shelf space if for no other reason, or simply to share what they no longer need (e.g. via BookCrossing).

So the fact that people don't OWN books covering a wide variety of tastes really isn't anything to be concerned about. To judge awareness of opposing views it would be necessary to look at reading habits, not just ownership.

11/14/2006 7:56 PM  
Anonymous said...

This comment has been removed by a blog administrator.

11/14/2006 9:56 PM  
Blogger Tim said...

Sylvia: No, I understand. First, LibraryThing is not just for things you own. Many people use it for anything they read—owned, borrowed, lost, etc.

That said, there's also no question that LibraryThing does not represent all booklovers and everything they've read. Indeed, I'm sure the statistics underplay things. People not only don't own all the books they've read; many also enter just their favorite books, not necessarily all their books.

And there are a million reasons the data might fall short. I'm quite sure *some* of the recommendations are just statistical flukes. When two books have 1,000 owners each, but not one overlapping owner, it usually means something, but it might just be luck.
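Whether zero overlap "might just be luck" can be estimated under a simple null model. Suppose owners were scattered at random across the membership. Then the chance that two books share no owner follows the hypergeometric distribution. This is a sketch under that assumption only. Real readers are not random, which is the whole point. The post gives no total member count, so the counts used below are hypothetical.

```python
def prob_no_overlap(owners_a: int, owners_b: int, members: int) -> float:
    """Chance that two books share no owner if owners were drawn at random.

    Hypergeometric model: owners_b members are picked at random from
    `members`, and we ask how likely it is that none of them owns book A.
    """
    if owners_a + owners_b > members:
        return 0.0  # the two owner groups cannot fit without overlapping
    p = 1.0
    for i in range(owners_b):
        # the i-th owner of B must come from the members who don't own A
        p *= (members - owners_a - i) / (members - i)
    return p


# The member counts below are hypothetical; the post doesn't give one.
for members in (100_000, 1_000_000):
    expected_overlap = 1_000 * 1_000 / members
    print(members, expected_overlap, prob_no_overlap(1_000, 1_000, members))
```

With 100,000 members, two 1,000-owner books would be expected to share about ten owners, and zero overlap has well under a 0.01% chance of happening by luck. With a million members, the expected overlap drops to about one, and zero overlap becomes roughly a one-in-three event. Whether zero overlap means something therefore depends on how large each book's audience is relative to the whole membership.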

But does it matter that people give away books they didn't like, or catalog their favorites? Does it change the data? It seems to me that you can still understand reading habits and preferences from the data. The raw number of people who've listed X might change, but the "order" of the listings would not.

There's no question the method contains "noise," but to invalidate it there needs to be strong potential for systemic, content-changing bias. I'm not sure I see that.

11/15/2006 12:26 AM  
Anonymous said...

Tim,

What is the difference between v.1 and v.2 of "people with this book also have"?

11/15/2006 9:24 AM  
Blogger Tim said...

Basically v2 has the "obscurity knob" turned up. It cares more about the ratio of have/expected than the absolute number of have/expected. v1 is also massaged a bit to dampen high-popularity, low-specificity books (e.g., things you read in high school, like The Crucible).
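One way to picture the "obscurity knob" is a single blend parameter between an absolute surplus and a relative one. This is a hypothetical interpretation, not LibraryThing's code. The `score` function, the `knob` exponent and the counts are all assumptions, and v1's extra dampening of popular, low-specificity books isn't modeled.

```python
def score(have: float, expected: float, knob: float) -> float:
    """Surplus of actual over expected co-ownership, with an 'obscurity knob'.

    knob=0 -> absolute surplus (have - expected): favors popular books.
    knob=1 -> relative surplus ((have - expected) / expected): favors obscure ones.
    """
    return (have - expected) / expected ** knob


def rank(candidates: dict, knob: float) -> list:
    """Order candidate titles from strongest to weakest recommendation."""
    return sorted(candidates, key=lambda t: score(*candidates[t], knob), reverse=True)


# Invented (have, expected) counts for two hypothetical candidates.
candidates = {"popular": (300, 150), "obscure": (12, 2)}
print(rank(candidates, knob=0))  # ['popular', 'obscure']
print(rank(candidates, knob=1))  # ['obscure', 'popular']
```

Turning the knob up lets a small but highly concentrated overlap outrank a large, only moderately surprising one.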

11/15/2006 2:43 PM  
Anonymous said...

I love UnSuggester, but I have one question... why does it list 75 books, while the good suggester lists only 15? I'd really like to have a longer list of recommended books, especially since I can't currently exclude the author from recommendations.

11/15/2006 3:40 PM  
Blogger Philomath said...

This is a wonderful feature on a great site. I spent quite a while playing with it and it generates some fascinating results. One thing I did notice was that certain works are far less polarizing than others, and that this does not always track with how common they are. Have you considered exposing some stats on how various items compare to one another? I would be interested to know which works are most separated from others and which are least so. Maybe you could add a new zeitgeist listing for the items which, across the entire corpus of the site, have the biggest difference between anticipated overlap and actual overlap.

- aka terryzman

11/15/2006 7:53 PM  
Blogger Tim said...

_zoe_: Well, I wish I had a better vocabulary for, and a deeper understanding of, statistics--LibraryThingers who do are invited to write in--but it goes something like this. For simplicity's sake, imagine LibraryThing's data came only in the smallest pieces--each user had two books. (That's probably closer to the data Amazon has, at least on the order level, but they have HUGE amounts of it.) If you had libraries of two books each, it wouldn't be too long before some sort of "similars" patterns emerged. People bought Harry Potter 1 and Harry Potter 2, The Purpose Driven Life and The Purpose Driven Church, Dilbert and C++ programming books (trust me on that one). You only need 5-15 similars, and something would pop out of the data a few times. But look at the reverse. It would be FOREVER before you could draw any inferences about books that didn't occur together. There are millions of books. NOT occurring together is the norm, not the exception. It would be like proving a negative.

Fortunately, LibraryThing's data isn't composed of two-book libraries. Some of its users have thousands. (It's a tricky question whether coincidence of books in a large library is less significant than coincidence in a small one; v1 of the Suggester algorithm considers this; v2 does not.) Anyway, I haven't tried to calculate the "potential for flukes" at the lower end, but it increases as the minimum number goes down. After all, if a book has only five copies in the system, it's not going to occur together with a lot of books that are actually quite similar to it--just by chance.

Does this make any sense? More coffee, Tim? Okay.
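The two-book thought experiment above can be put in rough numbers. A library of n books records n(n-1)/2 co-occurring pairs. So the same number of catalog entries yields far more pair observations when libraries are large. The catalog sizes below are hypothetical, chosen only to show the scale.

```python
def pair_observations(total_entries: int, library_size: int) -> int:
    """Co-occurring pairs recorded when `total_entries` catalog entries are
    split into libraries of `library_size` books each."""
    libraries = total_entries // library_size
    pairs_per_library = library_size * (library_size - 1) // 2  # n choose 2
    return libraries * pairs_per_library


def possible_pairs(num_books: int) -> int:
    """Every distinct pair of books that could, in principle, co-occur."""
    return num_books * (num_books - 1) // 2


# Hypothetical catalog: 10 million entries covering 1 million distinct books.
print(pair_observations(10_000_000, 2))      # 5000000
print(pair_observations(10_000_000, 1_000))  # 4995000000
print(possible_pairs(1_000_000))             # 499999500000
```

Even the larger figure is under 1% of the roughly 500 billion possible pairs. Many recorded pairs are also repeats, so the share of distinct pairs actually seen is lower still. That is the sense in which not occurring together is the norm: absence only becomes informative when the expected overlap is large.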

11/15/2006 9:34 PM  
Anonymous said...

Thanks for the response! I'm not sure it quite addresses my question, though (or maybe I'm the one in need of coffee and more skill at phrasing questions). It seems like you were explaining why a book has to be owned by 75 people in order to generate unsuggestions. And that explanation makes sense; I'm glad it didn't include more technical statistical terms. Is that related, though, to why the list of unsuggestions has 75 items on it, while the list of suggestions only has 15? Sorry if it is and I just completely missed the point... I'll try reading over your post again in the morning.

11/15/2006 11:59 PM  
Blogger Tim said...

Oh, good point. I misread you :) Well, I hope you were entertained.

The reason it has 75(ish) is that they tend to fall into a pattern, and I don't feel too bad about bad ones.

For recommendations, I'm worried about bad ones. A few bad ones are okay. Big swaths of them make the exercise look hollow. In truth, some books work a long way out and some don't even get to ten before looking bad.

In general, LT is pushing things by going way past five. I don't want to give anyone a heart attack.

Sorry to dupe you into reading my long comment ;)

11/16/2006 3:58 AM  
Blogger Matt said...

This is a highly entertaining feature. I blogged about it.

11/17/2006 2:16 PM  
Anonymous said...

Oh, it was still a useful response. I think someone else had even asked somewhere about why so many copies were required.

As for my actual question, I guess that's a good reason to have a shorter suggestion list. It's not as if I need more books to read anyway. And I have high hopes of being able to exclude author again someday :)

11/17/2006 9:40 PM  
Anonymous said...

This feature's great fun, but I can't help noticing that most of the books I type in generate lists of "Christian literature" of various sorts. This suggests to me that there's a big gap between libraries that contain Christian literature (and not much else), and those that don't. I've also yet to find a book I liked that *doesn't* generate at least one unsuggestion I've also read and liked...

I've still got a long way to go in terms of cataloguing all my library, and given I read quite widely, I'm quite interested to see if it alters the results at all!

(In fact, would it be possible to generate a "conformity number" or some such figure to indicate how closely a given library conforms to what one would expect? In other words, a library consisting entirely of Christian literature or programming books would have a high number; one consisting of, say, Harry Potter, Dilbert, and the Koran would be very low. That would be cool...)
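Here is one way the proposed "conformity number" might be defined, purely as a hypothetical sketch: average the log of the observed/expected co-ownership ratio over every pair of books in a library. The function, the `floor` smoothing value and the pairwise ratios are all assumptions for illustration, and the book names are placeholders.

```python
from itertools import combinations
from math import log


def conformity(library: list, pair_ratio, floor: float = 0.01) -> float:
    """Average log observed/expected co-ownership over every pair in a library.

    Positive: the books usually appear together elsewhere (a "conforming" library).
    Negative: the books rarely share shelves elsewhere.
    `floor` keeps zero-overlap pairs from producing log(0).
    """
    pairs = list(combinations(library, 2))
    if not pairs:
        return 0.0
    return sum(log(max(pair_ratio(a, b), floor)) for a, b in pairs) / len(pairs)


# Invented pairwise ratios for illustration (1.0 = exactly as often as chance).
RATIOS = {
    frozenset({"focused A", "focused B"}): 8.0,
    frozenset({"focused A", "focused C"}): 6.0,
    frozenset({"focused B", "focused C"}): 7.0,
    frozenset({"mixed X", "mixed Y"}): 0.9,
    frozenset({"mixed X", "mixed Z"}): 0.3,
    frozenset({"mixed Y", "mixed Z"}): 0.2,
}


def lookup(a: str, b: str) -> float:
    """Unknown pairs are treated as exactly average (ratio 1.0, log 0)."""
    return RATIOS.get(frozenset({a, b}), 1.0)


print(conformity(["focused A", "focused B", "focused C"], lookup) > 0)  # True
print(conformity(["mixed X", "mixed Y", "mixed Z"], lookup) < 0)        # True
```

Averaging logs rather than raw ratios keeps a single very tight pair from swamping the rest of the library.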

11/18/2006 9:45 AM  
Blogger Doro said...

I love the idea of a "conformity number"!

11/18/2006 4:19 PM  
Blogger K said...

I wonder how this works when you factor in couples - or families too, I suppose.

Once I've catalogued everything in my house, there'll be quite a bit of classic lit and some programming texts too. My husband reads my classics, but nothing's going to persuade me that I want to read Applied System Operating Concepts. Sorry.

11/19/2006 7:12 PM  
Blogger Tim said...

Well, that's the book you'll be reading in hell.

(That does suggest a new revenue stream--Satan.)

11/19/2006 7:15 PM  
Anonymous said...

Don't be evil.

11/19/2006 7:15 PM  
