Thursday, October 20, 2005

Scheduled maintenance and books you should own...

LibraryThing will go down for scheduled maintenance tonight at 1am Eastern time (6am GMT and, alas, 11pm in California). I expect it to be down for 2-3 hours.

Second, if you got this far, you're in a very small minority of LibraryThing people—the cream of the cream, perhaps. So, here's the scoop:

I've got an algorithm that tells you what books you "ought" to own. Basically, it looks at people who have similar books, and figures out what books they have that you don't, adjusting for how close their library is to yours and for how common a given book is generally (the Harry Potter effect). If you want to look at your list, email me. NOTE: TELL ME YOUR USER NAME. I CAN'T READ MINDS!
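
As a rough illustration only, the idea described above could be sketched in Python like this. Everything here is an assumption made for the example, not the actual LibraryThing code: the function name `suggest`, the dict-of-sets data format, the use of Jaccard similarity for "how close their library is to yours", and the log-based discount for "how common a given book is generally".

```python
from collections import Counter, defaultdict
from math import log


def suggest(libraries, user, top_n=5):
    """Rank books the user lacks by similarity-weighted votes.

    libraries: dict mapping user name -> set of book titles (hypothetical format).
    """
    mine = libraries[user]
    # How many libraries hold each book (overall popularity).
    owners = Counter(book for books in libraries.values() for book in books)

    votes = defaultdict(float)
    for other, theirs in libraries.items():
        if other == user or not (mine & theirs):
            continue
        # Jaccard similarity (an assumed measure): shared books / books in either library.
        similarity = len(mine & theirs) / len(mine | theirs)
        for book in theirs - mine:
            votes[book] += similarity

    total = len(libraries)
    # Assumed discount: books held by most libraries get a smaller multiplier.
    scored = {book: vote * log(1 + total / owners[book]) for book, vote in votes.items()}
    ranked = sorted(scored, key=lambda book: (-scored[book], book))
    return ranked[:top_n]


libraries = {
    "tim": {"Dune", "Neuromancer", "Iliad"},
    "ann": {"Dune", "Neuromancer", "Snow Crash", "Harry Potter"},
    "bob": {"Iliad", "Odyssey", "Harry Potter"},
    "cat": {"Harry Potter", "Twilight"},
}
print(suggest(libraries, "tim"))  # ['Snow Crash', 'Harry Potter', 'Odyssey']
```

In this toy data, Harry Potter collects the most raw votes, but because three of the four libraries hold it, the discount pushes it below Snow Crash.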

The list is by email only for two reasons. First, it currently takes about five minutes to create, without breaking the server. (I'm taking the servers down tonight in part to speed such algorithms.) Second, I need feedback before I put the algorithm up.

It's a lot harder to write a good library suggestion algorithm than I thought. If you like thinking about algorithms, this is an interesting one to think about.

The current algorithm has some flaws. First, it tells you about popular books you are actually avoiding. Thus, my brother isn't a fan of Roger Zelazny, but his sci-fi-heavy bookcase, when matched to other sci-fi bookcases, tells him he ought to own Zelazny's books. Second, it doesn't think about different categories of books. Everyone's library has more than one special section, but a "democratic" algorithm favors the largest section. So, I have a special interest in Greco-Roman divination, but it would never suggest books on that topic because my divination section is dwarfed by my other sections.

I have a number of other algorithms to look at. I'd like to test Dewey clusters (popular books in a Dewey-number range that you have a lot of books in), library suggestions that "bubble up" from book-by-book suggestions, and so forth. I'm not too interested in algorithms based on user ratings. My belief is that, in the aggregate, a library is a fair representation of a given person's likes and dislikes. Even a "bad" book should inform the algorithm—people don't buy books randomly.
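
One possible reading of the "Dewey clusters" idea, sketched in Python under stated assumptions: each book has a known Dewey number (the `dewey` dict), a "range" is a block of ten numbers, and popularity means the number of other libraries holding a book. The function names, the range size, and the placeholder titles are all hypothetical.

```python
from collections import Counter


def dewey_bucket(number):
    """Round a Dewey number down to its tens range, e.g. 813.54 -> 810 (assumed range size)."""
    return int(number) // 10 * 10


def dewey_cluster_suggestions(dewey, libraries, user, clusters=2, per_cluster=2):
    """Suggest popular unowned books from the user's largest Dewey ranges."""
    mine = libraries[user]
    # Count the user's books per Dewey range and keep the largest ranges.
    my_ranges = Counter(dewey_bucket(dewey[b]) for b in mine if b in dewey)
    top_ranges = sorted(my_ranges, key=lambda r: (-my_ranges[r], r))[:clusters]

    # Popularity = number of other libraries holding each book.
    popularity = Counter(
        book for other, books in libraries.items() if other != user for book in books
    )
    suggestions = {}
    for rng in top_ranges:
        candidates = [
            book for book in popularity
            if book not in mine and book in dewey and dewey_bucket(dewey[book]) == rng
        ]
        candidates.sort(key=lambda book: (-popularity[book], book))
        suggestions[rng] = candidates[:per_cluster]
    return suggestions


# Placeholder titles and numbers, for illustration only.
dewey = {
    "novel-1": 813.54, "novel-2": 813.52, "novel-3": 813.6,
    "novel-4": 813.5, "novel-5": 813.54,
    "omens-1": 133.3, "omens-2": 133.32, "history-1": 940.5,
}
libraries = {
    "tim": {"novel-1", "novel-2", "novel-3", "omens-1"},
    "ann": {"novel-1", "novel-4", "omens-2"},
    "bob": {"novel-4", "novel-5", "history-1"},
}
print(dewey_cluster_suggestions(dewey, libraries, "tim"))
```

Because each range produces its own short list, a small section (the single "omens" book here) still gets a suggestion instead of being outvoted by the largest section.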

The goal is to produce a better selection engine than Amazon has. Think big, I say.

25 Comments:

Blogger Dodi said...

Sounds like an excellent feature. I'll wait until I have all my books entered. By then it might be standard.

I know LibraryThing is focused on books, but have you given any thought to a Movie branch? They also have ISBNs and I consider them part of my library.

10/20/2005 2:41 PM  
Anonymous Anonymous said...

Like Dodi, I'll wait until I've got all my books entered - at the moment I have maybe 1/3 of them (maybe less). But it sounds like a fascinating idea! And yes, by all means, a better one than Amazon, which can after all only recommend based on what you bought at Amazon, not on what you've already got from everywhere else.

10/20/2005 2:56 PM  
Blogger oncoffee said...

i'm another one who has started but is by no means finished adding my library, i am less than 25% done, however if you want to run it for me for feedback, i'm game

10/20/2005 3:55 PM  
Blogger Susanne said...

This does sound interesting.

I have a question, and I don't know where else to put it. Is there a "help" feature? I noticed when I uploaded my ReaderWare library it only picked up the books that had barcodes. There are a substantial number that I entered by hand, and I don't know how to import them without having to retype all the information.

10/20/2005 3:56 PM  
Anonymous Anonymous said...

It sounds interesting, but at the risk of sounding like an echo, "I'll wait until I have all my books entered".

Amazon's recommendations are good for their amusement value and that's about it. I am currently trying to understand why the fact that I own Patience and Fortitude would lead them to think that I would want a textbook on cyberlaw.

10/20/2005 4:25 PM  
Anonymous Anonymous said...

I'd be curious to see my "books I should own" list.

10/20/2005 5:48 PM  
Anonymous Anonymous said...

I'll email you, I'd love to see what it makes of my (incomplete but big) list!

On another note: it occurs to me that it would be good to have a way of skipping to the end of results on the import page - when I put in enough detail to narrow a really popular title sans ISBN down to 'just' 120 results (from Amazon or a library), it is tedious to have to hit 'next page' over and over, particularly when I know that my fairly old, dilapidated edition is going to be near the end of the results.

10/20/2005 6:11 PM  
Blogger Unknown said...

"It's a lot harder to write a good library suggestion algorithm than I thought. If you like thinking about algorithms, this is an interesting one to think about."

This sounds like the Turing Test written by the late great Alan Turing.

Can a logical mathematical engine really steer me in the right direction to find a series of books that I should acquire?

Who knows, maybe this artificially intelligent bookseller might be human after all......... :+) or :-(

10/20/2005 7:01 PM  
Blogger Tim said...

Algorithms based on user-generated content can be very powerful indeed: witness Google. LibraryThing's algorithm is like Google in that it considers owning a book to be a vote for the book, except that it weights votes toward libraries that are similar to yours. (Otherwise, the top books would be the suggested books.)

The comparison with Turing is an interesting one. The system doesn't need to show signs of intelligence in response to any query; it only needs to take structured data (a library) and give back another set of structured data (a list of suggestions). With 540,000 books in the system, there's no question that LibraryThing knows about a lot of books that would interest you. The question is, can it pick them?

The Turing-like test would be: could a bookseller who only knew your library do better than a statistical algorithm? I'm betting the present algorithm is beaten by a person, but that it would still come up with some ideas the person did not.

10/20/2005 7:14 PM  
Blogger Tim said...

I didn't even know Amazon had a "books you own list." I'll look into it.

The fall of them--competing with LibraryThing! Sheesh. If this continues, I may have to buy them.

10/20/2005 7:16 PM  
Blogger Tim said...

Arg: gall.

10/20/2005 7:17 PM  
Anonymous Anonymous said...

I'd be interested in seeing this... and helping science lol

Username: childofchaos

10/20/2005 7:18 PM  
Anonymous Anonymous said...

I understand why you don't want to deal with user ratings of books for this algorithm- but one thing that might improve it is letting members prioritize their tags.

To use your example, you have a special interest in Greco-Roman divination, but the algorithm at this point will ignore that tag because it has so few members. In many cases (and probably for your example) the number of books with a given tag is not really a measure of its importance to the cataloguer, but rather the relatively small universe of books on that topic. If members could pull up a list of all their tags and order them 1,2,3... by strength of interest- passion for the topic- then the algorithm could give more accurate weight to highly-valued tags regardless of the number of books bearing each tag. It would require another table- but do you think it might address one of the limitations you describe?
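
A minimal sketch of how such an ordering might feed into scoring, assuming the simplest possible scheme: the tag ranked first gets weight 1, the second 1/2, the third 1/3, and so on. The function names and the 1/rank weighting are hypothetical choices for illustration, not something proposed in the post.

```python
def tag_weights(ranked_tags):
    """Map tags ordered by interest (strongest first) to weights 1, 1/2, 1/3, ..."""
    return {tag: 1 / rank for rank, tag in enumerate(ranked_tags, start=1)}


def weighted_score(book_tags, weights):
    """Score a candidate book by the weights of the prioritized tags it carries."""
    return sum(weights.get(tag, 0.0) for tag in book_tags)


weights = tag_weights(["divination", "sf", "history"])
print(weighted_score({"divination"}, weights))     # 1.0
print(weighted_score({"sf", "cooking"}, weights))  # 0.5
```

With weights like these, one book on the top-ranked topic outscores a book from a much larger but lower-ranked section, whatever the number of books bearing each tag.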

10/20/2005 10:48 PM  
Anonymous Anonymous said...

Having seen my list, I'd say that though it pinpoints excellent suggestions - including a lot of books on my mental and Amazon wishlists - it also contains a number of books I own (and have catalogued on LT), plus several I'm avoiding (Harry Potter, say!).

Being able to say "I own it (in a different edition or an anthology)", or "I'm not interested", as on Amazon, would help refine it a lot. But it's already more focused and accurate than their suggestions tend to be. Plus there were, indeed, some unexpected and interesting titles. No question, you can beat 'em!

Thanks, Tim!

10/20/2005 10:57 PM  
Blogger rocketman58 said...

"Ought" to own, you can't be serious. Since when did a book's popularity make it worth reading?

And for that matter, at what point do we decide a book has become "popular"? A whole lot of people read Hitler's 'Mein Kampf' and we can look back and see how that influenced some people. Also, the number of copies published may determine whether a book even gets a chance to become popular. Be very careful what you read, it's going to go into your brain...and then you have to deal with it. Thank GOD for the Christian bible. Truth and comfort. You can always tell who the truly educated people are, by what they know about Christianity and history. Here's a "popular" title for you to read... "What if Jesus Had Never Been Born?" by Dr. D. James Kennedy. I dare you to read it. I double dog dare you.


Or, how about this one, "How Should We Then Live? The Rise and Decline of Western Thought and Culture" by Francis A. Schaeffer, and one of my personal favorites, "Mere Christianity" by C.S. Lewis. You remember the saying, don't you: "So many books, so little time". Encourages one to be very picky, doesn't it? When you look at Harry Potter, and compare it to Tolkien's or Lewis's works, it's almost laughable by comparison. C.S. Lewis's "Chronicles of Narnia" is coming out on film soon. Ought to give "Harry" a run for his money. I wonder if it will be "popular" with the kids. All I can say is "'bout time". A little more honor and self-sacrifice will be good. Good fantasy, bad fantasy. At least we have the freedom to read what we want, popular or not. Is that off topic? I just never know.

10/20/2005 11:43 PM  
Anonymous Anonymous said...

I'm curious to see what mine might show... even given that I'm nowhere near done entering things. I do have a great many of my favorites on board... but then, the (seeming) randomness of mine might just confuse it... who knows?

10/21/2005 12:22 AM  
Blogger Tim said...

1. The system screens for raw popularity. Only popularity within libraries like your library matters. Popularity outside of these libraries counts negative.

2. Popularity is not completely unimportant. Should you consider reading the New Testament, Dante and Dickens because lots of other people have done so? Well, in my opinion, yes, you should.

3. Mere Christianity is a very popular book on LibraryThing. Mein Kampf is not.

4. The book police are on their way.

10/21/2005 12:23 AM  
Anonymous Anonymous said...

Tim, three points for points 2) and 3). (And thanks for no. 1!)

10/21/2005 1:04 AM  
Anonymous Anonymous said...

I don't have that many books in here at the moment, but I figure why not, I'd like to see what an algorithm would suggest for me.

ursula is my username here at LT.

10/21/2005 1:52 AM  
Anonymous Anonymous said...

Though my library is not complete, I'm curious as well.

Aside from that, however, have you seen this site?
http://www.whatshouldireadnext.com/
I've played with it before, and while I don't always like what they suggest, it's still fun for ideas.

10/21/2005 2:45 AM  
Anonymous Anonymous said...

1. Books that are actively being avoided:

- Genres you stopped reading:

Unfortunately, at the time you notice that you haven't been reading as much fantasy as you used to, your library still looks very much like the library of someone who reads a lot of fantasy, and recommendations based on what people with similar libraries like will probably reflect that. Over time, as your library (and the system) grows, your recommended reading list will start to look more like what-you're-reading-now and less like what-you-used-to-read.

- Sub-genres you don't like:

As the system grows and more people enter more of their books, there will be more "sf-but-no-cyberpunk" libraries out there; the cyberpunk recommendations will eventually fall off the bottom of the list.

- Authors you don't like:

Two signs of potential author-avoidance:

i. the author is popular among users with similar libraries (many other users, several** titles), and there are *no* books by that author in your library;

ii. the author is popular etc, and only one or two books by that author are in your library.

** just to pick a number, say 5 or more in the suggestion list, or 8 (10? 15?) or more titles by that author in the database

I think the second is a (slightly) stronger avoidance indicator; both should lose points, but books in the second category should probably lose more...
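
The two signs could be turned into a score multiplier along these lines. This is only a sketch: the threshold of 5 suggested titles follows the "just to pick a number" figure above, while the function name and the penalty values 0.5 and 0.25 are made up for illustration.

```python
def avoidance_penalty(suggested_titles, owned_titles, min_suggested=5,
                      none_owned=0.5, few_owned=0.25):
    """Return a multiplier (1.0 = no penalty) for one author's suggestions.

    suggested_titles: number of the author's titles in the suggestion list.
    owned_titles: number of the author's titles already in the user's library.
    """
    if suggested_titles < min_suggested:
        return 1.0  # author is not prominent enough to infer avoidance
    if owned_titles == 0:
        return none_owned  # sign i: popular author, nothing owned
    if owned_titles <= 2:
        return few_owned  # sign ii: tried one or two and stopped (stronger penalty)
    return 1.0


print(avoidance_penalty(12, 0))  # 0.5
print(avoidance_penalty(12, 1))  # 0.25
print(avoidance_penalty(12, 6))  # 1.0
```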

-----------

One way to minimize the impact of books-being-avoided on the suggestion list (and generally clean the list up) would be to only list the most popular title by a suggested author, with the number of other books (if any), like:

9. Nine Princes in Amber (+12) by Roger Zelazny

It's still technically suggesting 12 books by a writer you don't like, but it's only a single "wrong" suggestion. :)
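
A short sketch of that collapsing step, assuming the suggestion list arrives as (title, author) pairs already sorted best-first, so the first title seen for an author stands in for the "most popular" one. The function name `collapse_by_author` is hypothetical.

```python
def collapse_by_author(ranked):
    """Keep one line per author: the top-ranked title plus a count of the rest.

    ranked: list of (title, author) pairs, best suggestion first.
    """
    first_title = {}
    extra = {}
    for title, author in ranked:
        if author in first_title:
            extra[author] += 1
        else:
            first_title[author] = title
            extra[author] = 0
    lines = []
    # Dicts preserve insertion order, so authors stay in ranking order.
    for author, title in first_title.items():
        suffix = f" (+{extra[author]})" if extra[author] else ""
        lines.append(f"{title}{suffix} by {author}")
    return lines


ranked = [
    ("Nine Princes in Amber", "Roger Zelazny"),
    ("Snow Crash", "Neal Stephenson"),
    ("Lord of Light", "Roger Zelazny"),
    ("The Guns of Avalon", "Roger Zelazny"),
]
for line in collapse_by_author(ranked):
    print(line)
# Nine Princes in Amber (+2) by Roger Zelazny
# Snow Crash by Neal Stephenson
```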

10/21/2005 5:33 AM  
Anonymous Anonymous said...

Following Diva, if the intent of this scheme is to reveal new books then sheer popularity should count *against*; if you read a lot of fantasy but you don't own any Harry Potter, it's not because you haven't heard of it yet, it's definitely active avoidance.

10/21/2005 7:07 AM  
Anonymous Anonymous said...

2. Categories/special interests:

Rather than trying to get the general suggestion list to reflect special interests, what about making category-based suggestions a separate feature?

Two ideas for tag-based suggestions:

i. recommend books based on your tags:

given a list of tags (and tags to exclude?) and a minimum number of tags to match (any/n/all), find people who also have the books you've labelled with those tags.
more strict: for each of those people, get the tags they've used on those books, and add in all the other books they've labelled with those tags; run the suggestions using these sub-libraries.
less strict: run the suggestions against their entire libraries.

ii. recommend books based on general tags:

given a list of tags and a minimum number of tags to match, find people who've used those tags.
more strict: for each of those people, get the books labelled with those tags, run the suggestions against sub-libraries.
less strict: run the suggestions against entire library.
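
The "more strict" version of idea i. could be sketched as follows. It assumes each catalog is a dict mapping a book to the set of tags its owner gave it. The function names and placeholder titles are hypothetical, and the resulting sub-libraries would then be fed to whatever suggestion routine is in use.

```python
def tagged_books(catalog, tags, min_match=1):
    """Books in one user's catalog carrying at least min_match of the given tags."""
    return {book for book, book_tags in catalog.items()
            if len(book_tags & tags) >= min_match}


def sub_libraries(catalogs, user, tags, min_match=1):
    """Build per-person sub-libraries related to the user's tagged books."""
    seed = tagged_books(catalogs[user], tags, min_match)
    result = {}
    for other, catalog in catalogs.items():
        if other == user:
            continue
        shared = seed & set(catalog)
        if not shared:
            continue  # this person has none of the user's tagged books
        # Tags this person used on the shared books...
        their_tags = set().union(*(catalog[book] for book in shared))
        # ...and every other book they labelled with any of those tags.
        result[other] = shared | tagged_books(catalog, their_tags)
    return result


# Placeholder catalogs, for illustration only.
catalogs = {
    "tim": {"omens-1": {"divination", "rome"}, "scifi-1": {"sf"}},
    "ann": {"omens-1": {"oracles"}, "omens-2": {"oracles"}, "novel-1": {"fiction"}},
    "bob": {"scifi-1": {"sf"}, "scifi-2": {"sf"}},
}
print(sub_libraries(catalogs, "tim", {"divination"}))
```

Here only ann's "oracles" books come back for tim's "divination" tag, even though the two users chose different tag names for the same book. The "less strict" variant would return each matching person's entire catalog instead.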

10/21/2005 7:34 AM  
Anonymous Anonymous said...

Very cool! User name = sikchi. Thanks!

10/21/2005 4:51 PM  
Anonymous Anonymous said...

Is a LibraryThing catalog *really* a list of "books that person owns"? I submit that for many it is not, and is likely to become even less so as folks find more creative ways to use it. For example, I enter books I *want* to buy and tag them "wishlist". I've considered using LT to track all books I've read and tagging the ones I don't own with something. I see some users using LT to catalog their videos and sound recordings.

I'm not sure what effect this has on recommendations. If I happen to own Jonathan Livingston Seagull because I just found it used for $1, but no longer own On the Road because I "lent" it to a friend 15 years ago, is it a good idea to count the former but not the latter? I don't think so, but if I use LT only to catalog books I own, then that's what would happen.

All of which is somehow related to my suggestions via email about adding an "ownership" field.

Another thought about how to use LT to connect users with books and people they should know: take a look at the "groups" feature at last.fm. (E.g. my Does Humor Belong in Music group at http://www.last.fm/group/Does%2BHumor%2BBelong%2BIn%2BMusic) Music listening being a bit different than book reading, the charts there are focused on what folks are currently listening to; but I suspect groups would be even more useful for LT, where a catalog is essentially an archeological dig through a user's reading history.

10/25/2005 1:08 AM  
