Monday, September 17, 2007

Google Book Search ... on LibraryThing

Introducing something new we're calling "Google Book Search Search."

Google Book Search Search is a bookmarklet that searches Google Book Search for the titles in your LibraryThing library. It works not unlike the famous SETI@Home project. You set it up and it searches Google Book Search slowly in the background.* You can watch, do something in another window or go out for coffee.

When it's done you can link to and search all the books in your library that Google has scanned. You'll find a "search this book" link on work pages, and a Google Book Search field to add to the list view in your catalog.

But this isn't just a selfish thing. There's a lot of searching to do, and you can help. If you choose, you can pitch in and help with others' books. All of the data gathered is free and available to everyone. A lot of people want a reliable index of what Google has, not least libraries.

What do I do?

Google Book Search Search is a "bookmarklet." You save it to your "favorites" or "bookmarks." Then you go to Google Book Search and you click it. You can see what pops up on the right.*** Press start and it will start collecting information.

Here it is: Google Book Search Search

We've tested it on FF and Safari on the Mac, and FF and IE7 and IE5.5 on the PC. We haven't tested it on PC IE6 yet. I have no idea about Opera.
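
For the curious, a bookmarklet is just a bookmark whose address is a snippet of Javascript instead of a web page. The sketch below shows the general pattern, not our actual script: the function name makeBookmarkletHref and the example URL are made up for illustration.

```javascript
// Hypothetical helper, for illustration only: builds the address of a
// bookmarklet that loads a larger script into whatever page is open.
function makeBookmarkletHref(scriptUrl) {
  const body =
    "(function(){" +
    // Create a <script> element on the current page...
    "var s=document.createElement('script');" +
    // ...point it at the real script, with a timestamp to dodge caching...
    "s.src=" + JSON.stringify(scriptUrl) + "+'?t='+new Date().getTime();" +
    // ...and attach it so the browser fetches and runs it.
    "document.body.appendChild(s);" +
    "})();";
  return "javascript:" + body;
}

// Example with a placeholder URL (an assumption, not a real LibraryThing address):
const href = makeBookmarkletHref("http://example.com/loader.js");
```

The resulting string is what would go in the link you save to your bookmarks. If you embed it in an HTML attribute, the double quotes inside it need escaping.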

Why a bookmarklet?

We've wanted to do this for a long time. But to link to a book on Google reliably you need its Google ID. For some reason Google doesn't publish these, making it impossible to tell what they have and what they don't, and impossible for sites like LibraryThing to send them the traffic they want. Secretive and self-defeating? Seems like it to me.

Efforts have been made to collect Google IDs before. The well-known Lib 2.0 blogger John Blyberg tried, as have others. We tried too. The trick is that Google Book Search—like the rest of Google—has a system in place to stop machine queries.**

Making a bookmarklet distributes the work. And because it takes place within a browser, it tends not to trigger machine-collection warnings.
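
To make "distributes the work" a little more concrete, here is a rough sketch of the idea: each browser takes a list of books and works through it one at a time, pausing between searches. This is not the bookmarklet's real code. The names createWorkQueue, searchFn and schedule are assumptions made up for the example.

```javascript
// Hypothetical sketch of a slow, one-at-a-time work queue.
// searchFn(book) stands in for "look this book up" and returns an ID or "".
// schedule(fn, ms) stands in for setTimeout, passed in so it can be swapped.
function createWorkQueue(books, delayMs) {
  const pending = books.slice(); // copy, so the caller's list is untouched
  const results = [];
  return {
    remaining: function () { return pending.length; },
    run: function (searchFn, schedule, onDone) {
      function step() {
        if (pending.length === 0) {
          onDone(results.slice());
          return;
        }
        const book = pending.shift();
        results.push({ book: book, googleId: searchFn(book) });
        // Wait before the next search rather than hammering the server.
        schedule(step, delayMs);
      }
      step();
    }
  };
}
```

In a browser you would pass setTimeout as schedule. An empty googleId plays the role of "not found" here, which is only a convention chosen for this example.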

Ultimately, however, Google can put a stop to this. The bookmarklet has a signature. And Google can send us a note, and we'll disable the bookmarklets. Just as Google respects the robots.txt file, we'll respect such a request.

Why not use "My Library"?

Last week Google introduced an interesting "My Library" feature, allowing people with Google accounts to list some of their books. A few tech bloggers saw an attack on LibraryThing.

LibraryThing members were quick to dismiss it. It wasn't so much the lack of any social features, or of cataloging features as basic as sorting your books. It wasn't even the privacy issues, although these gave many pause. It was the coverage.

Google just doesn't have the sort of books that regular people have. Most of their books come from a handful of academic libraries, and academic libraries don't have the same editions regular people have. Then there are the books publishers have explicitly removed from Google Book Search. Success rates of below 50% were common. Of these a high percentage are only "limited preview" or "no preview."

The Google-kills-LibraryThing meme has another dimension. We WANT people to use Google Book Search. It's a great tool. Being able to search your own books is useful, and LibraryThing members should be able to do it. Call us naive, but we aren't going to be able to "pretend Google isn't there." And we aren't convinced that Google is going to create the sort of robust cataloging and social networking features that LibraryThing has.

Our bookmarklet works by transcending ISBNs, using what LibraryThing knows about titles, authors and dates to fetch other editions of a work. In limited tests I've found it picks up around 90% of LibraryThing titles.
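
Roughly speaking, "transcending ISBNs" means trying more than one query per work. The sketch below is a guess at the shape of that, not the real logic: buildQueries and the fields on the work object are hypothetical, and the intitle: and inauthor: operators are an assumption about how a title and author search might be phrased. Only the isbn: form is known to be used, since it shows up in the bookmarklet's searches. Dates are left out to keep it short.

```javascript
// Hypothetical sketch: list the searches to try for one work, most
// specific first. "work.isbns" is assumed to hold the ISBNs of every
// edition known for the work, not just the one on your shelf.
function buildQueries(work) {
  const queries = [];
  const isbns = work.isbns || [];
  for (let i = 0; i < isbns.length; i++) {
    queries.push("isbn:" + isbns[i]);
  }
  // Fall back to title and author when no ISBN search finds anything.
  if (work.title) {
    let q = 'intitle:"' + work.title + '"';
    if (work.author) {
      q += ' inauthor:"' + work.author + '"';
    }
    queries.push(q);
  }
  return queries;
}
```

A caller would run these in order and stop at the first hit. A match from an ISBN query is a safer bet than one from the title fallback, which is one reason some matches deserve more confidence than others.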

Information wants to be free

Our commitment to open data is long-standing. We've railed against OCLC for its desire to lock up book metadata.

But we're not railing here. We think it's perfectly fine for Google to control access to the scans it's made. All we want to do is link to them, to send them traffic. It's not clear to us that Google is trying to control access to its ID numbers.

You can see and edit the data here. Full XML downloads of the data are also available there.


*Come to think of it, it works like Google.
**The system is overzealous. It often refuses to show me Google Blog Search pages in Firefox because I look at LibraryThing's blog coverage too much.
***It's quite amazing what a bookmarklet can do. We could never have done it if Altay hadn't shown us the way in this sort of Javascript. The script itself is, however, pretty amateurish--a novice attempt at what Altay did expertly.

As we put on the bookmarklet: "Google and Google Book Search are registered trademarks of Google. LibraryThing is not affiliated in any way with Google or the many libraries that have so generously provided Google with their books and bibliographic metadata, although we share a love of books, a desire to make information as freely available as possible, and similar opinions about evil."


40 Comments:

Blogger WorldMaker said...

Not sure if I need this yet (or any time soon)... but if it helps the community at large I'll be happy to start with my own (somewhat sizable collection of) books and see what I can dig up.

9/17/2007 2:13 AM  
Blogger AndrewB said...

That is so nifty! I've just run my meagre collection through it :)

9/17/2007 2:14 AM  
Blogger Unknown said...

Interesting, though I had to read the post a few times to figure out what exactly was happening. So, first thing is first, I've noticed a few mismatches so far. Once this is done, what can be done with the mismatches? Can they be deleted?

9/17/2007 2:15 AM  
Blogger Tim said...

Tom:

Yes, they can. Follow the link to the data. You can edit any of the information. You can't delete a book—it would just get run again, and be wrong again!—but you can zero out the Google Book ID. This will mark it as not found, and it won't be tried again.

Your help making the idea clearer would be appreciated. It's hard to wrap your head around, I know!

9/17/2007 2:17 AM  
Blogger Unknown said...

Nice!
question, tho--I started it while I was logged into my own library there. The only option that was available to me was to collect 'other people's books'.

How many will that be--how much time?

9/17/2007 2:32 AM  
Blogger Tim said...

Your own library there or on LT? It doesn't interface with the "My Library" feature. Perhaps it should, but there are privacy concerns.

9/17/2007 2:35 AM  
Blogger Unknown said...

My Google Books library--I had been linking to my own from the comments field in my LT library, manually (only 250 books) is that not a good idea? (privacy wise)

9/17/2007 2:40 AM  
Blogger Tim said...

No, that's fine. The problem comes with the GBSS bookmarklet reading your Google ID name. That would be something we'd want to ask about, and fundamentally we don't want to know your id on other sites.

9/17/2007 2:47 AM  
Blogger rob said...

There are some bad links under the heading 'what do I do'.

And I can't get it to work. I do not see a java window when I enter google books. Using FF on PC and Mac. Are there some more detailed instructions somewhere?

9/17/2007 3:05 AM  
Blogger AndrewB said...

Hi Rob, Tim's accidentally linked to "athena", which I believe is their testing server.

All you need do is right-click the link next to "here it is" entitled "Google Book Search Search" and add it to your bookmarks.

Then, go to Google Books, and from within your bookmarks menu, click the "Google Book Search Search" item (added in the step above). After a few seconds the GBSS window should pop over the top of the webpage and you can make it start doing its thing. :)

9/17/2007 3:54 AM  
Anonymous Anonymous said...

Very nice. I don't think this affects the results but I would expect the number remaining to go down as the numbers found and missed go up (i.e. found+missed+remaining=total in my library), but it doesn't and seems to be the total in my library.

"You can see and edit the data here. Full XML downloads of the data are also available there."
I can't get the link to work for me - it asks for a username/password and it doesn't appear to be my librarything account.

9/17/2007 4:01 AM  
Anonymous Anonymous said...

Regarding my second point:

andrewb's comment pointed the way to http://www.librarything.com/gbss_data.php

9/17/2007 4:05 AM  
Blogger Steve said...

"I have no idea about Opera."

I'm running it in Opera 9.23 (WinXP) right now and it seems to be working okay.

9/17/2007 5:54 AM  
Anonymous Anonymous said...

I'm not sure I understand the use of this whole operation. What additional information will I ultimately get from this "linking with google" about my books?
Anyway, in any case I dislike the idea to have google having access in any way to my catalog - seeing how they deal with privacy issues. I wouldn't like to have to set my account to private because of that ..

9/17/2007 7:08 AM  
Anonymous Anonymous said...

I tried to do this, but the "Search Search" box tells me that I'm not logged into LibraryThing when, in fact, I am. It will only allow me to download books of others. Is this still okay? I'm using IE 6.0.

9/17/2007 8:59 AM  
Blogger Harlan said...

Hi, the tool is working fine, in that the Javascript applet runs and collects data. If I follow the "see and edit data here" link, I can indeed see a list of my books with links to the GBS pages. However, I can't seem to find that link on any of the normal pages related to that book! There's no "Google Book" link or anything on either the book-info or social-info pages for the book. Am I looking in the wrong place?

9/17/2007 9:30 AM  
Blogger Jonathan K. Cohen said...

When I try this on books.google.com, I get a message from the bookmarklet saying "Google Book Search Search only works on Google Book Search: http://books.google.com". Since I am on books.google.com, this is mystifying. True for both IE7 and FF2.

9/17/2007 10:45 AM  
Anonymous Anonymous said...

I confess I'm not entirely sure what it's doing, but whatever it is, it's working (with Firefox 2.0.0.6).

I notice that in the Status Bar the number of volumes remaining is not being decremented as the search proceeds. That's probably a bug.

How will the "Confidence" assessment be reflected in the LT links? Will we be able to remove or correct inaccurate links?

9/17/2007 12:11 PM  
Blogger Rob Szarka said...

Oooo! New toy! Shiny!

Tim: One of the two links to Google Books in your post (the second link) is actually linked to Wikipedia. This might be causing some confusion for folks.

9/17/2007 2:15 PM  
Blogger Mark Barnes said...

I couldn't get this working in Firefox because I was using the CustomizeGoogle extension, which affected the way the LT javascript was scraping the pages. I've turned the extension off for BookSearch, and now all is well.

It won't work on IE7 still, as it says I'm not logged in to LT, even though I am.

9/17/2007 3:16 PM  
Blogger Cheryl Vanatti said...

I am really worried. Now all my books (2340!) are missing. There is some sort of weird code at the top. I may be a nerd, but I'm not a programmer. I can't read that stuff! (using Firefox)

Throwing up now....

9/17/2007 3:35 PM  
Blogger gritmonkey said...

I'm using Safari 2.0.4 on OS X, and the bookmarklet dies at the following line:

TypeError - Undefined value
http://www.librarything.com/gbbs_response.php?t=1190058372794 Line: 106

It was searching for:
isbn:0345305116

if (searchQuerytoTake < bookAA.length)
{
winner = '';
winnerPrint = '';
text = '';

var resultsA = new Array();
var linkA = window.frames[1].document.getElementsByTagName("a");
var results = 0;
var state = '';

9/17/2007 3:52 PM  
Anonymous Anonymous said...

This is frustrating! How do you save the bookmarklet in the first place? Every time I try, it gives me that "only works on Google Book Search" line, and I know that, but I can't even get it to bookmark so that I can GO to Google Book Search... am I missing something?

9/17/2007 3:59 PM  
Blogger gritmonkey said...

The bookmarklet worked fine in Firefox on OS X.

heather19: what browser are you using?

9/17/2007 4:03 PM  
Blogger Rob Szarka said...

@heather19: I'm using Firefox and I simply dragged the link for the bookmarklet to my bookmarks toolbar at the top of my Firefox window. YMMV and TMTOWTDI (probably), but that's the easiest way assuming your bookmarks toolbar is visible (check under View | Toolbars). Of course, if you're not using Firefox, the first step is "Install Firefox". ;)

I've searched over 800 of my books so far and I'm holding steady at about 90% found. I'm amazed.

9/17/2007 4:41 PM  
Blogger Yvette Hoitink said...

This comment has been removed by the author.

9/17/2007 4:56 PM  
Blogger Yvette Hoitink said...

Hi,

I sense that this is very cool but just don't get it. I got the bookmarklet working in Google Books. It even found some very obscure Dutch books in my collection. I never would have thought Google had these indexed, but there they were.

However, I feel like I'm really missing something. What's the point of this whole exercise? Can I type in a word and see which books in my catalog include this word on which page (o yes please!!!)? Can I link to the book in Google? What would be the use of that since most books have no preview?

I don't see any changes to the pages of the books that the bookmarklet found in Google so I don't know what difference finding the books in Google Books makes to my catalog in Librarything. I'm sure it's pretty obvious but I just don't get it...

9/17/2007 5:08 PM  
Anonymous Anonymous said...

So how do we slice the results by reliability? I want to double-check all the found but not exact edition results...

9/17/2007 5:23 PM  
Blogger Mark Barnes said...

If you're wondering how all this data helps, go to your Library and edit your display files. You can change one of the columns to Google Book Search, which then gives you access to the Google page where appropriate. I guess eventually this link will make its way to other places, but for the moment that seems to be it.

PS - Midway through my reasonably-sized library, Google asked me to confirm I was a human-being by typing in a CAPTCHA. Carried on happily afterwards, though.

9/17/2007 5:33 PM  
Anonymous Anonymous said...

So cool! Changed my default catalogue view, to enjoy the prettiness that is Google.

9/17/2007 6:08 PM  
Blogger Cathy said...

I'm using IE7 and having the same problem as squeakychu - it doesn't recognize that I'm logged in to LT. I left it running (on other's books) while I was gone for the day and it appears that it stopped after three or four hours.

9/17/2007 6:11 PM  
Blogger Rob Szarka said...

I second the motion to add sorting by level of confidence. I saw one go by marked "likely right" that was dead wrong, but then had a devil of a time finding it on the results page to correct it.

(For the record, the work was "A Dictionary of Symbols", which got matched to "Higher Education in a Changing Canada: = L'enseignement Supérieur Dans Un ..."!)

I've done over 1500 books so far, no CAPTCHA.

9/17/2007 7:51 PM  
Blogger Rob Szarka said...

Argh! Better yet, let me display all my results on one page. I just gave up paging through my 2000+ books to find the misidentified match for "The Origin of Species" and correct it...

9/17/2007 8:21 PM  
Blogger Rob Szarka said...

Argh! I'm finding more and more mismatched works and the frustrating thing is that right now it's much harder to get bad data out than it is to put it in.. :(

(And darned if I can see how some of these bad matches happened...)

On the plus side, I've found several cases where the matched edition of a work is "limited preview" but there's another edition with a full view. :)

9/17/2007 8:51 PM  
Blogger Jonathan K. Cohen said...

Is there any kind of debugging data I can provide to help solve my problem posed above -- i.e., that the error message "Google Book Search Search only works on Google Book Search: http://books.google.com" appears even when I'm on the Google Book Search page? I'd really like to get this running. FF 2.0.0.6, Windows XP SP 2.

9/17/2007 9:53 PM  
Blogger Tim said...

Jonathan,

It relied on the "referrer" (or referer, in HTTP-speak) being books.google.com. I'm guessing your browser security settings were set to avoid sending that.

I've taken the referrer protection off. This means that some users will try to use it on Disney.com and wonder why it doesn't work. Oh well.

T

9/17/2007 10:08 PM  
Blogger Murray said...

In the blog post, the text "Then you got to [[Google Book Search]] and you click it." actually links to the wikipedia article on bookmarklets instead of to Google Book Search.

9/18/2007 4:07 AM  
Anonymous Anonymous said...

The bookmarklet assumes you are logged in at www.librarything.com, it does not check the international sites.

9/18/2007 6:06 AM  
Blogger Keir Hardie said...

interesting - it's a bit overconfident on some 'likely right's though - is it looking for the work name rather than the book name? That will cause some spurious 'exact edition's.

9/18/2007 7:36 AM  
Blogger Keir Hardie said...

oh, and I don't think the difference between 'likely' and 'probably' is clear!

9/18/2007 8:03 AM  
