Saturday, December 31, 2005

Harry Potter and the Spiral of Death (Note: 3AM downtime)

Sorry for the slow-down in the last 36 hours. There was a database issue. Recommendations weren't being cached right, so the server had to remake them every time. That took a lot of processing power. For example, to produce a recommendation for one of the Harry Potter books it had to retrieve and do math on the libraries of nearly half of all LibraryThing! And when a page doesn't come up immediately, many users hit refresh over and over: the spiral of death.
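
As a rough illustration of why a broken cache hurts so much, here is a minimal Python sketch. It is not LibraryThing's actual code: the toy `LIBRARIES` data, the function names, and the simple "count co-owned books" scoring are all assumptions made for the example. The point is only that the expensive scan over every matching library should run once per book, not once per page load.

```python
from collections import Counter

# Hypothetical toy data: member -> set of books they own.
LIBRARIES = {
    "alice": {"Harry Potter", "The Hobbit", "Dune"},
    "bob": {"Harry Potter", "The Hobbit"},
    "carol": {"Harry Potter", "Emma"},
    "dave": {"Emma", "Persuasion"},
}

_cache = {}                # book -> cached recommendation list
stats = {"computed": 0}    # how many times the expensive path ran


def compute_recommendations(book, limit=3):
    """Expensive path: scan every library that contains `book`."""
    stats["computed"] += 1
    counts = Counter()
    for titles in LIBRARIES.values():
        if book in titles:
            counts.update(titles - {book})
    # Highest count first; ties broken by title so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:limit]]


def recommend(book):
    """Cheap path: reuse the cached list when there is one."""
    if book not in _cache:
        _cache[book] = compute_recommendations(book)
    return _cache[book]
```

With a working cache, ten impatient refreshes of the same book page cost one scan. With a cache that never stores anything, they cost ten.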

I fixed the immediate problem, but it will be a little while before the cache is full again. I'm going to fix the larger problem by adding a second, "thinking" server that will get a new copy of the book data every night and sit around all day thinking about recommended books, related tags and so forth. LibraryThing's slowness, when it's slow, is all about these tasks. Looking at your catalog, adding books and so forth don't tax it much. I also like the idea of a server that sits around all day thinking about books. It might even develop opinions.

Finally, I'm going to make some more tweaks at 3am EST tonight. This means that people in California will not be able to add books while watching Dick Clark's New Year's Rockin' Eve. For this I apologize.

15 Comments:

Anonymous Anonymous said...

It just occurred to me to ask, what's up with the suggestions? I'm still getting "Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 16 bytes) in /home/virtual/site8/fst/var/www/html/profile_suggestions.php on line 325". It's been weeks. What's up?

12/31/2005 8:06 PM  
Anonymous Anonymous said...

This means that people in California will not be able to add books while watching Dick Clark's New Year's Rockin' Eve.

Waaaaaahhhhhh!!!!!!

Amazing how you knew I was planning on doing this. :-)

12/31/2005 8:24 PM  
Anonymous Anonymous said...

I'd be glad to hear Tim's thoughts on the future of the call number field in LT. This is a pre-existing uber-complex tagging system that some of us would love to be able to take advantage of. One basic problem relates to sorting, since there is irregularity in how the fields are populated (QH375 vs. QH 375), and since the sorting rules are actually a bit complex (main classes alpha-numerically, cutters alpha-decimally, if that's a word, etc.). I certainly wouldn't be averse to going through my own catalog and regularizing them, but if the sorting routines are going to eventually take care of it for everybody then that wouldn't be necessary. (Or worse yet, if an auto-update of the catalog data were to overwrite the custom revisions. Is the auto-update feature gone forever?)

Many thanks.

12/31/2005 9:31 PM  
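
The sorting problem raised in the comment above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not LibraryThing's sorting routine: the function name `lc_sort_key` and the regular expressions are made up for the example, and real LC call numbers have more parts (dates, volume numbers, work marks) than this handles. It treats "QH375" and "QH 375" as the same, compares class numbers numerically, and compares Cutter digits as decimals, so that .D37 files before .D4.

```python
import re

# Class letters (1-3) followed by an optional class number, e.g. "QH 375.5".
_CLASS = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?", re.IGNORECASE)
# A Cutter: optional dot, one letter, then digits, e.g. ".D37".
_CUTTER = re.compile(r"\.?\s*([A-Z])(\d+)", re.IGNORECASE)


def lc_sort_key(call_number):
    """Return a tuple that sorts LC-style call numbers in shelf order."""
    match = _CLASS.match(call_number)
    if not match:
        return ("", 0.0, ())  # unparseable entries sort first
    letters = match.group(1).upper()
    number = float(match.group(2)) if match.group(2) else 0.0
    rest = call_number[match.end():]
    # Keeping Cutter digits as strings gives decimal ordering:
    # "37" < "4" as text, just as .37 < .4 as decimals.
    cutters = tuple((c.upper(), d) for c, d in _CUTTER.findall(rest))
    return (letters, number, cutters)


shelf = sorted(
    ["QH 375 .D4", "QH375 .D37", "QE 28", "QH31 .D2", "Q 125"],
    key=lc_sort_key,
)
```

Because the key ignores spacing, nobody would need to regularize "QH375" versus "QH 375" by hand for sorting purposes alone.
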
Blogger Tim said...

RJohara: My thoughts are these.

I think that the sorting issue is solvable, but it will certainly take some work.

The larger issue is getting the data for non-library books. Most users get their data from Amazon. Not only is it the default setting but most of the time it's also faster. (Does anyone know if I can get a CD--or 50 CDs--with all of the LC's data? I'd think they'd make it public that way too.)

There are various ways of approaching the issue. I could give each work a "Platonic" record, with LC data already applied to one of the books, or drawn from them via Z39.50. Or I could keep the field editable. It's a tough one, and, I think, logically posterior to solving the edition problem. I have a plan there: edition disambiguation in users' hands, just as author disambiguation is. But I can't yet foresee all the consequences there.

12/31/2005 9:43 PM  
Blogger Tim said...

Lilithcat: I'm working on it now. Only a minority has that issue—you have so many books that the act of thinking about them upsets it...

Tim

12/31/2005 9:44 PM  
Blogger Tim said...

Lilithcat: I'm going to need to wait a while to fix your suggestions. They should probably "bubble up" from the book suggestions rather than being calculated from people you share books with. It's on my list—I'm working on such algorithms as a soon-to-come post will explain.

12/31/2005 9:51 PM  
Anonymous Anonymous said...

I feel like I should start off with a huge *thank you* for creating this thing in the first place.

That said, I notice that most people talking about call numbers are talking about LC numbers; I've been told repeatedly that for smaller libraries, Dewey still makes the most sense. I like being able to grab those automatically whenever possible, so I can label and sort my library that way as I gradually re-assemble it from this last move. Right now, when one's missing I'm searching my local library-and-connected-libraries' databases to see if I can't find a call number and adding it manually. (1) Is there a better source that you know of? (2) Any chance that Dewey numbers will be becoming more common? (3) Can you, if it would be helpful to you, grab the Dewey numbers from my (or others') existing edits?

But regardless -- Thank you! This is a wonderful tool!

(Oh, and I'm TwoFountains here on LibraryThing)

12/31/2005 10:22 PM  
Anonymous Anonymous said...

Most users get their data from Amazon.

Aha - see, I never would have guessed that. For me, the whole thing that makes LT so great is its ability to draw from the Z39.50 databases, and so 95% of my records come from one of the available public catalogs; for me an Amazon record represents a failure (of sorts) to find a real catalog record.

Edition control is a challenge, and one of the virtues of the real catalog records is that they can to some extent help with this - the different "manifestations" of a book (as catalogers call the shadows on the cave wall) are usually designated either by LC catalog number or a variant call number with a date, etc. But even there, it is a challenge. (Try finding a precisely matching record for any of the Loeb Classical Library volumes; ugh. But then essentialism doesn't apply to the Real World anyway.) Do you have a copy of AACR2?

I think as they develop their catalogs further, Thingamabrarians (my word) will start to discover the possibilities that inhere in the existing call number system (Dewey or LC). A powerful measure of concordance (or discordance) among different people's catalogs is overlap (or lack of overlap) between main call number classes. And they're automatically hierarchical: I've got lots of Q's, most of which are QH, QE, and QL, and within QL, mostly QL600's.

(And for TwoFountains: most of what I'm saying will apply to Dewey also, and that's a perfectly good choice. In LT, if you select the card icon to view the full MARC record for a given title, you can often find the Dewey number buried in there somewhere, even when LT doesn't automatically display it.)

12/31/2005 10:54 PM  
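
The class-overlap idea in the comment above could look something like the following Python sketch. It is one possible reading, not a feature LibraryThing has: the function names, the sample call numbers, and the choice of a Jaccard ratio (shared classes divided by all classes) are assumptions for illustration. The `depth` parameter shows the hierarchy at work, since truncating "QH" to "Q" compares catalogs at the broader level.

```python
import re


def main_class(call_number, depth=None):
    """Return the leading class letters, optionally truncated to `depth`."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number)
    if not match:
        return None
    letters = match.group(1).upper()
    return letters if depth is None else letters[:depth]


def class_overlap(catalog_a, catalog_b, depth=None):
    """Jaccard overlap of the class sets found in two catalogs."""
    a = {main_class(c, depth) for c in catalog_a} - {None}
    b = {main_class(c, depth) for c in catalog_b} - {None}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalogs, given as lists of LC call numbers.
mine = ["QH375", "QE 28", "QL 605", "PA 6156"]
yours = ["QH 31", "QL 737", "DA 32"]

fine = class_overlap(mine, yours)             # compares QH, QE, QL, PA, DA
coarse = class_overlap(mine, yours, depth=1)  # compares Q, P, D
```

This version only asks which classes appear at all. Weighting each class by the number of books in it would be a natural refinement.
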
Anonymous Anonymous said...

rjohara - using amazon represents a failure to me too! i've gotten stuck with it many times but i've actually gone through and filled in the LoC numbers by hand for almost every single book -- the call number may not be a hundred per cent correct for that edition, but it still gives me a whole lot of information, which i really wouldn't want to do without. i, too, can't believe most people are strictly using amazon.

and tim - i hate to be a pain and mention this again, but i think it would be really super cool if the zeitgeist page reflected tag combinations, like for the top 25 tags or whatever. obviously if people object to this i'll get over it, but since tag combinations are supposed to be global i thought it'd make sense. and i like the idea people have been mentioning lately about making single books or tags private.

-nperrin

1/01/2006 1:41 AM  
Anonymous Anonymous said...

Being as my daughter calls me PC-challenged, and suffering from double-click syndrome, how do I remove multiple entries of the same title?

1/01/2006 8:17 AM  
Blogger Dennis said...

Go to your catalogue, find the offending entry, and pluck it out ... I mean click the red X in the right hand column.

It's really that easy.

You may want to sort by title first and cycle through to make sure you catch them all.

1/01/2006 1:47 PM  
Anonymous Anonymous said...

Regarding "a CD with all the LoC's data" - certainly the British Library makes the BNB available on CD.

I don't know if LoC does something similar, but I would be surprised if they hadn't done so in the recent past.

The two problems are a) cost - it's intended for institutional users and priced accordingly - and b) possible licensing issues over the reuse.

1/01/2006 3:16 PM  
Anonymous Anonymous said...

Does anyone know if I can get a CD--or 50 CDs--with all of the LC's data? I'd think they'd make it public that way too.

All the computerized catalog data at LC is available through the LC's Cataloging Distribution Service, which is a huge agency in itself. These are the same folks who started distributing printed catalog cards to libraries all across the country about a century ago, making high-quality data available to thousands of small public libraries for the first time.

Unfortunately, this stuff is aimed at institutional subscribers and a lot of it is very expensive. I bet over time these prices will drop dramatically (they still have a hard-copy and/or mainframe-tape subscription model in mind), but that doesn't help in the present. I see that as of 1 Jan 2006 (today!) most records are being made available in Unicode and XML. It looks like the "Books All" database for 1968-2005, with 8.2 million records and weighing 6.3GB, will set you back $22,000. (Only 1000 lifetime LibraryThing subscribers.)


I'm sure there are good reasons for you to fine-tune your Dewey decimals or LofC numbers, but I suspect you're in a minority.

You mean as opposed to people who enter thousands of personal tags that duplicate information already contained in the underlying record? Or who go back and forth about whether GLBT is the same as LGBT? Talk about the tweed set! ;-) Come on, we're all a minority here--do you think the Big Wide World cares about any of this? One would hope that people interested in books would be open to learning more about them and would realize that many of the things people are "discovering" when they start creating their own subject headings (which is what the tags are) have been discovered by many people before and have been thought about and written about extensively. (Tim's "combine tags" feature is a very clever implementation of what's called "authority control", a big topic in library science. I suspect it's the first implementation of this outside a narrow group of specialists.) This isn't to say we can't or won't come up with new and interesting things that haven't been considered before--I'm sure we will, and LT will become whatever Tim wants it to become. For some, it will always be "Friendster for books," and that's great; but latent in the data that LT already uses is a set of exceptionally rich possibilities. Once they start opening up people will find all sorts of cool things to do that they haven't even thought of yet. (Maybe just by setting LC rather than Amazon as the Add Books default, some of these ideas will start to emerge?)

RJO

1/02/2006 12:01 AM  
Anonymous Anonymous said...

I hope this discussion about tagging/standard classification systems can be made more visible to the wider LT community. Tim, would you write a post about it?

And, since I'm here, I will again advocate for a both/and approach, that encompasses tagging (flexible, fun, fast, distributed) and a firm relationship with existing LC/Dewey subject headings (consistent, capable of being leveraged). There is nothing wrong with either approach--they both have entirely valid and important uses.

But it would be a shame if one were developed at the other's expense. What seems to be happening lately is that tagging is being developed at the expense of the ability to relate one's collection to existing classification systems, and that is worrisome to those of us with a need to maintain that relationship.

Of course, none of this would come up if Tim hadn't created this fabulous tool in the first place!

1/02/2006 2:27 PM  
Anonymous Anonymous said...

And, since I'm here, I will again advocate for a both/and approach, that encompasses tagging (flexible, fun, fast, distributed) and a firm relationship with existing LC/Dewey subject headings (consistent, capable of being leveraged). There is nothing wrong with either approach--they both have entirely valid and important uses.

Absolutely! I think tags are great, and one of the most interesting uses I see for them is to reveal connections that are not currently reflected in the existing subject headings. This might even encourage LC to supplement/revise its existing structures. Standardized systems like the LC headings will always lag behind the moving front of ideas, and LT has the potential to suggest ways the LC headings can develop in the future.

Two examples: I have a special interest in Oxford/Cambridge-style residential colleges within large universities. I've accumulated a collection of residential college histories that is probably the only such collection in the world (described here also). There is a lot of interest in the residential college idea in higher education, but this body of literature isn't readily retrievable. A prime candidate for a new tag/subject heading.

As another example, one of my areas of interest is the comparative study of the historical sciences, from geology to philology to evolution. I haven't created a tag for this since it's a large part of my whole collection, but there are lots of tag possibilities that could capture relationships that aren't reflected in existing subject classifications. Some years ago I put together a small commentary on the existing LC headings for the historical sciences that illustrated how they break up important interconnections.

Of course, none of this would come up if Tim hadn't created this fabulous tool in the first place!

Hear, hear!

RJO

1/02/2006 5:54 PM  
