Friday, October 21, 2005

Suggestions, duplicates and yellow rows

Book-by-book suggestions have improved.

Users with lots of duplicates (mostly from bad imports) were slanting things terribly. I knew this when my novelist wife's books (hint! hint!) all came up with Spidering Hacks as the top suggestion. This came about because I have multiple copies of both her novels and Spidering Hacks, so the suggestion reflected my duplicate copies and not the books' content. Duplicate screening has also been applied to profiles. There are some other places where it still needs to be applied, but duplicates will soon be a negligible issue. Hooray, I say.
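To see why duplicates skew things, here is a minimal sketch in Python. It is not LibraryThing's actual suggestion code. It assumes suggestions are driven by counting how often two titles share a library, and the function name, user names and "Novel A"/"Novel B" titles are all made up for illustration.

```python
from collections import Counter


def cooccurrence(libraries, dedupe=True):
    """Count how often each ordered pair of distinct titles shares a library.

    `libraries` maps a (hypothetical) user name to a list of titles,
    with one list entry per physical copy.
    """
    counts = Counter()
    for books in libraries.values():
        # With dedupe on, a title counts once per user, however many copies they own.
        shelf = sorted(set(books)) if dedupe else list(books)
        for a in shelf:
            for b in shelf:
                if a != b:
                    counts[(a, b)] += 1
    return counts


# Illustrative data only: one user holds three copies of each of two books.
libraries = {
    "user_with_duplicates": ["Novel A"] * 3 + ["Spidering Hacks"] * 3,
    "reader1": ["Novel A", "Novel B"],
    "reader2": ["Novel A", "Novel B"],
}

raw = cooccurrence(libraries, dedupe=False)
clean = cooccurrence(libraries, dedupe=True)

print(raw[("Novel A", "Spidering Hacks")], raw[("Novel A", "Novel B")])
print(clean[("Novel A", "Spidering Hacks")], clean[("Novel A", "Novel B")])
```

Without screening, the one user's three-by-three copies outvote the two readers who actually share both novels. With screening, each user contributes a pair at most once.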

Why are some rows yellow?

Users with duplicates will notice that some books show up with yellow rows. This is just temporary—in the future you will be able to either show the duplicates this way, show just the duplicates or ignore whether a book is a duplicate. It's as far as I got: "Wallace and Gromit" trump feature addition, at least until later this evening!

16 Comments:

Anonymous Anonymous said...

Yay for Wallace and Gromit! :)

I've noticed the yellow rows and it's been a useful spur to get more comments and data in to those entries. I have multiple copies of quite a few books, mostly 'this one has annotations in', 'this one is in the posh imprint', 'this one is my childhood edition and not for letting toddlers play with', that kind of thing, so I'm hoping there will be a way to get those yellow rows to go away permanently, I don't want them flagged up for ever more. (But it is useful for realising when I've entered three copies of a title instead of two, so thanks.)

10/21/2005 7:54 PM  
Anonymous Anonymous said...

Speaking of duplicates, a lot of my Science and Civilisation series show up as duplicates (it says I have four copies of Science and Civilisation in China: Volume 6, Biology and Biological Technology, Part 6 and that another user has 3 copies).

Since you've explained this is based on title and author, I'm wondering if it might be because the titles are so long that the algorithm might give up before it gets to the volume 6, part 6 part.

Thanks. Liao

10/22/2005 12:19 AM  
Blogger Tim said...

(also posted as comment)

Liao: Actually, it's the colon that's doing it. See the FAQs about how LibraryThing calculates whether a book is "the same," for sharing and other things. In this case, it's seeing the stuff after the colon as a subtitle and thinking it doesn't matter. Usually this info doesn't matter, but here it's tripping you up.

Tomorrow I'll add the ability to show just duplicates. Otherwise, you won't have to worry about the yellow. The FAQs also talk about my plan to allow users to specifically disambiguate the situation. This will be a while coming.
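
The colon behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the real matching code. The `match_key` function, its `ignore_subtitle` flag, the placeholder author and the second volume title are all assumptions.

```python
def match_key(title, author, ignore_subtitle=True):
    """Build a hypothetical 'same book' key from title and author.

    When ignore_subtitle is True, everything after the first colon is
    treated as a subtitle and dropped before comparing.
    """
    if ignore_subtitle:
        title = title.split(":", 1)[0]
    # Lowercase and collapse whitespace so trivial spacing differences do not matter.
    norm_title = " ".join(title.lower().split())
    norm_author = " ".join(author.lower().split())
    return (norm_title, norm_author)


author = "Example Author"  # placeholder, not taken from the catalog
part_6 = "Science and Civilisation in China: Volume 6, Biology and Biological Technology, Part 6"
part_1 = "Science and Civilisation in China: Volume 6, Biology and Biological Technology, Part 1"

# Dropping the subtitle makes two different volumes look like the same book.
print(match_key(part_6, author) == match_key(part_1, author))

# Keeping the subtitle tells them apart again.
print(match_key(part_6, author, ignore_subtitle=False)
      == match_key(part_1, author, ignore_subtitle=False))
```

The same collapse would happen with library-style "space colon space" punctuation, since the split is on the colon itself.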

10/22/2005 12:44 AM  
Anonymous Anonymous said...

The whole Social Data thing is a real head-scratcher. Usually an information system wants homogeneous data so you can always identify what's the same and what's different, but here you have a lot of "fuzzy" data from different sources that's mostly the same, or kinda the same, etc. Interesting problem.

One suggestion, since there are a lot of features that seem to use the title and author fields, how about a "data entry style guide" for those fields. That is, a suggested standard way to enter the title of a book, a periodical, etc., and the author, translator, editor, etc. I don't know squat about library science, but I'd suspect there are one (or many) existing styles. If LT suggested one standard then manual entry data could start converging on it (or be ignored, as users often do). Just a thought.

Great job, really enjoy watching things develop.

10/22/2005 8:31 AM  
Blogger Tim said...

I rewrote it so it takes up less memory and regenerated the four broken ones.

10/22/2005 10:50 AM  
Anonymous Anonymous said...

I don't know how to find out duplicates. Please help.

10/22/2005 10:54 AM  
Blogger Ed said...

I'm as hooked on reading the Blog entries as I am on using LT.

I'm in agreement with the earlier comment on 'standardization' of entries. I have older paperbacks that don't seem to appear in search results. I pick one result and modify it as best I can to match my book. I don't know if the various fields that I don't delete or modify still apply.

And what is the proper way to enter serial magazines, such as the 15 or 20 copies of Analog science fiction magazine that I have?

10/22/2005 1:27 PM  
Blogger Tim said...

I just regenerated yours (ready in a minute or two). I need to add the ability to do it from that page, but make sure it can't be done more than once a day. I'm up to my elbows with something else right now, sorry.

10/22/2005 8:19 PM  
Anonymous Anonymous said...

Okay, putting on my professional cataloguing geek hat:
RE: standardizing entries: I've been pretty sloppy in my LT entries compared to what I would do at work but usually habit makes me put in the standard ISBD (International Standard Bibliographic Description) punctuation. For titles, this means "space colon space" between title and subtitle. This is also what you will get when importing records from libraries. So if LT ignores everything after the colon it is going to be flagging lots of duplicates where they really aren't. I know it is doing it to many of mine.

Re: entering periodicals - I've done one periodical entry so far (title is On Spec if you want to have a look). I have a complete run so I did one record and in the publication field I put in the publisher info, then added my holdings (v.1 (1989)- ) leaving them open-ended because I still have an active subscription and I can't be bothered updating LT every time I get a new issue. If I were entering a small number of scattered issues I would enter them on separate records and put the volume/copy info in the title field (e.g. Astounding v.23 no. 6 (June 1962))

If anyone is really interested, I'm quite happy to discuss cataloguing any time :)

10/22/2005 10:37 PM  
Blogger Ed said...

tardis, thanks for the geek speak. I've been sloppy and ignorant!

10/23/2005 12:51 PM  
Anonymous Anonymous said...

Hmm. No Yellow rows for me. They'd be rather useful too.

10/24/2005 9:40 AM  
Anonymous Anonymous said...

Are you aware of the xISBN service? You feed an ISBN# to a URL via GET, and it sends back the ISBN# of other editions of the same work. Seems to work pretty well...

http://www.oclc.org/research/projects/xisbn/default.htm

10/24/2005 9:05 PM  
Anonymous Anonymous said...

About half of my recommendations are books I own but haven't yet cataloged. Many are fairly obvious, e.g. Hitchhiker's Guide to the Galaxy, but a couple were impressive, e.g. recommending James Joyce even though I've entered very little fiction so far (and most of that has been sci fi). Most of the other recommendations are books I know about, and many of them I've either read or am already planning to buy when I find them cheap. There were a couple that I didn't know about but which I almost surely would like, e.g. a Howard Gardner book on cognitive science.

In short, looks like recommendations are off to a good start. :) It will be interesting to see what it comes up with once I have more books cataloged.

BTW, I also encountered one book in there that I own and have cataloged. In this case the title was "different" because the edition I cataloged has no umlaut over the "o" in Godel, Escher, & Bach, while the one recommended does.
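
One possible way to make titles like these compare equal is to fold accented letters to their base letters before matching. The sketch below uses Python's standard `unicodedata` module. Whether LibraryThing does anything like this is not stated here, and the `fold_diacritics` name is made up for illustration.

```python
import unicodedata


def fold_diacritics(text):
    """Return text with combining accents removed (e.g. o-umlaut becomes plain o)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


with_umlaut = "G\u00f6del, Escher, Bach"   # \u00f6 is o with an umlaut
without_umlaut = "Godel, Escher, Bach"

print(with_umlaut == without_umlaut)
print(fold_diacritics(with_umlaut) == fold_diacritics(without_umlaut))
```

Folding is lossy, so it would suit a comparison key only. The displayed title would keep its original spelling.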

10/25/2005 1:59 AM  
Blogger N. Trandem said...

Hey Tim, I've noticed that the Tolkien books tend to get split into two camps by your matching rules. Half have "Tolkien, J.R.R." for the author, and half have "Tolkien, J. R. R.". Not sure if you'd want to change the algorithm or simply implement a fix for that specific case...

10/26/2005 12:18 PM  
Blogger N. Trandem said...

Hmmm... I guess it doesn't have to do with the spaces in the name after all. Why do these two not match?
http://www.librarything.com/card_card.php?referpage=http%3A%2F%2Fwww.librarything.com%2Fcatalog_bottom.php%3F&book=554187
http://www.librarything.com/card_card.php?referpage=http%3A%2F%2Fwww.librarything.com%2Fcatalog_bottom.php%3F&book=554128

10/26/2005 12:29 PM  
Blogger Wm. said...

I find that some of my collections don't indicate the individual titles within, so duplicates are _not_ indicated where they should be.

For example, I have a collection with all the Douglas Adams H2G2 books, but the individual titles were considered "new" to me.

Any way around this?

10/26/2005 1:54 PM  
