Wednesday, February 08, 2006

LCCN Numbers now parsed

Before the crash I did complete one new feature: the accurate parsing of Library of Congress Control Numbers (LCCNs).

LCCNs are slippery things, without a fixed number of digits or a single internal structure: "89-456," "89-7890," "2001-1114" and "gm 71-2450" are all perfectly okay. For this reason, LibraryThing was sometimes mistaking them for ISSNs and ISBNs, helpfully adding hyphens and checksum digits to reinforce its misunderstanding. Worse, the Library of Congress' online catalog doesn't allow searches by printed LCCNs. Instead, you have to turn them into a special machine-readable LCCN format, adding leading zeroes and removing hyphens and spaces as they dictate (see here).
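A minimal sketch of that conversion in Python, assuming the rules are roughly: strip spaces, drop anything from a slash onward, remove the hyphen, and left-pad the serial number after the hyphen to six digits. The function name normalize_lccn is hypothetical, and this is not LibraryThing's actual code.

```python
import re


def normalize_lccn(printed: str) -> str:
    """Convert a printed LCCN into a hyphen-free, zero-padded search form."""
    # Drop all spaces, then anything from a slash onward (revision suffixes).
    lccn = printed.replace(" ", "").split("/", 1)[0]
    if "-" in lccn:
        left, right = lccn.split("-", 1)
        # Assumption: the serial number after the hyphen is 1 to 6 digits.
        if not re.fullmatch(r"[0-9]{1,6}", right):
            raise ValueError(f"not a recognizable LCCN: {printed!r}")
        # Left-pad the serial number with zeroes to six digits.
        lccn = left + right.zfill(6)
    return lccn


for printed in ("89-456", "89-7890", "2001-1114", "gm 71-2450"):
    print(printed, "->", normalize_lccn(printed))
```

Under these assumptions "89-456" becomes "89000456" and "gm 71-2450" becomes "gm71002450".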

Anyway, I worked through the tangle, and LibraryThing now handles LCCNs well, identifying them as such and converting them to the format the LC requires*. I can tell you from entering 180 of my own books this week that LCCNs come in very handy with older books, many of which have their codes printed on one of the first pages or even the back cover. Of course, I lost all those books. But I gave myself a free membership, so we're even.

*The one exception is LCCNs with "alpha prefixes." I never found one of these in the books I cataloged, but the LC's catalog page says they exist. To parse them, I need to know the range of possible prefixes. Would, say, 1-4 alphabetic letters plus a space cover it?
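One way to picture the guess above is as a pattern. This is only a sketch: the 1-4 letter range is the guess from the footnote, not a confirmed rule, and the names LCCN_PATTERN and parse_printed_lccn are hypothetical.

```python
import re

# Assumed shape of a printed LCCN: an optional prefix of 1-4 letters,
# optional whitespace, a 2- or 4-digit year, a hyphen, and a serial
# number of 1-6 digits.
LCCN_PATTERN = re.compile(
    r"^(?:(?P<prefix>[A-Za-z]{1,4})\s*)?"
    r"(?P<year>[0-9]{4}|[0-9]{2})-(?P<serial>[0-9]{1,6})$"
)


def parse_printed_lccn(text):
    """Return (prefix, year, serial), or None if the text does not fit."""
    match = LCCN_PATTERN.match(text.strip())
    if match is None:
        return None
    prefix = (match.group("prefix") or "").lower()
    return prefix, match.group("year"), match.group("serial")


print(parse_printed_lccn("gm 71-2450"))
print(parse_printed_lccn("89-456"))
print(parse_printed_lccn("0-306-40615-2"))  # hyphenated ISBN, does not fit
```

A pattern like this cannot by itself tell "2001-1114" from an ISSN, which has the same four-and-four shape, so some further check (such as the ISSN check digit) would still be needed.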

18 Comments:

Anonymous Anonymous said...

Great job. This will certainly help with older books as you say.

You know, the LC authority data, unlike the book data, is available for free download:

http://authorities.loc.gov/

This could be a high-powered whiz-bang back end to LT's author functions (such as author selection, author merging, etc.).

2/08/2006 2:55 AM  
Blogger Tim said...

Thanks for the tip. I admit I'm a little at a loss with authority searching. I have to have a conversation with Abby, the person I talk to about such issues.

How exactly could LT use this? Can you unroll this for me?

2/08/2006 3:07 AM  
Blogger Cameron said...

Some LCCN links for you:

* Structure of the LCCN
* LCCN Restructuring for Y2K

2/08/2006 9:17 AM  
Anonymous Anonymous said...

Nice! I'll have to try this out soon, with the books at home.

The only LCCN prefix that I've come across (because it's the only type of material I work with) is one letter: m. This is for music scores.

As far as authority data goes, if there's a way for user searches to be compared to the authority data, that would be a good first step. It's used mainly for authors and titles. For example, if someone types in "Shakspere, William," the program checks it against the authority record and then asks if s/he means "Shakespeare, William."
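A rough sketch of that "did you mean" check, assuming the authority data has already been boiled down to a table of variant spellings. The table SEE_REFERENCES and the function suggest_heading are hypothetical stand-ins, hand-built for illustration rather than loaded from the LC.

```python
# Tiny hand-built stand-in for authority data: variant spelling -> heading.
# A real table would be built from authority records, not typed in.
SEE_REFERENCES = {
    "shakspere, william": "Shakespeare, William",
    "shakspeare, william": "Shakespeare, William",
}


def suggest_heading(query):
    """Return the authorized heading for a variant name, or None."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(query.lower().split())
    return SEE_REFERENCES.get(key)


suggestion = suggest_heading("Shakspere, William")
if suggestion:
    print(f'Did you mean "{suggestion}"?')
```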

2/08/2006 9:32 AM  
Blogger Christophilus said...

Thanks again for all the hard work you've been putting in! I know I, for one, appreciate it immensely. This being said, I hate to bother you, but the problem I reported a few days ago about sorting by sharedness being out of whack is still out of whack. This isn't terribly important, especially not in the grand scheme of things, but I figured I would let you know that it's still out there. I can't wait to start trying out the LCCN numbers as I have a number of books myself languishing on my shelves uncatalogued because of a lack of ISBN. Thanks again for providing such a great service in such a fun package.

2/08/2006 9:45 AM  
Anonymous Anonymous said...

. . . I have a number of books myself languishing on my shelves uncatalogued because of a lack of ISBN.

But you've never actually needed an ISBN to catalog a book! I've catalogued a lot without ISBNs, just by entering search terms such as the title and author's name, and then verifying edition information. It doesn't really take much longer, since I always had to verify the information pulled from the ISBN anyway (I'm a very distrustful sort).

2/08/2006 11:35 AM  
Blogger Tim said...

On authority records:

Unless I'm mistaken, the authority records don't help out that way. For example, I typed "Shakspere, William" and I actually got "Shakspere, William, 1564-1616." Yipes. Even if I did Shakespeare, what useful info comes from that? The date, I suppose.

2/08/2006 11:46 AM  
Anonymous Anonymous said...

re: on authority records -
Adding dates to names in authority records is part of distinguishing multiple authors with the same name. Maybe Shakespeare isn't the best example (have there been multiple William Shakespeares? I don't think so), but in a real library catalogue it's extremely useful. I just counted 8 authority records in my catalogue for "Smith, John" - no initials, no middle names so the dates are the only way to distinguish one from another.

2/08/2006 12:48 PM  
Blogger Tim said...

No, don't get me wrong. I'm receptive, but I don't know how to deploy it. Here's my thinking:

You're right that LibraryThing should start disambiguating authors with the same name. I wish I could think of a good example within the collection, but a bad example is Thomas Wolfe, who eats some of Tom Wolfe too. ( http://www.librarything.com/author.php?author=wolfethomas ). In this situation, the user could click some sort of button to divide the author, and then allocate where the books go.

The division needs a basis. Apart from middle initials, where this helps and people have some idea of the right answer, the simplest basis would be the books themselves ("Thomas Wolfe, the author of Look Homeward, Angel").

Using works has the economy that appeals to a computer guy, worried about adding too many layers between realia. This is also the most normal and natural way for most people. When you're at a dinner party, and someone announces that Thomas Wolfe is their favorite author, do you ask "Oh, Thomas Wolfe 1900-1938 or Thomas Wolfe 1931 to present?" No, you say "Do you mean Bonfire of the Vanities Tom Wolfe?"

Dates, the system libraries use, are another good option. These could either be user supplied (perhaps with a lookup on Wikipedia), or based on a quick LC authorities check. If the latter, the user would click on "split author," LibraryThing would query the LC and come back with a series of possible name-date combinations. The user would choose the two that apply.

My only reservations are the programming involved—authority data is not presented through Z39.50, so I'd need to "screen scrape" the Library of Congress web site—and the fact that the LC authority data is not perfect either, witness Mr. Shakspere.

2/08/2006 1:25 PM  
Anonymous Anonymous said...

Tim--

You need to go one step further on the authorities page on the LOC site. For the entry "Shakspere, William, 1564-1616", you need to click on the "References" button to the left, and you'll see that it says "See Shakespeare, William, 1564-1616."

Using the authorities for disambiguation of names, etc. is the best use for the data. It also works for items that have been translated from other languages. For example, you can group together Dickens' "Tale of Two Cities" even if one is in the original English, and one is a French translation titled "Conte de deux villes."

2/08/2006 1:59 PM  
Anonymous Anonymous said...

Whoa...

Was adding commentary, reviews and the like, clicked "submit" button and got this:

SELECT COUNT(1) As numbooks FROM bookstack AS a1 WHERE a1.books_userid = 'appaloosa' AND ( a1.books_public or a1.books_userid = 'appaloosa')
Table './librarything/bookstack' is marked as crashed and should be repaired

*Yikes*! Not sure what it all means, (have I lost everything I added??).. but I know that you'll fix it :-)

Cheers,
~app

2/08/2006 2:57 PM  
Blogger Uncle Rameau said...

Getting a fatal error message where I had had no problems at all before.

and same select count error, as well.

apprehensively yours,

Sluggo

2/08/2006 3:12 PM  
Anonymous Anonymous said...

did you know that the feed for this blog goes down with the database? is it possible to exempt /atom.xml from the redirect?

2/08/2006 3:33 PM  
Anonymous Anonymous said...

LibraryThing is temporarily offline

3:00 AM. LibraryThing is down for a moment while I assess something.


Thank you for all your hard work!

I would like to make a small but impassioned plea, though. When you take down LT, can you please provide the date - and (bonus) maybe mention the time zone along with the time?

2/08/2006 3:48 PM  
Blogger Uncle Rameau said...

better now, and way fast since the server upgrade, btw.

2/08/2006 6:52 PM  
Anonymous Anonymous said...

and the fact that the LC authority data is not perfect either, witness Mr. Shakspere.

But Mr. Shakspere (a/k/a Shakespeare, Shaxpere, Shackespere, Shackspear) and his contemporaries weren't particularly consistent in the spelling of his name, so why should we expect the LC to be?

And, of course, "a rose by any other name would smell as sweet." (Sorry, but someone had to quote that and it might as well be me!)

2/08/2006 7:22 PM  
Anonymous Anonymous said...

But Mr. Shakspere (a/k/a Shakespeare, Shaxpere, Shackespere, Shackspear) and his contemporaries weren't particularly consistent in the spelling of his name, so why should we expect the LC to be?

But the point is that 99% of the combining of author names and spellings and initialisms that people spend time doing on LT has already been done in the LC authority database (as well as countless foreign translations of English names too).

If you look at the full authority record for William Shakespeare you'll find more than 30 spellings included (note that these are just spellings under which the works have been *published*, not all the variant spellings that exist). That listing is what LibraryThingers ought to be able to point to as an authoritative source for merging names (and that's why it's called an authority file). Now that doesn't mean there aren't more spellings, or that the LC record is complete; it just means it would be a huge time saver and great value added to the LT database. A simple button to "Check against LC name authority" which would download the name record to the LT database and let people use it as a standard for mergers would be great. It would raise LT to a new level of accuracy.

2/08/2006 10:08 PM  
Anonymous Anonymous said...

Ah, it doesn't seem to like the link I made. Here is most of the full record:

LC Control Number: n 78095332
Cancel/Invalid LCCN: sh 85120820
LC Class Number: PR2750 PR3112

HEADING: Shakespeare, William, 1564-1616

Used For/See From:
Shakspeare, William, 1564-1616
Šekʻspiri, Uiliam, 1564-1616
Saixpēr, Gouilliam, 1564-1616
Shakspere, William, 1564-1616
Shikisbīr, Wilyam, 1564-1616
Szekspir, Wiliam, 1564-1616
Šekspyras, 1564-1616
Shekspir, Vilʹi︠a︡m, 1564-1616
Šekspir, Viljem, 1564-1616
Tsikinya-chaka, 1564-1616
Sha-shih-pi-ya, 1564-1616
Shashibiya, 1564-1616
Sheḳspir, Ṿilyam, 1564-1616
Shaḳspir, Ṿilyam, 1564-1616
Syeiksŭpʻio, 1564-1616
Shekspir, V. (Vilʹi︠a︡m), 1564-1616
Szekspir, William, 1564-1616
Shakespeare, Guglielmo, 1564-1616
Shake-speare, William, 1564-1616
Sha-ō, 1564-1616
Şekspir, 1564-1616
Shekspir, Uiliam, 1564-1616
Shekspir, U. (Uiliam), 1564-1616
Šekspir, Vilijam, 1564-1616
Ṣēkspiyar, Viliyam, 1564-1616
Shakspir, 1564-1616
Shekspyr, Vyli︠e︡m, 1564-1616
Şekspir, Velyam, 1564-1616
Ṣēkspiyar, Villiyam, 1564-1616
Shēkʻspʻiyr, Vlilliam, 1564-1616
Ṣēkspiyar, 1564-1616
Ṣēkspiyar Mahākavi, 1564-1616
Ṣēkspiyar Mahākaviya, 1564-1616
Sheḳspier, Ṿilyam, 1564-1616
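For what it's worth, a listing pasted in this shape could be turned into a merge table with very little code. The sketch below assumes the layout is exactly as it appears in this comment (a "HEADING:" line, then a "Used For/See From:" line followed by one variant per line), which is not an official LC export format. The name parse_authority_text is hypothetical and the sample text is abbreviated.

```python
RECORD_TEXT = """\
HEADING: Shakespeare, William, 1564-1616

Used For/See From:
Shakspeare, William, 1564-1616
Shakspere, William, 1564-1616
Shake-speare, William, 1564-1616
"""


def parse_authority_text(text):
    """Return (heading, variants) from a pasted plain-text record."""
    heading = None
    variants = []
    in_variants = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("HEADING:"):
            heading = line[len("HEADING:"):].strip()
        elif line == "Used For/See From:":
            in_variants = True
        elif in_variants and line:
            variants.append(line)
    return heading, variants


heading, variants = parse_authority_text(RECORD_TEXT)
# Every variant spelling points at the single authorized heading.
merge_map = {variant: heading for variant in variants}
print(merge_map["Shakspere, William, 1564-1616"])
```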

2/08/2006 10:15 PM  
