Monday, April 24, 2006

LibraryThing adds language support

LibraryThing now allows you to keep track of the languages in your collection. If you don't want to do that, you don't have to. If you do, the changes are far-reaching.
  • Every book has three fields: primary language, secondary language and original language.
  • Languages are drawn from Amazon, your library record or the whole LibraryThing collection (see below).
  • The catalog shows "language" and "original language" fields. Go to "change fields" to see them.
  • Language can be edited within your catalog, much as tags are.
  • Power edit has a versatile "set language" feature.
  • Each language has its own dedicated page (e.g., French). At present, these only show the most popular works originally in that language.
  • Your "Fun statistics" page crunches the numbers on the languages in your collection.
  • I've adopted the full MARC specification for languages, so you can catalog your Arawak and Elamite holdings. In most circumstances, however, you're given a shorter list, with the option to see the full one.
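
For the curious, here is a rough sketch of how the three language fields could be modeled. It assumes each field holds a three-letter MARC language code and that the display name is looked up separately. The class, function, and table names are made up for illustration, and the table holds only a handful of codes from the MARC list; it is not LibraryThing's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# A small, illustrative subset of the MARC language code list.
MARC_LANGUAGE_NAMES = {
    "eng": "English",
    "fre": "French",
    "ger": "German",
    "rus": "Russian",
    "lat": "Latin",
    "mul": "Multiple languages",
}


@dataclass
class BookLanguages:
    """Hypothetical per-book record: each field is a MARC code or None."""
    primary: Optional[str] = None
    secondary: Optional[str] = None
    original: Optional[str] = None


def language_name(code: Optional[str]) -> str:
    """Return the display name for a code, falling back to the code itself."""
    if not code:
        return ""
    return MARC_LANGUAGE_NAMES.get(code, code)


# A Russian novel read in English translation.
book = BookLanguages(primary="eng", original="rus")
print(language_name(book.primary), "/", language_name(book.original))
```

Storing only the code keeps the records small, and a language can be renamed in one place.
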
Not right? Don't blame LibraryThing!

LibraryThing does its best, but it won't always get the language right without some help. The reason has to do with the source of the data:

If you find your books through libraries, the languages are picked up from the library catalog's MARC record. That's the theory. In fact, as we've discussed on the Google Group, library records are surprisingly sloppy with languages. (If you doubt that, click the "card" icon and look at the MARC 008 and 041 fields.) Polyglot libraries will need some cleanup. Of course, if you don't care about the language field, you don't need to look at it.
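
As a minimal sketch of where that language comes from: in a MARC 21 bibliographic record, character positions 35-37 of the 008 field hold the language code. The function name below is invented for illustration, and it assumes the 008 field is already available as a plain string. It is not how LibraryThing necessarily reads records.

```python
from typing import Optional


def language_from_008(field_008: str) -> Optional[str]:
    """Return the three-letter language code from a MARC 21 008 field, if present."""
    # Positions 35-37 (zero-based) hold the language code.
    if len(field_008) < 38:
        return None
    code = field_008[35:38].strip().lower()
    # Blanks mean nothing was recorded; "|||" means no attempt to code.
    if not code or code == "|||":
        return None
    return code


# A made-up 40-character 008 field with "fre" in the language slot.
sample = "0" * 35 + "fre" + " d"
print(language_from_008(sample))
```

A record with blanks or fill characters in that slot returns nothing, which is one way a sloppy record ends up with no language at all.
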

If you find a book on Amazon, LibraryThing guesses based upon which Amazon you used. That's the best I can do, unfortunately. Amazon doesn't tell me the language.
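
A guess of this kind might look something like the sketch below. The mapping table and function name are assumptions made for illustration; the real list of Amazon sites and the codes assigned to them may differ.

```python
from typing import Optional

# Hypothetical mapping from Amazon storefront to a MARC language code.
AMAZON_SITE_LANGUAGE = {
    "amazon.com": "eng",
    "amazon.co.uk": "eng",
    "amazon.de": "ger",
    "amazon.fr": "fre",
    "amazon.co.jp": "jpn",
}


def guess_language_from_amazon(site: str) -> Optional[str]:
    """Guess a book's language from the Amazon site it was found on."""
    return AMAZON_SITE_LANGUAGE.get(site.strip().lower())


print(guess_language_from_amazon("amazon.de"))
```

The weakness is plain: an English book bought on amazon.de would be guessed as German, so the guess has to stay editable.
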

Because of the way "works" operate, if you leave the "original language" blank, LibraryThing will make a guess based upon the other copies of the work in the system. As elsewhere, these guesses appear in green. Green guesses are updated daily.
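
One simple way to make such a guess is a majority vote over the other copies of the same work. The post does not say which rule is actually used, so the sketch below is an assumption, and the function name is invented for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def guess_original_language(other_copies: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the original-language code that most other copies of the work agree on."""
    # Ignore copies whose owners left the field blank.
    counts = Counter(code for code in other_copies if code)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Three other copies of the same work; one owner left the field blank.
print(guess_original_language(["rus", None, "rus", "eng"]))
```

Because the vote depends on what other members have entered, the result can change as more copies are cataloged, which fits a guess that is recomputed daily.
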

Let's talk!

This is one of the more extensive changes I've made, with tentacles all over the functionality and code. Sometimes the "why" of a feature is complex, but I had to do it that way. Other times, I may have taken the wrong route. I'm guessing people will come up with some great suggestions for changes or new, derived features. (And, as someone will surely point out, the system still has problems searching and sorting diacriticals. I'm working on it.)

I've set up two discussion threads in the Google Group, one for philosophy/functionality discussion, and one for bugs. I'm looking forward to what people have to say. Gratias tibi ago, Thingamabrarii.

38 Comments:

Anonymous Anonymous said...

Wow, this is definitely a fantastic new feature. Hours of fun for the polyglots in the group.

Where exactly do I go to see the full list of language options, though? I'd like to change the specs on some of my books (they're in Irish, a.k.a. Gaeilge or Irish Gaelic) but of course that option doesn't appear in the simple drop-downs on the Edit Info page.

4/24/2006 8:13 PM  
Anonymous Anonymous said...

Never mind, found it {blush}.

Now I just have to decide between the two Irishes! Hey, if you really want to drill down to the finest level of detail, can you distinguish between Ulster Irish, Munster Irish and so forth? {wink}

chamekke,
who finds herself suddenly annoyed that she gave away her one and only book in Dzongkha

4/24/2006 8:16 PM  
Blogger Tim said...

No, you have to use the MARC languages. (In reality, I'm only storing a three-letter code, and then looking up the "name" of the language.) The MARC list is a funny one, somewhat unbalanced, if you ask me. I wonder how it came about.

4/24/2006 8:18 PM  
Blogger Johnnie Burgess said...

What about books with more than one original language, like the Christian Bible?

4/24/2006 8:30 PM  
Anonymous Anonymous said...

Tim you are my hero!

A few minor bugs:
*Power edit doesn't remember to stay on the "Set language" tab.
*Power edit doesn't have a switch between major languages and all languages.
*The ajax language editor should activate the primary language by default so I don't have to click on it. Also I should be able to just hit enter rather than use the mouse on the "Submit" button.

*I would prefer in Ajax mode to be able to edit primary and original languages in the one box.

*How about hiding the secondary language dropdown menu until a primary language is set?

Thanks again, hero!

4/24/2006 9:12 PM  
Anonymous Anonymous said...

Hey Tim,

The language feature is a very nice idea, because as a fan of classic literature some of my books are translations into English from other languages. Two of the books I have were originally written in Russian and translated to English so I've set the original language field for those to Russian and primary to English. That sounds about correct, right?

Jared

4/24/2006 9:16 PM  
Anonymous Anonymous said...

Gratias tibi ago, Thingamabrarii.

Ooooooh. :-)

RJO, Thingamabrarius

4/24/2006 11:21 PM  
Blogger Tim said...

Doesn't Thingamabrarii sound like some German tribe in Caesar's Gallic Wars?

4/24/2006 11:26 PM  
Anonymous Anonymous said...

Yup, I think the Thingamabrarii were an offshoot of the Iceni or the Coriosolites, or one of those tribes.

;-)

4/24/2006 11:32 PM  
Blogger flexnib said...

WOOT! Now to redo all my Chinese language titles!

User flexnib.

4/25/2006 12:58 AM  
Anonymous Anonymous said...

Why is Dutch listed twice? There is only one Dutch, shared by The Netherlands and Belgium, as defined by the Nederlandse Taalunie (http://taalunieversum.org/taalunie/).

I know nl_NL and nl_BE exist, but in the context of books, that distinction is useless (only use I see is serving country-specific websites). Anyway, from the list of languages on LT, it isn't clear what the difference between the two Dutches is.

4/25/2006 5:49 AM  
Anonymous Anonymous said...

I think I mentioned this on the Google Group but it's worth repeating that the restriction to only two primary languages is a real problem for books which are published containing more than two simultaneous translations. For example, I have a couple of books where the entire text is included in three different languages. How should I enter these?

4/25/2006 7:16 AM  
Anonymous Anonymous said...

Cool! I'm not much of a polyglot (or rather, I haven't catalogued my Greek and Latin books yet) but I'm glad I'd already tagged the books I own that are translations as such; combined with power edit, that made this very quick and painless.

*heads off to relabel dictionaries*

4/25/2006 7:46 AM  
Anonymous Anonymous said...

> where the entire text is included in three different languages. How should I enter these?

You could choose Multiple Languages (mul).

4/25/2006 8:23 AM  
Blogger Christophilus said...

As a language nerd, I find this all sorts of fascinating. Thanks for adding it! Although I arguably ought to be working on my thesis and all. May I also throw in my request for multiple original language capabilities? The Heidelberg Catechism, for instance, is based on Latin and German texts; the Christian Bible on Koine Greek, Aramaic, Hebrew (and probably a few others I'm forgetting); Wiesel's Night on Yiddish and French; and those are just examples from my limited collection. This is simply far too fun--it should have a Surgeon General's warning on it or something to warn potential users of its addictiveness.

4/25/2006 9:46 AM  
Anonymous Anonymous said...

This is fantastic!!
I am very, very impressed.
Thank you so much!
j.

4/25/2006 10:33 AM  
Blogger Tim said...

Language: Oh, that hurts. You're too right. (Dies on stage.)

4/25/2006 11:41 AM  
Anonymous Anonymous said...

>You could choose Multiple Languages (mul).

That's fine now but won't help me if/when Tim builds languages into the search screen, or when I select the catalogue view by language, as it will mean that my book in French, German and English (entire book in all three, with no information given as to which is the original author's text and which two are the translations) won't appear if I search for any of its constituent tongues.

4/25/2006 4:15 PM  
Anonymous Anonymous said...

Is, say, an originally French book in French supposed to have "French" both as "Primary language" and "Original language"? By default, all my non-translated non-English books have the "original language"-field blank. That, however, makes them not included on the different statistics pages.

4/25/2006 4:22 PM  
Anonymous Anonymous said...

Thanks! This is great, though I've got some work to do.

I'm with anonymous on the ability to at least enter a tertiary language. I have several books that were published with the text in three languages (e.g., Hebrew, German, English). I can think of several multi-lingual cultures and academic disciplines that might need three languages, and have a number of books that get up to three, but am having trouble coming up with a real life situation requiring four or more languages to be tagged on one book.

4/25/2006 4:41 PM  
Anonymous Anonymous said...

Ooooff, awesome! But I see a lot of work coming my way. As I buy almost all my English books on amazon.de, they now have German as original language which of course is wrong... Thank God I love LibraryThing so much that I probably won't mind spending hours correcting that.
Thalia

4/25/2006 5:06 PM  
Anonymous Anonymous said...

I've been playing around - this really is great!

A few thoughts - there are a number of authors who write in multiple languages (like Tagore -- if you look at his "Hungry Stones", he wrote some in Bengali, some in English, and then translated the Bengali stories into English for the English volume). This happens a lot among Yiddish and Hebrew authors, but seems to happen in many multi-lingual cultures. So there are some needs for books with one language the book is in, but with multiple original languages, the reverse of how it is now set up.

The worst here are anthologies, like anthologies of Modern European Poetry, and this is probably where "Multiple" is best used, though that would mean that if I went looking for all the translations I have of German poetry, I would miss the anthologized ones. Maybe that can get dealt with in the search logic somehow.

I love the depth of coverage of the languages - can't wait to see what Zeitgeist features you come up with to show this off.

4/25/2006 5:20 PM  
Anonymous Anonymous said...

That might also be something the suggestion lists could work with..

4/26/2006 3:00 AM  
Anonymous Anonymous said...

The Old Testament, or Torah, was written mainly in Hebrew with some pieces originally in Aramaic.

The earliest known versions of the New Testament are in Greek, but some believe there were earlier versions in Aramaic or Syriac.

So I'd probably indicate the original language for the Bible as Hebrew/Greek/Aramaic.

4/26/2006 9:29 AM  
Blogger Alanna Smythee said...

So, when are we going to see "Klingon" on the list of languages?

(To be honest, I don't even have any books in Klingon, I just remember Tim mentioning it and now I'm waiting to see if he'll do it.)

4/26/2006 10:46 AM  
Anonymous Anonymous said...

Does anyone know what form of the Dutch language was spoken in New Amsterdam circa 1540-1570?

I have records that are described as translated from the "Old Dutch", but it's likely we're talking about a Middle Dutch language or one or more of the regional variations on Dutch.

4/26/2006 11:41 AM  
Anonymous Anonymous said...

Correction - that should be circa 1640 - 1670. Probably not much Dutch a century earlier.

4/26/2006 1:17 PM  
Anonymous Anonymous said...

lepascal:

You can certainly use tags to mimic the functionality of your proposed categories within your own catalog. Are you talking about something that would be applied universally, so that you can (for instance) look for poetry in someone else's catalog, regardless of whether they have it tagged as such, and where there would be a universal classification that a particular work was fiction or non, poetry or prose, etc?

I can see the use of that, though the contrarian in me keeps coming up with edge cases where the classification would be difficult or contentious.

4/26/2006 2:21 PM  
Anonymous Anonymous said...

Tags *are* categories. What I think should be done to emphasize this is to improve the use of tags slightly by allowing users to group by a certain tag and such.

It would be nice if I could group fiction, non-fiction and comics separately, for instance.

4/27/2006 3:53 AM  
Anonymous Anonymous said...

makis:

There are at least two ways you could do this.

(1) Use the 'tags' tab to show only works tagged 'fiction', or 'nonfiction', or 'comics'.

(2) Make sure the tag you view as primary is always the first in the list, then sort by tag.

4/27/2006 2:46 PM  
Anonymous Anonymous said...

a_musing,

That would be plain old Modern Dutch. Dutch had already been largely codified by the end of the sixteenth century, including prescriptive grammars and a definitive Bible. Middle Dutch is usually considered to end around 1500 or slightly later.

4/28/2006 10:56 AM  
Anonymous Anonymous said...

Thanks - Dutch it is!

The translator of these particular books seemed to find the Dutch used very archaic.

4/28/2006 12:35 PM  
Anonymous Anonymous said...

Great new feature

4/30/2006 7:05 AM  
Blogger John said...

Okay, this is completely out of context here, but I just had to comment somewhere fast and I guess someone might eventually read this...

Can we do something about spoilers in the comments of books?

I know, I know it would require teams of people checking the comments constantly and editing appropriately to effectively eradicate this problem, but how about just putting a "no spoilers" warning when you comment or something?

I just had the ending of a book I had been looking forward to experiencing ruined for me by reading some of the social comments, which is where I am coming from here.

I hate, hate, HATE, knowing the end of a story before I experience it.

Okay, I'm finished now, thanks for letting me vent.

5/02/2006 2:55 AM  
Anonymous Anonymous said...

RE: spoilers

I would agree if this were primarily a site for reviewing books. But this is a site to catalogue private libraries that also allows the user to include comments and reviews.

I use those fields to include information that I wish to be able to retrieve. I want to record my own summary of a book's plot, my own feelings and opinions about it. I want to be able to go back six months or a year or two years later and, with the assistance of those comments and reviews, recall the experience I had with that book.

To accomplish that, I may include things that I would omit were I writing for other readers. But I'm not. I'm writing for me.

5/02/2006 3:22 PM  
Blogger Tim said...

This isn't really the right place to talk about it, but I wonder if there shouldn't be a special "spoilers" checkbox next to reviews. Then spoiler reviews could be in pink, or something.

5/11/2006 1:17 AM  
Anonymous Anonymous said...

Would it be possible to perhaps add a third option instead of the default and 'all languages'?
A middle-thing which included more common languages than the ones in the default list, but not all of the obscure ones?
I'm mainly asking because Norwegian (Nynorsk) is inexplicably in the default list while Norwegian (Bokmål) isn't - browsing through a big list like that takes precious time ;)

Also, would it be possible not to revert a choice when selecting 'all languages'? For instance, if I select Norwegian (Nynorsk) and then press 'all languages', it would be great if Norwegian (Nynorsk) still was chosen so I'd only need to move one spot up the list.

5/13/2006 5:38 PM  
Blogger Say it in Dutch said...

New Year Intentions 2007
1 learn dutch 2 learn really dutch
3 get a dutch language course now :-)
http://www.sayitindutch.com

11/25/2006 9:59 AM  
