Work disambiguation and the "Ship of Theseus"
This blog post is long, and involves both showy mythological allusions and inside-baseball discussions of database structures. In brief, you'll be seeing some new features, but you may also catch some glitches as I bring them live. Thank you for your support.
In philosophy, the "Ship of Theseus" is a "replacement paradox." The story is that the Greek hero Theseus (you know, minotaurs and balls of string) rebuilt his ship during the voyage back from Crete—perhaps even while it was moving—such that the ship arrived at Athens with no piece of wood that had left from Crete.* The question is: Was it the same ship?
Anyway, LibraryThing is a true "Ship of Theseus." I'm rebuilding it as it moves. This week I'll be putting in a new keel—a whole new structure for thinking about books and works.
The former system was essentially composed of discrete books. If two books had very similar authors and titles (e.g., two editions of Romeo and Juliet), the system guessed that they were the same "work." These guesses were pretty good, particularly considering they had to be made on the fly, but not good enough. And there was no way to change them. Notably, the whole system operated without a separate "works" database. This was clever and economical, but also limited.
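To make the on-the-fly guessing concrete, here is a minimal sketch of one way it could work. It is not LibraryThing's actual code: the normalization rules and the names `normalize`, `work_key` and `guess_works` are assumptions for illustration only. Books whose normalized author and title match are guessed to be the same work.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> tuple:
    """Hypothetical matching key: normalized author plus normalized title,
    with a leading English article dropped from the title."""
    norm_title = re.sub(r"^(the|a|an) ", "", normalize(title))
    return normalize(author), norm_title


def guess_works(books):
    """Group (author, title) pairs that share a key into one guessed work."""
    groups = defaultdict(list)
    for author, title in books:
        groups[work_key(author, title)].append((author, title))
    return list(groups.values())
```

A key like this is cheap to compute at query time, which fits the "no separate works database" design, but it also shows the limitation: a wrong guess cannot be corrected, because nothing is stored.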
The new system introduces a robust concept of "work." On the database side this means a special "works" database, where each work has a title (the most common title of books belonging to the work). It is how most LibraryThing books can acquire LCCNs, Deweys and other cataloging information. It will allow users to discuss books (for example, on a forum) without worrying that they are only talking to people who have the same edition they do. Techies will like that it opens the door to an external API, relying on Library of Congress data rather than Amazon data (using the latter this way is forbidden). And, most importantly, it will allow ordinary people to participate in the sacred act of cataloging, combining and splitting books from works as they see fit. This has never been done before. It's Wikipedia for book cataloging.
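For the curious, combining and splitting can be pictured with a small in-memory sketch. This is only an illustration under my own assumptions, not the real schema: the class `WorkIndex` and its methods are hypothetical, and an "edition" here is just an (author, title) pair.

```python
import itertools


class WorkIndex:
    """Hypothetical flat index: every edition belongs to exactly one work."""

    def __init__(self):
        self._work_of = {}    # edition -> work id
        self._editions = {}   # work id -> set of editions
        self._ids = itertools.count(1)

    def add_edition(self, edition):
        """Register an edition as its own single-edition work (if new)."""
        if edition in self._work_of:
            return self._work_of[edition]
        work_id = next(self._ids)
        self._work_of[edition] = work_id
        self._editions[work_id] = {edition}
        return work_id

    def combine(self, editions):
        """Merge the works behind the given editions into one work."""
        work_ids = sorted({self.add_edition(e) for e in editions})
        if not work_ids:
            raise ValueError("combine() needs at least one edition")
        target = work_ids[0]
        for other in work_ids[1:]:
            for edition in self._editions.pop(other):
                self._work_of[edition] = target
                self._editions[target].add(edition)
        return target

    def split(self, edition):
        """Remove one edition from its work and give it a work of its own."""
        old = self._work_of[edition]
        if len(self._editions[old]) == 1:
            return old  # already alone, nothing to split
        self._editions[old].discard(edition)
        work_id = next(self._ids)
        self._work_of[edition] = work_id
        self._editions[work_id] = {edition}
        return work_id

    def editions_of(self, work_id):
        """Return the editions of a work in sorted order."""
        return sorted(self._editions[work_id])
```

The point of storing the mapping, rather than recomputing it, is that a person's correction sticks.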
Anyway, all this is coming this week. The trick is, the system is so complex and involves so much "calculation" that I can't bring the server down, make the changes and bring it up again without unacceptable downtime. Testing it on my own Mac takes forever and won't give it the stress test it needs (LibraryThing can average 3,000 "queries" per minute!). So, I'm going to be rebuilding the ship while it moves.
In fact, the new system is already mostly in place, but invisible. It's going to become more and more visible as the week progresses. Once everything is changed and I'm satisfied it works, I will add the last element, exposing work disambiguation to the masses. Then I'll take down the old system.
So bear with me as I make these changes. The switch-over is highly planned; I even have stuff on paper--I'm a real programmer now! But the presence of two different systems will lead to inconsistencies in presentation and other hiccups. If you notice that your official book counts disagree by one, let it slide. If something breaks, wait ten seconds and try again. If book recommendations go briefly insane—well, serendipity is a good thing!
Advice corner. I still haven't quite figured out the user interface for work disambiguation. I think it will mostly take place on the author pages. Users will click checkboxes by books and then click "combine books" to combine them. I'm not certain if "work splitting" will also happen on author pages. Certainly work pages will let you see all the editions of a work, allowing you to remove one or more editions as not belonging to the work at hand. Your suggestions would be appreciated.
Lastly, I want to favor library titles for books. Amazon too frequently puts edition and marketing info into their titles. (This isn't their fault; they're not running a cataloging app.) And using library data will allow LibraryThing to offer an external API. The only trick is, libraries don't capitalize the way most people think is "right." It's "Lord of the rings" not "Lord of the Rings." I think people will go ape if work pages, recommendations and other such start using library-format titles. On the other hand, it's hard to write a perfect capitalization algorithm, and library purists may resent the use of the vulgar form. What to do?
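To show why a "perfect" capitalization algorithm is hard, here is a deliberately imperfect sketch. The small-word list and the exceptions tuple are my own assumptions, it only knows English conventions, and the function name `title_case` is hypothetical.

```python
# Words conventionally left lowercase inside an English title (an assumed list).
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "nor", "of", "on", "or", "the", "to"}


def title_case(title: str, exceptions=("iPod",)) -> str:
    """Convert a sentence-case library title to a rough English title case."""
    fixed = {word.lower(): word for word in exceptions}
    words = title.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        if lower in fixed:
            out.append(fixed[lower])          # known odd spelling wins
        elif any(ch.isupper() for ch in word[1:]):
            out.append(word)                  # keep existing mixed case, e.g. "USA"
        elif lower in SMALL_WORDS and 0 < i < len(words) - 1:
            out.append(lower)                 # small word in the middle stays lowercase
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)
```

Every odd word needs an entry in the exceptions list, words with attached punctuation slip past it, and other languages follow different rules entirely, which is exactly the worry.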
* He also stranded his wife on a desert island, but the only philosophical issues there are ethical. The ancient story is actually a little different. According to Plutarch (Theseus 22-23), the Athenians of later days exhibited the ship that the ancient hero Theseus had sailed back from his adventures in Crete. Over time, the Athenians had replaced its planking bit-by-bit, until no part of the ship was original. Personally, I think the modern paradox should be changed again. Theseus' voyage was pretty much a straight shot, and, in the story, he gives no time to even changing his sails—although doing so would have averted his father's death—let alone rebuilding the boat from the inside out. The whole thing would make a lot more sense as Odysseus' ship, or Jason's. The latter has the advantage of allowing Medea to fix the ship through magical means, even while it moved. Of course, Jason ditched Medea too. What's with it with these guys?
34 Comments:
It was just this afternoon when something on Isidore-of-seville.com caught my eye and made me think that my having taken all my cataloging information from Amazon would lead to a lack of LC info on the books. But it sounds like you're a step ahead of me.
FYI, I use Amazon as my default info search because Amazon is the easiest to search by ISBN.
Well, Odysseus DID change his whole ship. For another ship, that wasn't his, after losing all of his ships in one storm or another and ending by losing even his raft. So probably not Odysseus.
It does seem unlikely that Theseus changed the whole ship and forgot to change the sails. Maybe he left them for last and it got too late? (Did he ever actually marry Ariadne? I know in several versions he knocked her up.)
I'm with you, it has to be Jason. Nice long voyage, especially on the way home when they went all over northern Europe, came down the Danube, and then for some inscrutable reason portaged the Argo all over North Africa before they made it home. I mean, the portage alone would be much simpler if the ship were broken into its component bits first. Then you could replace them one at a time as sand got in them, they got lost, bartered for food ...
Tim: Well, the hope is that a "work structure" gives everyone LC numbers—and LC subjects, let's not forget about those! I'll have to fudge the edition part of the LCCN, but so what.
I had always suspected one could never step twice into the same LibraryThing.
RJO (now with 1600 books)
Right. This sure is a strange development blog.
Since so many wonderful features are on the way, perhaps combining some existing features would make life easier. Here's a simplification idea.
I'd suggest combining the separate "card" and "social" pages for each work into a single page. This would be the effective catalog card for a work, and would be the central image/metaphor for LT. The top of the page should be the basic catalog record and the MARC data, and below that on the same page all the social data would appear.
You could strengthen this catalog-card model by having a "next card"/"previous card" link at the top of each card page. This might go along with a "sort by" button or pull-down menu, so if I "sort by" author, the next/previous links will let me flip through by author; if I "sort by" date, then I can flip through by date; if I "sort by" call number, I can flip through in LC shelf-list fashion, etc.
RJO (fiddling around in the engine room of the ship of Theseus)
A couple of things I've been wondering about as this major development gets closer...
The lesser one first:
Sometimes for both titles and authors there is a truly correct or "canonical" spelling which is likely to differ from the most common spelling on the internet due to outdated technical reasons or plain old ignorance. It would be nice if the users were able to suggest such canonical spellings. Here is a good example: "García Márquez, Gabriel" is correct. "Marquez, Gabriel Garcia" is most common on the internet but never used on book covers etc. in any European language I've seen. But it is a lot fuzzier with titles.
The bigger one:
How do you plan to handle editions which don't really exist but exist in LT due to people adding the first book that looks right and later updating some fields but not others, or updating fields without checking properly first? Or even using fields in quirky ways which are useful to them but not the rest of us?
As for title capitalisation, in English "Title Case" is the current norm - I don't know if it always was. In every other European language I know, "Sentence case" is still the norm.
And now some questions:
*Will this update include language awareness? Can we state what language an edition is in? And if not should "La peste" and "The Plague" be combined?
*Will there be multiple "levels"?
**Can we say that the famous Perl "camel book" in its various updated editions are different editions of one work?
**Can we say at one level that "Harry Potter and the Philosopher's Stone" and "Harry Potter and the Sorcerer's Stone" are the same work but at the next level say they are different and on that next level still combine the hardback and softback and adult and juvenile editions of the British version?
Tricky tricky stuff Tim. And exciting too!
Well, I'm excited. It's great fun watching this evolve and flex as new features come in. Especially exciting: the disambiguation of editions. Even with all the good questions Hippietrail raises, and the dozens still to come, you're putting together something really valuable.
For what it's worth, I would hope that a given title will have an "origin" edition, usually that will be hardcover, first edition and then a whole satellite of editions that follow. Softcover, trade paperback, large print, etc, in one branch of the tree; audio editions in another branch; and translations into other languages in additional branches.
This would mean, of course, that (for example) all of Goethe's works would have an origin edition in German, and that the English editions most of us put into our libraries are in one of the tertiary branches.
If handled this way, there would never be any question about original publication date or place, and that would be really useful.
Just thinking out loud here. Looking forward to seeing what you come up with.
RJO,
Thanks. I'll unchain you from the oar. No, seriously, that's interesting. My reactions:
1. I like the forward/backward links. It would require some adjustments. "Doing the sort over again" each time you go forward or back is wrong from the user's perspective. The user expects to get books in the order that was on the catalog page, even if she changes the title or author and that's the thing being sorted. "Saving" the catalog pages' order is easy, but I'd effectively need to save the entire library's order, in case the user paged through every one of them. That would require a rather major rethinking of how sorting and presenting works. It might be a good rethinking, but it would be one.
2. I think it's necessary to distinguish between YOUR data and others' data.
(a) Your book might not belong to the right work.
(b) There is no one Marc record for a work. Take a look at http://www.librarything.com/catalog.php?booksim=1919130213&mode=social . At the bottom, you'll see I've stuck all the titles given to this one work, and all the Marc records.
(c) I want to be flexible about what work data you use.
Take LCCNs. The way I envision it, you can use your own LCCN numbers if you have them (or if you add them). But if you don't have one filled in, it will show you the one corresponding to your work. This work LCCN will be unchangeable by you.
I'm imagining this data—and LC subjects, Deweys, etc.—will be Roman if it's your data, and italic if it's coming by way of your work.
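The fallback rule just described (your own value if you have one, otherwise the work's value, shown in italics) might look roughly like this. It is a sketch under assumptions: `FieldValue`, `resolve_field` and the HTML rendering are hypothetical names and choices, and the call numbers in the usage line are only sample strings.

```python
import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldValue:
    value: Optional[str]
    source: str  # "own", "work", or "none"

    def render(self) -> str:
        """Roman for the member's own data, italic for work-derived data."""
        if self.value is None:
            return ""
        text = html.escape(self.value)
        return f"<i>{text}</i>" if self.source == "work" else text


def resolve_field(own_value: Optional[str], work_value: Optional[str]) -> FieldValue:
    """Prefer the member's own value; fall back to the work's value."""
    if own_value:
        return FieldValue(own_value, "own")
    if work_value:
        return FieldValue(work_value, "work")
    return FieldValue(None, "none")
```

For example, `resolve_field("", "PS3545.H16").render()` gives the italic, work-supplied form, while a member who filled in their own value always sees that value in plain type.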
Lastly, I want to distinguish between the card catalog and the social data for pure space reasons. There's a lot of catalog data, and there will be MUCH more once I start parsing everything in the Marc records. There is also a lot of social data, and there will be more as I dream up fun things. More plus more means one big page...
Just my thoughts. Disagreement positively wanted.
There's a real-life Ship of Theseus in Baltimore Harbor - the USS Constellation, which may or may not still contain some planking and/or keel from the original frigate (current theories lean toward 'not'.) There was a lot of discussion - including legal debate - a decade or so ago, the last time it was redone, about whether or not they could call it the same ship.
Umm .. and the new revisions sound really cool! I can't wait. And speaking of what makes something the same, are you going to try to make guidelines as to what constitutes the same 'work', or are you going to mostly let the users build their own consensus over time, as with tags?
Just because the idea is banging around in my head, and maybe some part of this will spark a usable idea someplace else, I imagine something like this:
-----------------
Choose the edition you wish to add to your library catalog
OR enter an edition not yet listed
OR use a generic description without edition information
origin/first edition
The age of innocence / by Edith Wharton / 1920
ISBN: none
Publication: New York; D. Appleton and Company, 1920. [6], 364, [2] p. (last p. blank) ; 20 cm.
Other authors: Lawrence Beall Smith
Cover Image
LC call no: PZ3.W555 Ag PS3545.H16
Subsequent US print editions (sorted by date, most recent first)
US audiobook editions
Overseas editions/editions in translation
---------------
There would have to be links to submit corrections, and a way to upload covers for odd editions people enter into the database, and multiple reminders not to upload an edition that's already entered...
1. Spelling will be democratic at first—the majority title wins. (Authors are dealt with by a different system, which is also democratic.) The only exception is that the system prefers all non-Amazon data before using Amazon data.
2. The "phantom editions" problem shouldn't be one. The thing that makes a book in the system is the title and author. People do remember to change those.
3. It's true that some crappy data will enter from users. That's life in a democratic society. Considering how many bona fide librarians there are on the site, I suspect any changes will be improvements. And, as stated, everything will be by popularity, so if one person changes their copy of "Harry Potter and the Sorcerer's Stone" to "Harry Potter and My Butt," it will not be seen by anyone but that person. Embrace the chaos!
4. I don't want to argue the case issue. (I frankly don't care.) I just worry about it either way. And I don't want to write a title case algorithm and then have people tell me that "The iPod Book" is showing up as "The Ipod Book." If anyone knows the absolute killer PHP title-case regex, send it my way... (Or Perl, for that matter.)
5. As for languages, it will start out unaware. (Later, I may use the Marc field that links things to their original record.) But, of course, you will be able to combine books by author, so I imagine "La peste" will swiftly get combined.
6. Yes, La Peste should be part of The Plague. The POINT of the system for LibraryThing—not for all possible purposes, but for LibraryThing—is social. Works tell you who has books like yours and what books you might like to read. For social purposes, what matters is the book, not the language it's written in. (Since LT is in English, it's presumed that all users can converse in English and read English book titles.)
7. The same goes for different editions. If the point is social, it does not matter if Joe has the Perl book in the first edition and Mary has it in the second edition. It certainly doesn't matter if one person is British and one not. Ditto different media—paper, audio, Braille, etc.
8. The system will be flat. Books will either be part of the same work or not. There will be some problems with collections. (For example, there will be no link between a one-volume set of Twain's novels and all the individual novels. This MAY come later, when I add an "inside layer," capturing short stories and articles.)
9. Saralaughs' vision is a good one, but it's harder than you might think. At bottom, LibraryThing draws a whole lot of information from a lot of different sources: Amazon new books, Amazon used books, libraries big, small and screwy. I simply can't know what the underlying reality is. The really important piece of information is, as stated, social.
10. I'm still deciding how the "all editions" page will work. To LibraryThing an edition is essentially a name and author. Variants in those make up different editions. Different editions are then combined into "works." The combining must be done on a page that, basically, shows you all the different title and author combinations. Years, publishers and ISBNs are, for this purpose, not important.
That said, LibraryThing COULD parse "editions" in the traditional way if users wanted it. Tricky.
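The "majority title wins, but non-Amazon data comes first" rule from point 1 can be sketched as follows. This is one reading of that rule, not the actual implementation: the function name `choose_work_title`, the source labels and the alphabetical tie-break are all assumptions.

```python
from collections import Counter


def choose_work_title(copies):
    """Pick a work title from (title, source) pairs, one per member's copy.

    Votes are counted among non-Amazon copies when any exist; Amazon
    titles are used only as a fallback.
    """
    copies = list(copies)
    library = [title for title, source in copies if source != "amazon"]
    pool = library or [title for title, _ in copies]
    if not pool:
        return None
    counts = Counter(pool)
    best = max(counts.values())
    # Break ties deterministically by picking the alphabetically first title.
    return min(title for title, n in counts.items() if n == best)
```

Under this reading, a single joke title is outvoted as soon as two people hold the ordinary one, and three Amazon copies with marketing text in the title lose to two library records.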
1. I like the forward/backward links. It would require some adjustments. "Doing the sort over again" each time you go forward or back is wrong from the user's perspective. The user expects to get books in the order that was on the catalog page, even if she changes the title or author and that's the thing being sorted. "Saving" the catalog pages' order is easy, but I'd effectively need to save the entire library's order, in case the user paged through every one of them.
Maybe I'm missing something or wasn't clear. I was only talking about browsing my own (or one person's) collection, and as I'm picturing it, there are two "views": the "list" view and the "card" view. If I'm in "card" view, the idea of next/prev links is to take me to the next/prev card in my own collection. I know nothing about databases, but if I have say 1600 books, isn't it comparatively low overhead to keep three or four lists of 1600 record numbers, one sorted by title, one by author, one by date, etc.? If I'm browsing my catalog by card, flipping the switch from author to date just shifts the pointer from record#542 in the author array to record#542 in the date array. (Like I say, I *really* don't know anything about databases, but in my imagination, they're quite clear... ;-)
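The "three or four sorted lists of record numbers" idea can be sketched in a few lines. This is only an illustration with hypothetical names (`CardBrowser`, `neighbors`), assuming one member's books fit comfortably in memory.

```python
class CardBrowser:
    """Hypothetical precomputed orderings of one member's book ids."""

    def __init__(self, books, sort_keys):
        # books: {book_id: {"title": ..., "author": ..., "date": ...}}
        self.order = {}      # sort key -> list of book ids in that order
        self.position = {}   # sort key -> {book id: index in that list}
        for key in sort_keys:
            ids = sorted(books, key=lambda b: (books[b][key], b))
            self.order[key] = ids
            self.position[key] = {book_id: i for i, book_id in enumerate(ids)}

    def neighbors(self, book_id, sort_key):
        """Return (previous_id, next_id) in the given ordering; None at the ends."""
        ids = self.order[sort_key]
        i = self.position[sort_key][book_id]
        prev_id = ids[i - 1] if i > 0 else None
        next_id = ids[i + 1] if i + 1 < len(ids) else None
        return prev_id, next_id
```

One wrinkle: switching from author to date means looking up where the current book sits in the date list, not reusing the same index. The lists also have to be rebuilt (or the old ones kept on purpose) when a title or author is edited, which is the adjustment Tim raises.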
2. I think it's necessary to distinguish between YOUR data and others' data. (a) Your book might not belong to the right work. (b) There is no one Marc record for a work.
Right! I'm only talking about my data - not even thinking of "works" and "editions" in this context: just talking about flipping through my own catalog. Maybe part of the confusion is that I don't use the social stuff much at all. Of my 1600 books, I don't share more than 54 with anyone else. Part of the motivation for suggesting that the social and card pages be combined was that when I click on a title in the "Random Books for your Library" widget, it goes to what for me is almost always an empty page.
In that context, perhaps what's wanted is a "preferences" panel for each user: links go to card vs. social page; MARC data displayed vs. hidden; cover images displayed vs. hidden, etc. It would be important to be able to specify which preferences are personal (how I want to see things) vs. public (how I want other people to see my records when displayed).
Lastly, I want to distinguish between the card catalog and the social data for pure space reasons.
Understood. You have a far better grasp of that than anyone.
(This whole experience almost makes me want to learn something about databases.)
I simply can't know what the underlying reality is.
So you're a Kantian, eh?
RJO
Shawna: Scholastic? Give me an example book, with ISBN.
I'm looking forward to the new changes.
Being a non-techie, I'm still keen to understand how the new work structure for combining titles will accommodate multiple or joint authors. Will they throw further spanners in the works?
A classic example would be the popular & commonly titled "Good Omens", jointly written by Neil Gaiman & Terry Pratchett. Their original hardcover edition had a much longer title - and to make matters more fun ~ UK editions had Pratchett's name first whilst the US ones had Gaiman's first. (Some agreement they made at the start I think).
Would the new way of combining titles put all these works together or would they still be identified separately? Or is the whole point moot?
Like I said, being a non-techie, it all sounds good but I'm trying to picture if my own catalogue would change and how the combining would affect the zeitgeist count of top authors and titles. Any thoughts?
Great news! This is the killer thing I have been waiting for.
A few random thoughts:
- I think the original title should be the name of the work rather than the most popular one. A French book should have its original title used rather than the English translation, which in this case might be used just because LibraryThing happens to have so many English-speaking users. I think this would also remove some confusing issues and be very straightforward to understand. And on top of that, you wouldn't have to search for the most popular title when displaying the work name. This might get quite db intense.
- I hope you have taken into account books that are contained in many works. There are lots of books with two novels in them, for instance.
- Connecting books to works could be done in numerous ways. You could simply allocate a space in the user's window for this. It could include user selected books from his collection and then when he opens up a work, it could have some simple mechanism (drag & drop, combobox..) that allows for connecting the book to the work.
I'm particularly interested in this concept because about half of my books are translations from their original titles. So comparing my collection to someone else's makes no sense at the moment as my translated books are not connected to the original works in any way.
On the other hand, not all of my translated books are originally English. I have books by Stanislaw Lem for instance. But I bet the work's name would be each book's English title in this case, which I don't think makes a lot of sense. And in a way it's not very fair to the author either.
OK, maybe the best possible solution would be to allow each user to select what they want to be displayed as the work title, original or most popular.
On combining.
At the moment the author combining isn't quite as good as it should be. I put in a book edited by Ed Ferman (and Barry Malzberg). Now there are already books under his full name Edward L. Ferman. These should be able to be combined but are never offered as a suggestion.
On book combination - my vote would be for title case - but I know how difficult capitalisation is to do over multiple languages. As for titles - well you are stuck between a rock and a hard place. I guess most people would prefer "Around The World in 80 Days" to "Le Tour Du Monde En Quatre Vingts Jours". Especially so for English translations of books that originally appeared in a language which uses a non-roman alphabet. However when it comes down to English books I would prefer "Philosopher's Stone" to "Sorcerer's Stone" and "Northern Lights" to "The Golden Compass". Similarly if the first edition was in the US I would use the US title.
BTW - when searching by author for Ferman and clicking on Ed Ferman I get an error -
"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') AND books_sort_author NOT IN ('fermaned') AND books_sort_author <> '' AN' at line 7 - fatal error (5*)". This doesn't happen for the other Ferman entries.
Like Shawna I have the chronicles of Narnia as a single volume (and a separate copy of the Last battle).
Perhaps after listing the editions of a book there could be "includes" and "included in" lists with links to other books.
Thus the chronicles of Narnia would have
Includes:
The Magician's Nephew
The Lion the Witch and the Wardrobe
The Horse and His Boy
etc.
and The Last Battle would have
Included in:
The Chronicles of Narnia.
Books with study notes could be separate books linked in this way.
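The two lists suggested above are two views of one relationship, so a sketch only needs to record each link once and read it from either side. The class `ContainmentLinks` and its method names are hypothetical, and this says nothing about how LibraryThing would actually store it.

```python
from collections import defaultdict


class ContainmentLinks:
    """Hypothetical two-way 'includes' / 'included in' links between works."""

    def __init__(self):
        self._includes = defaultdict(list)      # collection -> parts
        self._included_in = defaultdict(list)   # part -> collections

    def link(self, collection, part):
        """Record that a collection contains a part (ignoring repeats)."""
        if part not in self._includes[collection]:
            self._includes[collection].append(part)
            self._included_in[part].append(collection)

    def includes(self, work):
        """Works contained in the given work, in the order they were linked."""
        return list(self._includes.get(work, []))

    def included_in(self, work):
        """Collections that contain the given work."""
        return list(self._included_in.get(work, []))
```

Linking "The Chronicles of Narnia" to "The Last Battle" once is enough for both pages to show the connection.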
For translated works I would like to see the translator's name and the date of the translation in addition to the edition date.
Title capitalisation should only be made for languages that use it. It's not the norm by any stretch of the imagination.
Oh, and about title language: I think the main question is if Librarything is meant to be predominantly for English books, or for books of any language. If I was French, I would be pretty annoyed that the great French books are referred to with their English title rather than the original. And if LT is meant for any books, there really is no reason to use the English title for a French book as the work title. I admit it would be easier in some cases, and just for this, maybe a field showing the English title could also be shown.
I like the idea of disambiguation but I think the translation/international issue is going to cause some big problems. How does "Good Wives" fit in? (Original and authentic title on first publication, but only in English editions; the original US version is 'just' part two of "Little Women".) "White Boots" vs. "Skating Shoes"? The "Good Omens" deliberate primary author confusion?
I take the point of the commenter who prefers "Northern Lights" to "The Golden Compass", and would take the same approach myself, but merely going by numbers (democratic approach) will count against non-American editions and titles, by and large.
What's with it with these guys?
Men are dogs?
Serious question: how will the changes affect books that are already in the system?
Shawna -
Many Scholastic Books are editions for the school market only and until recently the ISBN didn't appear on the book (the price still doesn't). Other 'editions' are for sale in bookstores and have all the expected info. As far as I can tell they are otherwise the exact same book. So I would just take the same author, title, cover as being the same edition and not worry about ISBN. As for where to find: Scholastic does have a current catalog online, but this is just this month's club offerings and you need a code to see the selections. Also, many of their books are 'book club' editions of other publishers' books which further complicates things!
Another note about Scholastic books: most of the ones without ISBNs (or all, in my experience) and some of the others are so badly put together with such cheap materials that they seldom withstand more than a couple readings. Disposable books: not a good idea for a whole lot of reasons.
It's one of my pet peeves. Sorry.
A clarification, to Lilithcat and others:
The "works" system will not change any personal data. Your books will always have the title you found (or gave them). Social data pages, however, will list the work title, which will be arrived at democratically.
I'm still working on bringing the work system live. (Things always take longer than I think...) Some features, like the "connections" box on your profile, are already using the logic, although there's no way you'd know it. The book recommendations (people who own X also own Y, and tag similarity) are giving me database-join headaches...
Scholastic/BookClub editions ...
I have several shelves' worth of various Book Club editions for which I've had to "make do" with whatever data could be gleaned from the searches ... mainly SFBC hardcovers from back in the 70's. I suspect that the biggest difference is in cover art, but it's too bad that we can't "data mine" the various Book Clubs for this stuff.
Good luck with the works system. As others have noted, there are some thorny issues lurking in that one. You're a braver man than I to tackle it.
I have one thought on title capitalization. If you could make sure that titles are always wrapped in some kind of HTML element (CITE is the obvious choice), you could keep the (uncapitalized) library title in the database and use CSS to style the output (text-transform: capitalize). There are some subtleties this method cannot provide, such as omitting capitals for a/an/the or handling unusual words like iPod. On the plus side, however, this would open the door to accommodating different titling conventions. Class or attribute selectors in the CSS could provide alternate styling for certain cases, or an entire alternate stylesheet might be created for different display preferences.
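To make the limitation in the comment above concrete, here is a minimal Python sketch, not anything LibraryThing actually runs. The function names, the small-word list, and the fixed-case table are all hypothetical. It contrasts a rough imitation of what text-transform: capitalize does with a slightly smarter server-side title-caser, and wraps the result in a CITE element.

```python
import html

# Hypothetical lists for illustration; a real site would need much longer ones.
SMALL_WORDS = {"a", "an", "the", "and", "of", "in", "to"}
FIXED_CASE = {"ipod": "iPod"}


def css_capitalize(title):
    """Roughly imitate CSS text-transform: capitalize (first letter of every word)."""
    return " ".join(w[:1].upper() + w[1:] for w in title.split())


def smart_title(title):
    """Capitalize words, but keep small words lowercase mid-title and honor fixed-case words."""
    words = title.split()
    out = []
    for i, word in enumerate(words):
        key = word.lower()
        if key in FIXED_CASE:
            out.append(FIXED_CASE[key])
        elif key in SMALL_WORDS and 0 < i < len(words) - 1:
            out.append(key)  # small word that is neither first nor last
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


def cite(title):
    """Wrap an escaped title in a CITE element so a stylesheet can target it."""
    return "<cite>" + html.escape(title) + "</cite>"


print(css_capitalize("the wind in the willows"))
print(smart_title("the wind in the willows"))
print(cite(smart_title("my ipod and me")))
```

The trade-off is the one the comment describes: the CSS route keeps the stored title untouched and lets each stylesheet decide, while anything smarter than "capitalize every word" has to happen before the HTML is written.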
I wouldn't worry too much about title capitalization. Heck - I'm a librarian and a cataloguer and I haven't even consistently used traditional library capitalization because I couldn't be bothered fixing everything I imported from the Amazons. I probably was consistent with the manual entry stuff only due to force of habit.
Regarding combining of works - I have some ambivalence about this. It seemed like a good idea to me at first but how would it handle Harry Potter and the Philosopher's Stone (373 copies in LT) and Harry Potter and the Sorcerer's Stone (1440 copies in LT)? Would those be combined? The American publisher changed not only the title but also edited the content to replace English terminology with American usage (jumper to sweater, torch to flashlight, etc.). They have the one fanbase who would enjoy discussions, but I personally would be irritated if they get combined and the "Sorcerer's Stone" version of the title is the one that gets "democratically" chosen to be the "official" title because a) it isn't the title the author chose and I object to the American publisher's edits on general principles and b) it isn't the title of MY copy.
Anyhoo, LT is a continuing pleasure, and I really appreciate all the work you put into it :)
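For readers wondering how the "democratic" title would play out in the Harry Potter case above, here is a minimal Python sketch. It assumes the rule is simply "most common title among the books in the work," as the post describes; the function name is hypothetical and tie-breaking is not something the post specifies.

```python
from collections import Counter


def work_title(book_titles):
    """Return the most common title among the books combined into one work."""
    counts = Counter(book_titles)
    title, _count = counts.most_common(1)[0]
    return title


# Copy counts quoted in the comment above.
copies = (
    ["Harry Potter and the Philosopher's Stone"] * 373
    + ["Harry Potter and the Sorcerer's Stone"] * 1440
)

print(work_title(copies))  # prints: Harry Potter and the Sorcerer's Stone
```

With those counts the American title wins by simple plurality, which is exactly the outcome the commenter is worried about.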
Just to reiterate, the purpose of combining is social—linking you with people and suggesting books. It does not affect what you call your book, how it appears in your catalog, etc. If you don't want to talk to Americans who read Harry Potters, don't talk to them. If you don't want to get suggestions generated in part by the libraries of Americans, well, you're out of luck.
I do think it's a little sad that American children can't be expected to "pick it up" as regards the (few) differences in US/British vocabulary. American education and child culture are more than ever sold on multicultural benefits. From working for a K-6 textbook publisher I can attest that foreign words simply pepper today's school reading. But an occasional "lorry" or "torch"? The children's heads will EXPLODE!
Yes, Tim, but what you're saying is that the social data page will reflect the "democratic" title, which in the case of HP will not be the better or more authentic title, and I know quite a few Americans who feel the same way about that and have bought UK-sourced copies to compensate! With books translated from other languages into English there are different issues (and usually agreement about the English title), but for a book needlessly 'translated' from one English to another it seems a bit unfair to make the non-original title the de facto official LT title.
(Same thing goes for the ...Shoes books another commenter mentions, most of which had original non-Shoes titles on first US publication too, I gather. I'm betting recent editions outnumber the older ones.)
I can't say I've understood EVERYTHING in the blog post and in the comments; it was kind of difficult with thoughts on Greek mythology and such :-) but nevertheless it looks as if my wish that LT would become more "international user-friendly", so to speak, is coming true. I'm very, very happy.
For the same reason, however, I agree with makis above that, for non-English books, the original titles should be preferred (as the "standard" form) to the English ones: otherwise it would seem to me not "more democratic" but incorrect and simply unfair.
Thanks Tim for your work
Tim, you say you have join headaches with the recommendations. But if you make sure every book has a work and there is at least one book for every work it shouldn't be too problematic. As you know, left and right joins are a real performance killer and should be avoided at all cost.
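A small sketch of the point in the comment above, using Python's built-in sqlite3 module. The table and column names are hypothetical and say nothing about LibraryThing's real schema. The assumption is that every book row carries a non-null work_id, so the "people who own X also own Y" query can be written with plain inner joins at the work level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works (work_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    owner   TEXT NOT NULL,
    work_id INTEGER NOT NULL REFERENCES works(work_id)  -- every book has a work
);
""")
conn.executemany(
    "INSERT INTO works VALUES (?, ?)",
    [(1, "Romeo and Juliet"), (2, "Hamlet"), (3, "Little Women")],
)
conn.executemany(
    "INSERT INTO books (owner, work_id) VALUES (?, ?)",
    [("ann", 1), ("ann", 2), ("bob", 1), ("bob", 2), ("bob", 3), ("cat", 3)],
)


def also_owned(conn, work_id):
    """Works owned by people who own the given work, most widely shared first."""
    return conn.execute(
        """
        SELECT w.title, COUNT(DISTINCT theirs.owner) AS owners
        FROM books AS mine
        JOIN books AS theirs
          ON theirs.owner = mine.owner AND theirs.work_id <> mine.work_id
        JOIN works AS w ON w.work_id = theirs.work_id
        WHERE mine.work_id = ?
        GROUP BY w.work_id
        ORDER BY owners DESC, w.title
        """,
        (work_id,),
    ).fetchall()


print(also_owned(conn, 1))  # [('Hamlet', 2), ('Little Women', 1)]
```

Whether outer joins are really the bottleneck depends on the database and its indexes; the sketch only shows that, once the book-to-work link is guaranteed, the query does not need them.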
My main concern with using the LC data isn't capitalization. I've noticed that a lot of LC titles have words like "The" and "A" at the beginning of the title (that is, the title information comes up as "The Little Mermaid" instead of "Little Mermaid, The"), which seems like it could be a problem.
the title information comes up as "The Little Mermaid" instead of "Little Mermaid, The"), which seems like it could be a problem.
It shouldn't be a problem at all. It's easy to allow for this in the sorting algorithm. Tim just hasn't got around to that yet what with all the more important problems to deal with.
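As a rough illustration of "allowing for this in the sorting algorithm," here is a minimal Python sketch. The article list is an assumption and covers English only; the function name is hypothetical. Titles stay stored as "The Little Mermaid" and only the sort key drops the leading article.

```python
# Assumed English-only article list; other languages would need their own.
LEADING_ARTICLES = ("the ", "an ", "a ")


def sort_key(title):
    """Lowercased title with one leading article removed, for sorting only."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Little Mermaid", "Little Women", "A Tale of Two Cities", "Hamlet"]
print(sorted(titles, key=sort_key))
# ['Hamlet', 'The Little Mermaid', 'Little Women', 'A Tale of Two Cities']
```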