Sunday, December 18, 2005

Combining tags (heresy!)

I've added a "combine tag" feature, allowing users to merge VERY similar tags on the global level. (No users' tags are actually changed.) As with author disambiguation, LibraryThing users make the decision. The choice isn't pushed very hard; most users won't see it, even if they benefit from it.

You can combine when you see this below the list of related tags:

As blog readers know, I take a hard, idealistic line on tagging. Tags are about memory, your memory. Automated or suggested tags (other than your own) interfere with that process. If you're gonna use someone else's mental categories, use an expert's, like, say, the Library of Congress'. I buy Clay Shirky's essay/talk extolling the "signal in the noise" between tags like cinema and movies.

As the saying goes, "I believe. Help my unbelief." Reworking the related tags feature got me thinking about "tag synonyms." Is there any difference between wwii and ww2? What about world war two, world war ii and world war 2? Is some trivial nuance really worth the social loss—World War II buffs thinking they're alone, worse recommendations, and so forth? After all, the top World War II tag (wwii) is used only 1,300 times, but all the tags together hit 3,100!

So, I came up with a "combine tags" feature. It works like the "combine author" feature, except that the combine page has half a page of "philosophy" on it, begging users not to combine merely similar tags. There is also a tag combination log, allowing finicky LibraryThing-arians to follow the action, and separate tags at need. Like a wiki, it's easier to correct damage than to do it. The combination log records users who combine tags, but not those who separate them. Go ahead and separate a tag; nobody will know you did it!
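For the curious, here is a minimal Python sketch of how combining at the global level could work. It is not LibraryThing's actual code. The class name `TagCombiner`, its methods, and the user name are all hypothetical, and the per-tag split in `raw_counts` is invented for illustration; only the 1,300 figure for wwii and the 3,100 total come from this post.

```python
from collections import Counter


class TagCombiner:
    """Global-level tag merging; individual users' tags are never rewritten."""

    def __init__(self):
        self.canonical = {}  # tag -> the tag it has been combined into
        self.log = []        # (user, tag, target) entries, combinations only

    def resolve(self, tag):
        """Return the global tag that `tag` currently counts toward."""
        return self.canonical.get(tag, tag)

    def combine(self, user, tag, into):
        """Merge `tag` into `into` and record who did it."""
        target = self.resolve(into)
        if tag == target:
            return  # already the same global tag; nothing to do
        # Anything previously combined into `tag` follows it to the new target.
        for other, current in list(self.canonical.items()):
            if current == tag:
                self.canonical[other] = target
        self.canonical[tag] = target
        self.log.append((user, tag, target))

    def separate(self, tag):
        """Undo a combination. Deliberately not logged."""
        self.canonical.pop(tag, None)

    def global_counts(self, raw_counts):
        """Sum raw per-tag counts under their combined global tags."""
        totals = Counter()
        for tag, count in raw_counts.items():
            totals[self.resolve(tag)] += count
        return totals


# Illustrative numbers: only wwii=1300 and the 3100 total are from the post.
raw_counts = {
    "wwii": 1300,
    "ww2": 700,
    "world war ii": 600,
    "world war two": 300,
    "world war 2": 200,
}

combiner = TagCombiner()
for variant in ("ww2", "world war ii", "world war two", "world war 2"):
    combiner.combine("some_user", variant, "wwii")

combined = combiner.global_counts(raw_counts)
print(combined["wwii"])
```

The mapping lives beside the raw tags rather than replacing them, so separating a tag is just deleting one entry, which is why damage is cheap to undo.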

I've already separated some. In my book Farsi is not the same as Persian. Although Persian is a term for Farsi (perhaps more commonly applied to "old" and "middle" Persian than the modern language), Persian is also a general adjectival form of Persia (which, incidentally, has a totally different flavor than Iran). I also split to be read and unread. To be read implies intent to read. Unread does not.

Well, that was fun. Now back to the book-cover issue...

Algorithmic tangent: There are various ways of thinking of "relatedness" between tags. For the tag pages, I key it to "works" (Platonic books, as opposed to individual books). Tags are related to the extent they are applied to the same works. Using this model, one might think of synonymous tags as tags that often occur together by work but rarely by user or individual book. A little play found this to work okay, but not well enough to be definitive. So I've resorted to user control. In essence, I'm using one user-driven process to correct for occasional mistakes of another.
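A rough Python sketch of that synonym heuristic, under assumptions of my own choosing: overlap is measured with Jaccard similarity, and the score is simply work overlap times (1 - user overlap). Neither choice is the formula actually used on the site, and the users, works, and function names below are made up.

```python
from collections import defaultdict


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_index(taggings):
    """Map each tag to the set of works and the set of users it appears with."""
    works = defaultdict(set)
    users = defaultdict(set)
    for user, work, tag in taggings:
        works[tag].add(work)
        users[tag].add(user)
    return works, users


def synonym_score(tag_a, tag_b, works, users):
    """High when two tags share works but are rarely used by the same people."""
    by_work = jaccard(works.get(tag_a, set()), works.get(tag_b, set()))
    by_user = jaccard(users.get(tag_a, set()), users.get(tag_b, set()))
    return by_work * (1.0 - by_user)


# Hypothetical (user, work, tag) triples.
taggings = [
    ("ann", "work-1", "wwii"),
    ("ann", "work-2", "wwii"),
    ("bob", "work-1", "ww2"),
    ("bob", "work-2", "ww2"),
    ("ann", "work-1", "fiction"),
    ("bob", "work-1", "fiction"),
    ("bob", "work-2", "fiction"),
]

works, users = build_index(taggings)
print(synonym_score("wwii", "ww2", works, users))      # same works, different users
print(synonym_score("wwii", "fiction", works, users))  # same works, shared user
```

Here wwii and ww2 cover the same works but never the same user, so they score highest; wwii and fiction also share works, but one user applies both, which pulls the score down.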

Has any other tagging site ever done this?

Perhaps someone can direct me to where people talk about this stuff; I certainly haven't found it. LibraryThing's tag algorithms have all been ex nihilo. This is scary. I mean, if it were up to me, sorting would probably have never gone past the "bubble sort." Hello? I studied Greek and Latin in college!

10 Comments:

Anonymous Anonymous said...

To me, an autobiography is a record of the author's entire life (up to the point of writing, obviously!), while a memoir is the record of a discrete period of that life.

12/18/2005 10:47 AM  
Blogger Tim said...

That's my take on autobiography vs. memoir too. I also killed humor / humour. I just think the differences are too fun for words, even if spelling really ought to be trivial.

12/18/2005 10:51 AM  
Anonymous Anonymous said...

What about tags that are related, but that don't show up in the "related tags" list? For example, I've noticed that people variously use "exhibition catalog", "exhibition catalogs", "exhibition catalogue" and "exhibit catalog", which are all the same thing and probably ought to be combined, but can't be, as they don't appear in each other's "related tags" lists.

12/18/2005 12:51 PM  
Anonymous Anonymous said...

yeah, i'm having trouble with that too. while various forms of "20th century," "19th century," etc have already been combined, "21st century" and its variants don't show up on each other's pages. i figure this is because there are fewer books in this category in general and thus less overlap. but help us out, these should definitely be collapsed.

-nperrin

ps - i love this feature, love love love.

12/18/2005 1:17 PM  
Anonymous Anonymous said...

oh, and i just checked out the zeitgeist page. i thought there would be some changes to the LT tag cloud, but these combinations don't seem to be affecting it. if the changes show up globally, shouldn't the global tag cloud reflect the combinations?

-nperrin

12/18/2005 1:19 PM  
Blogger Tim said...

Re: tag cloud. Hmmm. Yes, I think they should affect the LT tag cloud (but not personal ones). Does that sound right to people?

You'll note that when you tag search in catalogs it does not combine them. I think I'll keep it that way, but I could be persuaded. Cases for and against?
The only other place I want to use it is in calculating contaguinity* between books and between people.
*My word. Email for permission to use.

12/18/2005 1:38 PM  
Anonymous Anonymous said...

Curious that science fiction and sf had not been combined as of yet. They seem to be the biggest target as both are in the top 25 tags (science fiction is fourth with over 23 kilobooks and sf is eighteenth with nearly 9 kilobooks).

This makes me paranoid that people see a difference in usage between these two, but the lists of top books tagged with these two have lots of crossover.

science fiction had already been combined with scifi, sci-fi, sci fi, science-fiction, while sf had been combined with sfbooks.

I went ahead and combined them since it is so easy to separate them in case this is a bad call.

12/19/2005 3:00 PM  
Anonymous Anonymous said...

I think the problem with sf is that it's not exclusively used for "Science Fiction". It's also used for "San Francisco". It's safer to join sf and sfbooks since they share the ambiguity.

I expect it will be joined and separated regularly.

12/20/2005 9:13 AM  
Anonymous Anonymous said...

I use "memoirs" in preference to "autobiography" for the reason given by others. I always use "memoirs" and never "autobiography", as it's the more general term. This brings up an issue that drives me nuts on del.icio.us: in an ideal world, I could define "autobiography" as a child of "memoirs", so that "memoirs" was automagically implied by tagging a book "autobiography" without having to enter both. Similarly, "physics" and "biology" could be children of "science". Etc.

12/25/2005 5:46 PM  
Anonymous Anonymous said...

there are those in the sf community that insist that sf stands for speculative fiction and includes both science fiction and fantasy (another popular tag). others may use sff (science fiction and fantasy). If I get round to using such a tag, and it would apply to about 2800 books, I will not distinguish science fiction and fantasy.

1/05/2006 7:37 AM  
