I've added a "combine tag" feature, allowing users to merge VERY similar tags at the global level. (No users' tags are actually changed.) As with author disambiguation, LibraryThing users make the decision. The choice isn't pushed very hard; most users won't see it, even if they benefit from it.
You can combine when you see this below the list of related tags:
As blog readers know, I take a hard, idealistic line on tagging. Tags are about memory: your memory. Automated or suggested tags (other than your own) interfere with that process. If you're gonna use someone else's mental categories, use an expert's, like, say, the Library of Congress'. I buy Clay Shirky's essay/talk extolling the "signal in the noise" between tags like cinema and movies.
As the saying goes, "I believe. Help my unbelief." Reworking the related tags feature got me thinking about "tag synonyms." Is there any difference between wwii and ww2? What about world war two, world war ii and world war 2? Is some trivial nuance really worth the social loss: World War II buffs thinking they're alone, worse recommendations, and so forth? After all, the top World War II tag (wwii) is used only 1,300 times, but all the tags together hit 3,100!
So, I came up with a "combine tags" feature. It works like the "combine author" feature, except that the combine page has half a page of "philosophy" on it, begging users not to combine merely similar tags. There is also a tag combination log, allowing finicky LibraryThing-arians to follow the action, and separate tags at need. Like a wiki, it's easier to correct damage than to do it. The combination log records users who combine tags, but not those who separate them. Go ahead and separate a tag; nobody will know you did it!
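For the curious, here is a minimal sketch of how a global combine layer like this might be modeled. It is not LibraryThing's actual code; the class name `TagCombiner`, its methods, and the sample counts are all made up for illustration. The point is only that combining happens in a lookup table on top of the tags, so nobody's own tags are rewritten, and that the log records combinations but not separations.

```python
class TagCombiner:
    """Hypothetical model of a global 'combine tags' layer (not LibraryThing's code)."""

    def __init__(self):
        self.canonical = {}  # tag -> tag it was combined into
        self.log = []        # (user, tag, into): combinations only

    def resolve(self, tag):
        # Follow the chain of combinations to the tag shown globally.
        while tag in self.canonical:
            tag = self.canonical[tag]
        return tag

    def combine(self, user, tag, into):
        source, target = self.resolve(tag), self.resolve(into)
        if source == target:
            return  # already combined; also prevents cycles
        self.canonical[source] = target
        self.log.append((user, tag, into))

    def separate(self, tag):
        # Separations are deliberately not logged.
        self.canonical.pop(tag, None)

    def global_count(self, counts, tag):
        # Sum the per-tag counts of everything combined with `tag`.
        target = self.resolve(tag)
        return sum(n for t, n in counts.items() if self.resolve(t) == target)


# Made-up counts, purely for illustration.
counts = {"wwii": 5, "ww2": 3, "world war ii": 2}
combiner = TagCombiner()
combiner.combine("some_user", "ww2", "wwii")
combiner.combine("some_user", "world war ii", "wwii")
print(combiner.global_count(counts, "ww2"))
```

Because `counts` is never modified, separating a tag is just a matter of dropping one entry from the lookup table, which is what makes damage cheap to undo.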
I've already separated some. In my book, Farsi is not the same as Persian. Although Persian is a term for Farsi (perhaps more commonly applied to "old" and "middle" Persian than the modern language), Persian is also a general adjectival form of Persia (which, incidentally, has a totally different flavor than Iran). I also split to be read and unread. To be read implies intent to read. Unread does not.
Well, that was fun. Now back to the book-cover issue...
Algorithmic tangent: There are various ways of thinking of "relatedness" between tags. For the tag pages, I key it to "works" (Platonic books, as opposed to individual books). Tags are related to the extent they are applied to the same works. Using this model, one might think of synonymous tags as tags that often occur together by work but rarely by user or individual book. A little play found this to work okay, but not well enough to be definitive. So I've resorted to user control. In essence, I'm using one user-driven process to correct for occasional mistakes of another.
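To make the synonym heuristic above concrete, here is a rough sketch, not the code LibraryThing runs. It assumes tag data arrives as (user, work, tag) triples; the function name `synonym_candidates`, the use of Jaccard overlap as the similarity measure, and both thresholds are my own assumptions. It flags tag pairs that share many works but few users.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(taggings, min_shared_works=2, max_user_overlap=0.2):
    """Return (tag1, tag2, work_overlap, user_overlap) for likely synonyms.

    `taggings` is an iterable of (user, work, tag) triples, an assumed shape.
    """
    works_by_tag = defaultdict(set)
    users_by_tag = defaultdict(set)
    for user, work, tag in taggings:
        works_by_tag[tag].add(work)
        users_by_tag[tag].add(user)

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    results = []
    for t1, t2 in combinations(sorted(works_by_tag), 2):
        shared = works_by_tag[t1] & works_by_tag[t2]
        if len(shared) < min_shared_works:
            continue  # not related enough by work
        user_overlap = jaccard(users_by_tag[t1], users_by_tag[t2])
        if user_overlap <= max_user_overlap:
            # Same works, different people: each user picks one spelling.
            work_overlap = jaccard(works_by_tag[t1], works_by_tag[t2])
            results.append((t1, t2, work_overlap, user_overlap))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Toy data: two users tag the same works, each with their own spelling.
sample = [
    ("ann", "w1", "wwii"), ("ann", "w2", "wwii"),
    ("bob", "w1", "ww2"), ("bob", "w2", "ww2"),
    ("ann", "w1", "history"), ("ann", "w2", "history"),
    ("bob", "w1", "history"), ("bob", "w2", "history"),
]
print(synonym_candidates(sample))
```

In the toy data, history shares works with both spellings but also shares users with them, so only the wwii/ww2 pair is flagged. On real data the thresholds would need tuning, which fits the "okay, but not definitive" result described above.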
Has any other tagging site ever done this?
Perhaps someone can direct me to where people talk about this stuff; I certainly haven't found it. LibraryThing's tag algorithms have all been ex nihilo. This is scary. I mean, if it were up to me, sorting would probably have never gone past the "bubble sort." Hello? I studied Greek and Latin in college!