Friday, May 04, 2007

Affinity percentiles and Altay

Altay (middle), John (sweatshirt), Tim (right), Abby (encased in her spherical "soul cage")

We're introducing an important new feature, but only just. The feature is called "affinity percentiles." Basically, we show numbers next to other users' names. These represent how "similar" your library is to theirs.

We've started it off on just one area of the site, the message pages in Talk (example). We plan to roll it out across the site, but not until we get a lot of feedback. I have a feeling some members will love it, but some won't. This isn't something we want to do lightly.

The number needs some explaining. (It may be too subtle, and we should fall back to a more straightforward "books shared.") Basically, the higher the better. The person who shares the most books with you will have a 99%; the person who shares the least gets a 1%.

The percentage isn't the number shared—65% does not mean a user shares 65% of their books; it means that the user shares more books than 65% of users. Two other factors come into play:
  • a member has to share five books to get an affinity percentile
  • "sharing" is weighted by book obscurity and library size. A user with 100 books who shares 20 obscure books with you ranks much higher than a user with 10,000 books who shares some very popular novels.
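
To make the ranking concrete, here is a minimal Python sketch of how a 1-to-99 percentile could be derived from weighted scores. It is an illustration, not LibraryThing's actual code: the function name, the two input dictionaries, and the tie handling are all assumptions. Only the five-book minimum and the 1% and 99% endpoints come from the description above.

```python
MIN_SHARED = 5  # members sharing fewer books get no percentile (from the post)


def affinity_percentiles(scores, shared_counts):
    """Map each eligible member's weighted score to a 1-99 percentile.

    scores:        hypothetical dict of member -> weighted sharing score
    shared_counts: hypothetical dict of member -> number of shared books
    """
    eligible = [m for m in scores if shared_counts[m] >= MIN_SHARED]
    # Lowest score first; ties fall in whatever order the sort leaves them.
    ranked = sorted(eligible, key=lambda m: scores[m])
    others = max(len(ranked) - 1, 1)  # avoid dividing by zero for one member
    result = {}
    for rank, member in enumerate(ranked):
        # Share of the *other* eligible members who score lower, as a percent.
        pct = round(100 * rank / others)
        result[member] = min(99, max(1, pct))  # clamp to the 1-99 range
    return result


# Made-up scores for six made-up members; "f" shares too few books.
scores = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.04, "e": 0.05, "f": 0.9}
shared = {"a": 5, "b": 8, "c": 12, "d": 20, "e": 31, "f": 2}
print(affinity_percentiles(scores, shared))
```

With only five eligible members the steps are coarse (1, 25, 50, 75, 99); with thousands of members the values in between fill in.
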
Other features:
  • If you hover over the percentile, you'll get the shared books. We've thought of having it actually show the books.
  • The percentile box is colored in line with the number—the hotter the higher.
Some questions:
  • Are the percentiles too hard to understand; would shared numbers be better?
  • Is the weighting confusing?
  • What should happen when you hover over it? When you click on it?
  • Where should it go? Where shouldn't it go?
How? I've wanted to do something like this for months. It's a surprisingly difficult technical problem. You can't calculate it on the fly every time; that would be insane. But caching the data gets big quick. Imagine a "Battleship" grid of users, 190,000 by 190,000. If you stored a single byte for each connection (the number of shared books), it would amount to at least 16 gigabytes of data (190,000 squared/2). The solution I came up with involves efficient short-term caching, and ignoring members with fewer than five shared books. We've actually been running it on the Talk pages since last night, waiting to make it visible until we knew it wouldn't melt our servers. (So far no melt!)
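
As a rough check on the grid arithmetic, and to show why dropping low-overlap pairs helps, here is a small Python sketch. The `remember` helper, the dictionary cache, and the member names are hypothetical; the post only says the real solution uses short-term caching and ignores members with fewer than five shared books.

```python
USERS = 190_000  # membership figure quoted in the post
MIN_SHARED = 5   # threshold quoted in the post

# Dense half-grid: one byte for every unordered pair of users.
dense_bytes = USERS * USERS // 2
print(f"{dense_bytes:,} bytes for the dense half-grid")


def remember(cache, user_a, user_b, shared):
    """Store a pair's shared-book count only if it clears the threshold.

    The key is order-independent, so (a, b) and (b, a) share one entry.
    """
    if shared >= MIN_SHARED:
        key = (min(user_a, user_b), max(user_a, user_b))
        cache[key] = shared


# Made-up members and counts, purely for illustration.
cache = {}
remember(cache, "alice", "bob", 12)
remember(cache, "bob", "alice", 12)   # same pair, same entry
remember(cache, "alice", "carol", 3)  # below the threshold, never stored
print(cache)
```

The dense half-grid works out to 18,050,000,000 bytes, roughly 18 billion. A sparse structure like the dictionary above only pays for pairs that actually share enough books, which is presumably most of the saving.
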

You'll notice the numbers aren't there when you first hit the page. They come in a second or two later. This is "Ajax" at work, and was done to prevent the new feature from slowing Talk down.

The real benefits will come when the feature is distributed across the site. I'm particularly interested in seeing affinity percentiles on reviews, and sorting by them. Ultimately, I don't care what 300 people think about The Da Vinci Code. I want to know what Tim-ish people think of it.

Why?
The crux of the idea is to highlight what makes LibraryThing's social system work, so-called "social cataloging." Vanilla social networking is structured around "friends." That's a powerful idea, but it has limits. It can be too "binary," and the dynamics of "friending" a stranger miss many of us. At its best, social cataloging gets at something more nuanced. If I share 50 books about ancient history with you, there's a degree, a nuance and a semantics to the connection that opens up a world of possibilities. Some are social and some aren't. I might want to chat with you about the books we've read, or I might not. Either way, I benefit. The rest of your library is probably interesting to me. And your opinions have a claim on my attention no anonymous guy on Amazon gets.

This post also introduces Altay Guvench (username: Altay), who did the JavaScript work behind affinity percentiles. This was actually a toss-off, but Altay was the force behind the much more amazing JavaScript in LibraryThing for Libraries. That stuff is a work of art: JavaScript inserting JavaScript. It might actually be self-aware! Altay will be working on the site generally, with a tilt toward things that JavaScript can improve, like the widgets.

Altay in a nutshell: Portland native. Harvard undergrad. Bassist for the alt-country band Great Unknowns (toured with the Indigo Girls! Reviewed ecstatically. Listen to a free song!). Co-founder of Y-Combinator-funded startup AudioBeta. One of only three members on LibraryThing with Optical holography : principles, techniques, and applications. Scheme hacker. Nerd, but a nerd who rocks out.

38 Comments:

Anonymous Anonymous said...

This is neat! I think it would be more useful when a couple other features are introduced -- using ratings in calculating similarity of libraries and excluding wishlist/read books -- but it's fun to see already.

I think the percentile is more useful than just "shared books," and the weighting is a good idea. I was surprised at how many of the percentages are high (I've only seen two below 89% so far), but maybe most of the groups I'm in have people with libraries like mine, which would make sense.

5/04/2007 2:02 PM  
Anonymous Anonymous said...

I like the idea. I HATE the colors. It is very distracting and for every post I end up reading the percentage. It's enough of a dislike that if it stayed I'd actually read less of Talk. I want to be able to read the threads FAST and I can't do it if I'm distracted by the colored box.

5/04/2007 2:05 PM  
Blogger Iris said...

I love the idea, but what about de-number-ifying it? The percentile makes a lot of sense until you throw in the weighting based on book obscurity (and if you know what percentiles are). But all I really need to know is "so totally not similar," "not similar," "similar," "really amazingly similar."

So maybe going with icons or just colors or something rather than numbers which need a lot of explaining and don't actually end up meaning what they seem to mean.

5/04/2007 2:09 PM  
Anonymous Anonymous said...

I can't see this (Windows Professional XP/IE6) and, judging from your image on the blog, I'm glad I can't. I don't care how similar someone's library is to mine, and I really dislike how cluttered it makes the posts look.

Lose it, or make it an option that you have to turn on.


Drat. It just showed up. It's ugly.

5/04/2007 2:19 PM  
Anonymous Anonymous said...

This is interesting. When I first saw what was going on I thought "hey, great, maybe this will let me ignore the MORONS before I bother reading their posts" ... unfortunately, I have discovered that some of the folks with whom I have frequently "butted heads" show Affinity Percentiles in the high 90's!

I guess we DO "become what we resist"!

5/04/2007 2:25 PM  
Blogger Annie said...

I love the idea and I love the colors! I think the colors make it pop and stand out and really add something to the design.

5/04/2007 2:33 PM  
Anonymous Anonymous said...

>> Are the percentiles too hard to understand?<< Yes.
>> would shared numbers be better<< Yes.
>> Is the weighting confusing?<< Yes. Or to qualify these 3 answers, the explanations are not obvious from seeing a % number and then a book count upon mouse-over.
The numbers in & of themselves can't really explain or predict an affinity between persons. The info is already available when you look up someone's profile, which should be the minimum effort required if I want to ponder or investigate an affinity beyond reading a person's remarks in groups (or talk). Can we implement user profiles that let us turn this "feature" off?

5/04/2007 2:41 PM  
Anonymous Anonymous said...

I think I like the feature but I agree with morphidae, the pink is a bit distracting.

5/04/2007 2:44 PM  
Anonymous Anonymous said...

Just to reiterate what I said in Site talk : colorful %'s - The colors: OUCH. The idea: me likey

5/04/2007 2:45 PM  
Anonymous Anonymous said...

This is amusing and kinda neat, but I have a couple of problems.

First, the current multicolored design is distracting. I think a simple boldface (75%) would look much tidier.

Second, I have absolutely no idea how this is put together, but it's slowing down the pages considerably. Could this be fixed?

5/04/2007 3:07 PM  
Anonymous Anonymous said...

honestly, i think it comes off better than you give yourselves credit for. i was able to figure out how to mouse-over and get books in common very quickly, and after a quick glance at a couple of users' pages, could even tell roughly how you were coming up with the number.

the way you weight stuff may be a bit complicated (i haven't seen the math), but in general people are able to understand "it's weighted based on book obscurity" and having such a weighting probably makes this a more useful tool.

great new addition.

5/04/2007 3:08 PM  
Blogger jmnlman said...

I like it, and once it's explained it makes sense.

5/04/2007 3:30 PM  
Anonymous Anonymous said...

I appreciate this contribution to the social network very much! However, I think you need to explain (unless I missed it) what it means that a pair of users may see different percentiles. It's very counterintuitive when one talks of a measure of "affinity" (i.e., if A is likely to have a high affinity for B's library, how is it possible for the inverse not to be true?).

5/04/2007 3:52 PM  
Anonymous Anonymous said...

Agree with iris: it would work best as a description or colour code. My Last.fm music site does something similar, in that the music played over time builds up a profile of music that is then linked with others. Their profile appears on your dashboard and you can scan your mouse over to see the descriptor of how near they are to your music taste. You can also listen to a selection of their music to build up your own. This would be similar to reviews of books of LibraryThing "friends" being brought to your attention, or when they post books. You would be able to build a friends board so you can keep in touch and drop messages to each other... did I mention that I think this is the beginning of exciting developments of social community!

5/04/2007 3:54 PM  
Blogger Weldon said...

I like the idea a lot. I wonder if this would be better done as a "geiger counter" or "tachometer" visual rather than a percentage number. I really only need 5 levels of affinity, not 99.

5/04/2007 4:03 PM  
Anonymous Anonymous said...

I'll second or third what's already said: terrific idea, but I'm not crazy about the colors. Maybe something a little subtler, like one base color that fits into the overall design of the site, but varies in shading by percentile?

Percentiles are a good way of quantifying this and are not that hard to understand. (A numberphobe need only understand that higher is better, really.)

I'm curious about the weighting scheme. Is it super-secret? Is it just the reciprocal of the number of users with the book?

5/04/2007 4:18 PM  
Blogger Tim said...

The number of distinct works shared between the users,

divided by

(

If user has fewer than 400 books, 20; If the user has more than 400 books, the square root of what they have

times

the square root of the average of the number of copies of works plus 200;

)

Didn't want to know, did you? ;)
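
For readers who would rather see that as code, here is one possible Python reading of the formula above. It is a sketch under stated assumptions, not the site's implementation: the function and parameter names are hypothetical, the library size is assumed to be the other member's, the average is assumed to run over the shared works, and "plus 200" is assumed to sit inside the square root. The five-book minimum comes from the post rather than from the formula.

```python
import math


def affinity_score(shared_copies, other_library_size):
    """One reading of the weighting formula in the comment above.

    shared_copies:      hypothetical list with one entry per shared work,
                        giving how many copies of that work exist site-wide
    other_library_size: assumed to be the other member's book count
    """
    shared = len(shared_copies)
    if shared < 5:  # minimum from the post; no score below five shared books
        return None
    # "Fewer than 400 books, 20; more than 400, the square root":
    # sqrt(400) is 20, so this is the same as a floor of 20.
    size_term = max(20.0, math.sqrt(other_library_size))
    # Assumed reading: sqrt(average copies + 200), not sqrt(average) + 200.
    average_copies = sum(shared_copies) / shared
    popularity_term = math.sqrt(average_copies + 200)
    return shared / (size_term * popularity_term)


# Made-up numbers echoing the post's example: 20 obscure shared books in a
# 100-book library versus 20 very popular ones in a 10,000-book library.
small_obscure = affinity_score([25] * 20, 100)
large_popular = affinity_score([9800] * 20, 10_000)
print(small_obscure > large_popular)
```

Under these assumptions the small, obscure library scores far higher than the large, popular one, which matches the behavior the post describes.
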

5/04/2007 4:26 PM  
Anonymous Anonymous said...

(A numberphobe need only understand that higher is better, really.)

But that's not all, people need to understand why any given pair doesn't necessarily see the SAME percentile when they look at each other.

I know the explanation, but surely something needs to be posted on LT.

5/04/2007 4:46 PM  
Blogger Susan said...

I wish I knew more about how you do the weighting--do you have a description of that somewhere? It seems like a good idea, but I am just not getting this weighting thing. Like, on my main page, the weighted list of users that I share with all seem to have the same, really popular books that they share with me, although the raw list shows several users with a lot more books in common (and books that I think are more representative of my library). So, I don't get it.

*Going back to finish reading comments...* I see that you include the math on a comment, but I still don't get it. Maybe when I add more books this will change.

Anyway, overall, I think it is a neat idea. And, maybe I will be able to figure out the whole weighting thing more if I can investigate these percentages after people's names. If I can do that, then I really like the percentages with the actual number of shared books in the hover.

5/04/2007 4:54 PM  
Anonymous Anonymous said...

Thanks, Tim, I did want to know! (But then, I'm a statistician....)

Good point, Lola - the idea of asymmetric similarity may be confusing to some. I wonder if it would be worthwhile for people to be able to see their reciprocal percentile with others. I think I'd be interested....

5/04/2007 5:11 PM  
Anonymous Anonymous said...

I'm with Lola Walser. It's all very well seeing your affinity with someone else's library, but I would like to see their affinity with your library alongside it. I don't fancy opening up a dialogue with "Hey, we're 99% similar", only to find that it's only 50% from their point of view. (Made up numbers, of course - I haven't tested this yet.)

I have to side with those who think the colours are a bit too garish, as well.

5/04/2007 5:13 PM  
Blogger kageeh said...

Maybe my screen colors are different from Morphi's but the blocks are in very cool, pastel shades on my pc. And I like them a lot because I like a little color, especially when it's this subtle.

Other than that, I posted a snarky comment in the group. Forgive me.

5/04/2007 5:21 PM  
Blogger Tim said...

Didn't you read the Terms of Service? Only LibraryThing employees are allowed to be snarky!

5/04/2007 5:27 PM  
Blogger SilentInAWay said...

This comment has been removed by the author.

5/04/2007 5:43 PM  
Blogger SilentInAWay said...

This comment has been removed by the author.

5/04/2007 5:55 PM  
Blogger Tim said...

I'll show you mine, if...

It's the same on profile. The shared number is the same, but the top person on my list doesn't necessarily have me tops on theirs.

5/04/2007 6:26 PM  
Anonymous Anonymous said...

An okay idea, but it slows the loading of the talk pages sooooo much, especially on threads with a lot of messages! As I say, an okay idea, but it’s so frustrating to wait for the talk pages to load before I can scroll down that in the end it seems pointless.

5/04/2007 6:40 PM  
Anonymous Anonymous said...

I do lots of lt-ing while at work, where the bandwidth is minuscule. This is making it TOO SLOW. This is one of the few sites left without ads and flashing animation crap. Can you just please make it optional?

5/05/2007 12:09 AM  
Blogger Altay said...

Hey everyone, thanks a ton for all the feedback.

Internet Explorer users, you should find that the pages load much faster now. If you're still experiencing problems, please let me know.

Tim and I mentioned this over on the message boards, but in case you're not following the conversation there, some of you will be very happy to know that we'll soon be adding the option of customizing -- or disabling -- the Affinity percentiles.

Once again, thanks for all the feedback!

5/05/2007 2:11 AM  
Blogger WorldMaker said...

Neat! My suggestions for it fall in line with several of the ones above, but I felt it might make a good summary: a) for such a social feature numbers are too impersonal and too "exact"-feeling, b) 100-percentiles is a lot of very narrow differentiations.

My suggestion is to switch to simple dots or colored icons (smiley faces or books with wings or unicorns, or some other random things). The percentile score could be interesting information for a pop-up, but I think the main focus should be an easy to notice quick-to-read icon of only a few different types (ie, just Green, Yellow, Orange, and Red or something).

Also, I think the color scheme is backward. I think the colors should become cooler as the percentile increases just because cool colors are more often associated with "good" than warm colors. At a glance, with just stop-light-like icons a person is more likely to expect the green to be the "good match" and the red to be the "poor match".

5/05/2007 3:32 AM  
Blogger Mark Barnes said...

I like the idea, but I think it needs some tweaking.

(1) I think you need to change the base scale. Perhaps logarithmic? Like lots of people here most of the percentiles were in the high 90's.

(2) Regardless of that, the numbers are pretty meaningless. I'd suggest a mini-bar chart, and the colour retained. A 30 pixel gif would be unobtrusive but get the point across.

5/05/2007 4:25 AM  
Anonymous Anonymous said...

I like it.

I like the number, and I don't think that the percentile is too hard to understand or at least get used to.

However I think I agree with morphidae that the colors should go. I could live with them, but they are distracting. I think annaclaire is right just a bold number beside the username would be better.

Caleb (LT: QuesterofTruth)

5/05/2007 3:45 PM  
Anonymous Anonymous said...

I think it's neat, but given the confusing nature of the "percentiles", might it be better just to express it as a number (call it a similarity coefficient if you want a geeky name for it!) without the misleading percent sign?

5/05/2007 4:18 PM  
Blogger Katya said...

Wait, I thought LibraryThing wasn't a dating site! (In all seriousness, I love the idea, and I would totally marry someone who shared my love of Vonnegut, Russian grammar, and the Muppets.)

5/05/2007 5:42 PM  
Blogger mujahid7ia said...

Are the percentiles too hard to understand; would shared numbers be better
Is the weighting confusing?

No and No; having raw books shared loses the benefits of taking into account obscurity, etc.

What should happen when you hover over it? When you click on it?
Where should it go? Where shouldn't it go?

idk :)

5/05/2007 5:46 PM  
Anonymous Anonymous said...

1. Drop the colors - we can all read numbers just fine, and it cuts down on the clutter.

2. I'm not sure it's much benefit on the groups pages, because it almost encourages one to ignore comments from people 'not like us', and it clutters a page that is valuable for getting us together to chat. But...

3. It would be a great feature on the reviews page, where, as you said in the blog, when it comes to recommending a book, I care more about the opinions of people with whom I probably share interests already.

4. The percentile seems to me to strike a good balance between being too math oriented, and being too fuzzy. Most folks are used to ranking things on a scale of 1 to 10, so 1 to 99 should be easy to comprehend. And the more complicated math behind the number can be ignored, unless one really wants to figure out why one LTer is a 94 and another is a 95.

O.

5/06/2007 11:18 AM  
Anonymous Anonymous said...

funny, people are saying they don't like the colors and that they want just text percentages...since it 'clutters up the page'.

uh...the colors on a page help my mind to UN-clutter the page. All text crap is disgusting. I like color blocks to help my eyes quickly jump around the page and find what I want. ALL the text in messages is in WHITE. How hard can that be to find???

I REALLY, REALLY, REALLY like the colored boxes and the percents are plenty easy to understand. I love the hover over to tell me how many books we share. I hope that this feature will not be omitted...and at least stay an option, Tim.

And I'm not sure why these colors are so garish to everyone...I mean, it's a super pale color on top of another super pale color and it blends together really well. What the heck? How can someone be distracted by a box? I mean...just keep reading...I don't understand this at all. Do people have no focus anymore...every little thing can distract them?

I know I'm getting a little snippy, but really, I'm just flabbergasted that this is such a huge issue. It's a teeny, tiny, box. microscopic.

eh. I'm done ranting.

5/08/2007 9:18 PM  
Blogger Altay said...

Hey folks, in case you're not following the discussion in the message boards...

Just added some more display options. Now you can click on the affinity number, and you'll be presented with a choice of a few display styles.

Your choices are:
- Colored box+percent (the way you've been seeing it so far)
- Percent only
- Colored box only
- Scaled dot (higher affinity = bigger dot)
- Off

5/09/2007 7:32 PM  
