Monday, June 29, 2009

More servers, less sleep

We just finished moving about a dozen of our original servers from our "colo" in Maine to one in Somerville, where another dozen were waiting. In the process, we've basically doubled our server power.

We're still waiting to get all our metrics back up, and we have a few weeks of retasking servers (a lot will be wearing different hats), but, so far, the results have been very encouraging. We are faster today, and will be getting faster tomorrow. We have a lot more memory, disk space and system redundancy too—so keep adding books.



We've collected all our pictures under the Flickr tag "Great LibraryThing Server Schlepp." Here are some of the better ones.

The move was a group affair. Abby, Sonya, Mike, Dan and I did all the physical work. Our Australian systems administrator directed us by video chat, and burned through his monthly bandwidth doing it. Abby and I did a trial run with one server on Wednesday. The rest pulled an all-nighter, except Sonya, who arrived like a well-rested and showered cavalry at 7am. When we were done, Mike and I went back to my parents' in Cambridge and slept like logs. Abby, who was taking care of a toddler, stayed up the whole day. Ouch, and kudos to her.


Sunday, December 21, 2008

We're faster (but not resting)

Last Wednesday John brought two new database servers live, Alexander and Hannibal*.

Together, they more than doubled our database heft. Put another way, our servers, which were operating at near full capacity all day long, can finally rest a bit. They can do everything as fast as they're able, unencumbered by unsupportable amounts of work.

Performance. The effect on site performance has been positive, but problems remain. Profile pages are dramatically faster. Author, work and subject pages are faster and no longer slow down at peak times. Talk pages are essentially unchanged.

The catalog is faster. The page-generation averages now hover just over one second, not around two seconds. But I was hoping for more. The standard deviation of page-creation times remains high: people with huge libraries get hurt. Last night I made a series of improvements which I hope will pay off. (The standard deviation is down, but will it stay down?)
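To show why I watch the standard deviation and not just the average, here is a minimal sketch. The timings are made up for illustration and are not our real metrics. It shows how a single very large library can leave most members with one-second pages while pushing the spread way up.

```python
from statistics import mean, pstdev


def summarize(times):
    """Return (mean, population standard deviation) of page times in seconds."""
    return mean(times), pstdev(times)


# Hypothetical catalog page-generation times, in seconds.
typical = [0.8, 0.9, 1.0, 1.1, 1.2]
# The same requests plus one member with a huge library (also hypothetical).
with_huge_library = typical + [6.0]

avg_small, sd_small = summarize(typical)
avg_all, sd_all = summarize(with_huge_library)

print(f"typical only:          mean={avg_small:.2f}s  sd={sd_small:.2f}s")
print(f"with one huge library: mean={avg_all:.2f}s  sd={sd_all:.2f}s")
```

The average moves a bit, but the standard deviation moves a lot, which is why a decent average can still hide pages that hurt.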

The future. We will continue to improve. Until Wednesday the situation was desperate. When a box got behind, we had to turn off access to interior pages to all but signed-in members. That day is over, thank God**. And we can finally tease apart what is itself slow, versus what was just slow because everything else was slowing it down. Lastly, John has long wanted to try out some low-level tweaks, but with no spare capacity, couldn't. I expect he will find ways to wring more out of what we have.

Whether he can or not, we are going to keep improving. We have laid aside the money to buy a number of other servers—up to ten, if needed. One or two will be database servers, probably removing administration and caching traffic from the live servers. A number will be memory machines—low-end boxes with tiny disk drives and obscene amounts of RAM. They'll help us use memory caching more effectively, reducing database load. The balance will be tasked in other ways—supporting LibraryThing for Libraries, serving secondary resources (covers, APIs, widgets) and providing redundancy, so we won't be skating along a cliff anymore.
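For the curious, the idea behind the memory machines can be sketched roughly as below. This is an illustration only, not our actual code: the class name, the fake database function and the book IDs are all hypothetical, and a plain in-process dictionary stands in for a separate box full of RAM.

```python
class ReadThroughCache:
    """Answer repeat requests from memory; go to the database only on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # function that performs the real database query
        self._store = {}           # stand-in for RAM on a dedicated cache machine
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Not cached yet: query the database once and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value


db_queries = []  # records every query that actually reaches the "database"


def fake_db_lookup(book_id):
    """Hypothetical stand-in for a slow database query."""
    db_queries.append(book_id)
    return {"id": book_id, "title": f"Book {book_id}"}


cache = ReadThroughCache(fake_db_lookup)
requests = [1, 2, 1, 1, 2]
for book_id in requests:
    cache.get(book_id)

print(f"requests={len(requests)} db_queries={len(db_queries)} hits={cache.hits}")
```

Every request answered from memory is one the database never sees. A real setup also has to expire or update cached entries when the underlying data changes, which this sketch leaves out.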

Thanks to John for getting the new servers racked and running. Thanks to the members for hanging in with us as we grew and grew and grew!


*Yes, I named them. Cliche, I know. But Alexander was my research interest in grad school, so I'm allowed! Anyway, at least they're consistent, and set a pattern we can follow (next up, Mithridates and Shapur). I'm still bothered that a previous sysadmin named our twin MyISAM databases Apollo and Athena, not Apollo and Artemis (who were twins). Then there's Plato and his bigger twin Mongo, which makes no sense, but feels right, and the one everyone hates, our backup machine, Mnemosyne.
** John adds "the upgrade has given our database servers more horsepower rather than more raw speed. While the new servers are faster, the biggest initial gain is in the amount of load we can take on without starting to slow down."
