Thursday, September 22, 2005

175,000 books / calling all PHP and MySQL programmers!

We've hit 175,000 books, with no slowdown in sight. The time has come for me to consider a book database of a quarter of a million, half a million, a million or more.

It's time to ask for advice. Programming PHP/MySQL for thousands of records is just different from doing it for millions. (Even if the books don't hit a million, the tag database will.) I've been reading everything I can on enterprise-scale database optimization, but I lack hands-on experience. There are a million things I could do, but I don't have a relative sense of their costs and benefits. It's also hard to "test" solutions—just building the search index took an hour, during which time LibraryThing was completely unavailable.

So, I'm not too proud to ask! Praise, free accounts and gift accounts are available.

16 Comments:

Anonymous Anonymous said...

Shoot me an e-mail and we can talk about database ideas. I'll try to help answer whatever questions you might have and maybe can offer some helpful suggestions. I don't want to post my company-related e-mail here, but would love to help in whatever ways you need.

-Paul
paulsmail (at) gmail (dot) com

9/23/2005 12:42 AM  
Anonymous Anonymous said...

Before optimizing anything, you have to make sure of 2 things:
- make sure your database schema is well designed (no redundancies)
- make sure you have indexes on the right columns (do test your queries)

These are really the basics; there is no way a large database can be usable without these two conditions.
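A quick sketch of the "test your queries" part of this advice. This uses SQLite as a stand-in for MySQL (in MySQL you'd prefix the query with EXPLAIN instead), and the table and index names are invented for illustration:

```python
import sqlite3

# Hypothetical schema: check that a common lookup actually hits an index
# rather than scanning the whole table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
cur.execute("CREATE INDEX idx_books_user ON books(user_id)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM books WHERE user_id = ?", (42,)
).fetchall()
print(plan)  # the plan detail should mention idx_books_user, not a full scan
```

If the plan comes back as a table scan instead of an index search, that query is a candidate for a new index.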

9/23/2005 4:07 AM  
Anonymous Anonymous said...

Hmm. You're out of my area(s) of expertise. Why not use LibraryThing to search for users with books about PHP and MySQL?

9/23/2005 7:50 AM  
Anonymous Anonymous said...

I say this as a LAMP programmer: in my opinion, if you want scalability, you're using the wrong tools.

In your position I would want to push as much processing as possible into the database layer by picking a DB with stored procedures and triggers. This approach lets the database do what it's good at - searching and sorting large data sets - and minimises the amount of data passed between tiers.

Having said that... 3NF your database to hell and back. Place indexes on everything you SELECT on. Get a MySQL admin expert to go over your server setup. Use foreign key constraints to keep your database sane. Consider what functionality can be farmed out to cron jobs (you already do this with the author cloud, what else doesn't have to be done at run-time?). Place the database on a separate server. Move from mysql_* functions to PEAR DB, to ease migration to a new DB should it become necessary. Think database replication (hard to do).

These are all very general, brief ideas, thrown out for other people to chew on rather than as serious suggestions - some may not even be applicable to your setup.
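The cron-job suggestion above can be sketched roughly like this. SQLite stands in for MySQL here, and all the table names are hypothetical; the point is just that an expensive aggregate runs offline and page requests read a cheap summary table:

```python
import sqlite3

# Invented schema: raw tags plus a precomputed summary table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tags (book_id INTEGER, tag TEXT);
CREATE TABLE tag_counts (tag TEXT PRIMARY KEY, n INTEGER);
INSERT INTO tags VALUES (1,'fiction'),(2,'fiction'),(3,'history');
""")

def rebuild_tag_counts(cur):
    # Run this from cron; page requests then read tag_counts
    # instead of doing a GROUP BY over the whole tags table.
    cur.execute("DELETE FROM tag_counts")
    cur.execute("INSERT INTO tag_counts SELECT tag, COUNT(*) FROM tags GROUP BY tag")

rebuild_tag_counts(cur)
print(cur.execute("SELECT n FROM tag_counts WHERE tag='fiction'").fetchone()[0])  # 2
```

The summary is slightly stale between runs, which is fine for things like tag and author clouds.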

9/23/2005 8:25 AM  
Blogger Tim said...

Ha! That's an idea. I'm not sure if that's ethical, however. :)

9/23/2005 8:26 AM  
Anonymous Anonymous said...

I think MySQL is plenty fast for what you're doing. But I agree with other posters' suggestions: move the database to a separate server if possible; optimize your schema and add indexes to speed up queries.

- Mary a/k/a maryb

9/23/2005 12:30 PM  
Anonymous Anonymous said...

Excessive indexing doesn't help here. It speeds up queries but slows down updates and right now there are at least as many updates occurring as queries (perhaps more, depending on how many people are using autocomplete).

Without knowing your table structure, there's not much useful advice I can give. But I never let that stop me before :).

My suggestions:

1. Take another look at your user interface. If you reduce the number of different catalog presentations, you reduce the number of columns your WHERE clauses filter on, and thus the number of columns that might need indexing. Similarly, restrictions on how people can search might be helpful. What have people actually been using? It makes no sense to provide capabilities that aren't being used but whose infrastructure slows down the capabilities that are being used.

2. Do you have enough per-user tables? If enough queries run against per-user tables, there isn't a database-size problem. There may be a huge number of books in the system, but few queries search more than a few thousand of them.

3. You don't need to be up 24/7. We can be trained. If we know that you're regularly taking the site down for an hour or two, and we know when it will be (3 AM Eastern works for me), we can work around that. This isn't a mission-essential site for anyone. It's unexpected outages that people hate.
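The per-user-table idea in point 2 might look something like this sketch (SQLite standing in for MySQL; the naming scheme and schema are invented). Each user's catalog lives in its own small table, so a catalog query never scans other users' rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def user_table(conn, user_id):
    # int() guards the identifier, since table names can't be bound parameters.
    name = f"books_user_{int(user_id)}"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (book_id INTEGER, title TEXT)")
    return name

t = user_table(conn, 42)
conn.execute(f"INSERT INTO {t} VALUES (1, 'Moby-Dick')")
rows = conn.execute(f"SELECT title FROM {t}").fetchall()
print(rows)  # [('Moby-Dick',)]
```

The trade-off is that site-wide queries (search across all books, the tag cloud) then have to aggregate over many tables, which is another argument for precomputing those offline.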

9/23/2005 1:11 PM  
Anonymous Anonymous said...

Links culled from Google searches:

http://dd.cron.ru/dbg/

http://www.oreilly.com/catalog/hpmysql/index.html?CMP=OTC-YF9587575020

http://dev.mysql.com/books/hpmysql-excerpts/ch06.html

http://www.devshed.com/c/a/MySQL/MySQL-Optimization-part-1/

http://www.linuxjournal.com/article/7213

http://www.zend.com/store/products/optimizer-faq.php

Profiling and figuring out where the bottlenecks are will help most, I guess... but remember that you're currently insert-heavy, while later on you're likely to be select-heavy. You may need different strategies for each part of the site's lifecycle.
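A crude way to profile that insert-heavy vs. select-heavy trade-off is to time both workloads with and without an index. This is a toy benchmark with an invented schema, using SQLite rather than MySQL, so only the direction of the effect is meaningful:

```python
import sqlite3
import time

def bench(indexed):
    # Time a batch of inserts, then a batch of per-user count queries.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (user_id INTEGER, title TEXT)")
    if indexed:
        conn.execute("CREATE INDEX idx_user ON books(user_id)")
    t0 = time.perf_counter()
    conn.executemany("INSERT INTO books VALUES (?, ?)",
                     ((i % 500, "title") for i in range(20000)))
    insert_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    for u in range(500):
        conn.execute("SELECT COUNT(*) FROM books WHERE user_id = ?", (u,)).fetchone()
    select_s = time.perf_counter() - t0
    return insert_s, select_s

r_plain, r_indexed = bench(False), bench(True)
print(r_plain, r_indexed)  # typically: the index slows inserts, speeds selects
```

Running something like this against a copy of the real schema would give the "relative sense of costs and benefits" the post asks for.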

9/23/2005 2:40 PM  
Blogger moirae said...

I agree with Marla - maybe you can integrate with Google. They are ridiculously fast with some stuff, like their new find-a-file thingy.

9/24/2005 5:19 AM  
Blogger Tim said...

Quite frankly, no. I hadn't seen that. Some of those tips are very good. Too bad I didn't have that last night. I played around with using CHAR instead of VARCHAR and it jacked the main db up from 100MB to 300MB. Yipes.

The bit about deletes preempting inserts is excellent. People delete books a lot when they don't get what they want. I'll change a lot of routines to simply efface the row instead of deleting it (i.e., shift a tag's owner to ""). Then, every once in a while, I can delete them.
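The efface-then-sweep pattern described above might look like this sketch (SQLite standing in for MySQL; column names invented). The hot path only issues a cheap UPDATE, and the real DELETEs happen in one periodic batch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, owner TEXT, tag TEXT)")
conn.execute("INSERT INTO tags VALUES (1, 'tim', 'fiction'), (2, 'tim', 'whaling')")

# "Delete" at runtime: just blank the owner instead of removing the row.
conn.execute("UPDATE tags SET owner = '' WHERE id = ?", (2,))

def sweep(conn):
    # Periodic cleanup (e.g. from cron): purge effaced rows in one batch.
    conn.execute("DELETE FROM tags WHERE owner = ''")

sweep(conn)
live = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
print(live)  # 1
```

One caveat: any query that lists or counts tags has to filter out the effaced rows (owner = '') until the sweep runs.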

9/25/2005 10:08 AM  
Blogger Tim said...

Oh dear, the no-deleting thing is a major success. At least it seems so, so far. Excellent suggestion. Thanks.

9/25/2005 11:33 AM  
Anonymous Anonymous said...

Following message came up at about 9:05 EDT Sunday PM at hailelib.

Only unusual thing was the downloading of a document on another computer in our home network. I was in addbooks at the time.
PWH


Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, editor@isidore-of-seville.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
------------------------------------------------------------------------
Apache/2.0.50 (Fedora) Server at www.librarything.com Port 80

9/25/2005 9:16 PM  
Anonymous Anonymous said...

Although what you've done is very interesting (and I appreciate the work you've put in), I wouldn't invest time in helping you fix it unless I knew the result would be open source. I don't know how many other programmers feel the same way; I seem to be the only one who bothered to say so on your blog, so perhaps you'll manage to get some help even without open sourcing it. But still: if you open source it (and announce that you have done so in appropriate forums and on your front page), probably more programmers will flock to help you.

Good luck!

9/26/2005 12:44 AM  
Blogger Tim said...

This comment has been removed by a blog administrator.

9/26/2005 1:04 AM  
Blogger JM said...

Hi again, Tim. I would be happy to spend some time talking about some things with you....but in a few weeks when my own workload settles down (damn enterprise apps, they're so...big!). Let's keep in touch.

9/27/2005 10:29 AM  
Blogger Tim said...

You're right. Sorry. Still learning how long certain actions take.

10/18/2005 10:57 PM  
