Hacker News new | ask | show | jobs
by jackdied 6756 days ago
SQL is good for doing arbitrary queries. Most web sites don't need to do that!

Look at duels.com (a simple player-vs-player game). On each page it needs to know your stats/equipment and that of the person you are dueling. In an SQL database you might store the person/stats in one table, the inventory in another, and the fight record in yet another. In SimpleDB the stats and inventory could be in one domain with fight record in another.

SimpleDB or even dumb file-per-player based systems don't pay the overhead for allowing arbitrary queries so they scale better. Your SQL store will be slowed down by the weight of inactive players and their inventories and all the indexes and data they have lying around. Using in-memory caching (like memcached) is a popular way to pay the SQL cost less often while keeping the arbitrary-query ability on the back end.

And of course don't forget that SimpleDB is hosted by a farm of machines tweaked and maintained by someone else. Your MySQL setup will never be that cool.

Edit: and as for XML,if you add that to SQL, well now you have two problems.

1 comments

I'd caution against using the filesystem unless you think things out very carefully. Typically you'll have a binary tree for the index in an RDBMS which gives you log(N) scaling on lookups for joins etc. If you use the filesystem your lookup time will probably be linear. You also pay a penalty in space for the fact that files have to be a whole number of inodes, so having many small files is a really inefficient use of disk space.

In essence, RDBMS are highly optimised for the sort of things people do with databases, but you need to know what you're doing with them.

Definitely, you don't want to scan the filesystem to do searches. But finding user 1234's profile in profiles/1234.data is easy. In my experience this works well (even with the wasted space) though operations like fsck start to get painful when you have 1 million+ files.

An anecdote on bad linear search: in the 90s I worked on a system in C++ that made heavy use of inheritance and other C-plus-plus-ery. There was a bunch of file opener classes each that found files in a different way (regexp, straight path, glob). They differed only in a method named match. The base class called match on every file name in the directory until it returned true. That was fine for regexps but even if you opened a file _by name_ it took N/2 syscalls to readdir to find it! This worked OK until someone created a directory with 10k entries on a production box. Ouch.