by teddyh 4576 days ago
What happens if you also have a “Web Server Administrator” with a similar holistic view? Won’t they step on each other’s toes? Is this not why you have separate roles, like the Dev/Ops division?

Well, your web server is not your database server. If it is, just know that your first scaling task will be to move the two onto separate systems.

I was in a hurry when I wrote my earlier reply - system monitoring is not the only monitoring we perform. We watch the database itself for health and performance. We watch for abnormally long queries, abnormal locking, excessive deadlock resolution, replication health, and exploits.
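The long-query and locking checks can be sketched as filters over a snapshot of running statements. This is a minimal sketch, assuming each snapshot row carries a thread id, command, elapsed seconds, state, and SQL text (roughly the columns MySQL's `SHOW FULL PROCESSLIST` reports); the `Row` type, the two `flag_*` functions, and the 30-second threshold are hypothetical, and collecting the snapshot is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Row:
    """One connection, loosely modeled on a processlist row (hypothetical type)."""
    id: int
    command: str          # e.g. "Query" or "Sleep"
    time_s: int           # seconds spent in the current state
    state: Optional[str]  # e.g. "Sending data"
    info: Optional[str]   # SQL text, or None for idle connections


def flag_long_queries(rows: Iterable[Row], threshold_s: int = 30) -> List[Row]:
    """Return active statements running longer than threshold_s, longest first."""
    slow = [
        r for r in rows
        if r.command == "Query" and r.info is not None and r.time_s > threshold_s
    ]
    return sorted(slow, key=lambda r: r.time_s, reverse=True)


def flag_lock_waits(rows: Iterable[Row]) -> List[Row]:
    """Return statements whose state text suggests they are waiting on a lock."""
    return [r for r in rows if r.state is not None and "lock" in r.state.lower()]
```

A monitor would run both filters on each periodic snapshot and alert when either list is non-empty; tracking deadlock counts and replication lag would need separate status sources.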

Backups themselves require a bit of specialization: there are at least four different methods I can name off the top of my head, and the right choice depends on the business (we typically set up three of the four: logical dumps, rolling binlogs, and binary dumps).
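Pairing a full dump with rolling binlogs is what allows a point-in-time restore: load the newest full backup taken before the target time, then replay the binary logs written after it. A minimal planning sketch, assuming each backup records the time it is consistent as of and each binlog file has a sequence number plus first and last event times; the `Backup`/`Binlog` types and `plan_restore` are hypothetical, and the restore itself (for example replaying with `mysqlbinlog --stop-datetime`) is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Backup:
    name: str
    taken_at: float  # Unix time the full backup is consistent as of


@dataclass
class Binlog:
    seq: int         # file sequence number, e.g. 42 for mysql-bin.000042
    start: float     # time of the first event in the file
    end: float       # time of the last event in the file


def plan_restore(backups: List[Backup], binlogs: List[Binlog],
                 target: float) -> Tuple[Backup, List[Binlog]]:
    """Pick the newest backup at or before target and the binlogs to replay after it."""
    candidates = [b for b in backups if b.taken_at <= target]
    if not candidates:
        raise ValueError("no full backup precedes the target time")
    base = max(candidates, key=lambda b: b.taken_at)
    # Keep files with events after the backup and starting no later than the target.
    replay = sorted(
        (log for log in binlogs if log.end > base.taken_at and log.start <= target),
        key=lambda log: log.seq,
    )
    # A missing sequence number means a missing file, i.e. lost transactions.
    for prev, cur in zip(replay, replay[1:]):
        if cur.seq != prev.seq + 1:
            raise ValueError(f"binlog gap between {prev.seq} and {cur.seq}")
    return base, replay
```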

HA plans, particularly live failover, require a deep knowledge of the database, its replication solution, and your business needs. Safely failing over a live MySQL database without losing data is hard to do.
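One small piece of a no-data-loss failover is refusing to promote a replica that has not received everything the old primary committed. A minimal sketch, assuming replication positions compare as (binlog file sequence, byte offset) pairs and that the primary's last committed position is still known; `ReplicaStatus`, `choose_promotion_candidate`, and `must_drain_relay_log` are hypothetical, and fencing the old primary and repointing clients are out of scope.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (binlog file sequence, byte offset); tuples compare lexicographically.
Position = Tuple[int, int]


@dataclass
class ReplicaStatus:
    host: str
    received: Position  # last primary event copied into this replica's relay log
    applied: Position   # last primary event actually executed on this replica
    healthy: bool = True


def choose_promotion_candidate(replicas: List[ReplicaStatus],
                               primary_last_commit: Position) -> Optional[ReplicaStatus]:
    """Return the most up-to-date healthy replica, or None if promotion would lose data."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    best = max(healthy, key=lambda r: r.received)
    if best.received < primary_last_commit:
        return None  # every replica is missing transactions the primary committed
    return best


def must_drain_relay_log(replica: ReplicaStatus) -> bool:
    """True if the replica still has received-but-unapplied events to execute first."""
    return replica.applied < replica.received
```

Returning `None` rather than the "least bad" replica reflects the no-data-loss goal; a business that prefers availability over completeness would choose differently.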

Quite a bit of specialized knowledge is required to work with databases, even for normal sysadmin tasks.

That's too bad, since this is one operation that incurs a very significant local drop in performance. The best optimization I've ever made on web servers is introducing local caching where reasonable, ignoring updates. User records, item records, comments, sessions: all were cached locally. You wouldn't believe the speedup.
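A per-process read-through cache with a time-to-live is one way to get this effect: reads come from local memory, and entries age out instead of being kept in sync with the database. A minimal sketch, assuming a single-threaded worker and a `load` callback that queries the database; the `LocalCache` class, the 60-second default, and the injectable clock are illustrative choices, not the poster's code.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LocalCache:
    """Read-through cache local to one web worker; remote updates are ignored until expiry."""

    def __init__(self, load: Callable[[Hashable], Any], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load    # fetches the authoritative value, e.g. from the database
        self._ttl = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]    # fresh enough: no database round trip
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The dict grows without bound; a production version would cap its size (for example with LRU eviction) and add a lock if workers are multi-threaded.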

Having your data on a different server from your website effectively introduces a much higher minimum latency. It adds 5-6 ms to every single page load. If I were writing the site, that would be a slowdown by a factor of 5 at least. It is also extra time during which your web server has to hold state for each request, so capacity can effectively drop by a factor of 5 or more.
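The arithmetic behind the factor-of-5 claim can be made explicit. A rough sketch, assuming a page that renders in about 1 ms with local data (a hypothetical baseline, not a figure from the thread) and 5 ms added by the remote round trips; by Little's law, the average number of requests held open equals arrival rate times time in the system, so at a fixed request rate the in-flight state grows by the same factor as latency.

```python
def slowdown(local_ms: float, added_ms: float) -> float:
    """Ratio of page latency with a remote database to latency with local data."""
    return (local_ms + added_ms) / local_ms


def requests_in_flight(rate_per_s: float, latency_ms: float) -> float:
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return rate_per_s * latency_ms / 1000.0


local_ms = 1.0               # assumed baseline render time with local data
remote_ms = local_ms + 5.0   # same page with 5 ms of database round trips added

print(slowdown(local_ms, 5.0))              # 6.0
print(requests_in_flight(1000, local_ms))   # 1.0
print(requests_in_flight(1000, remote_ms))  # 6.0
```

With a slower baseline, say 10 ms, the same 5 ms costs only a factor of 1.5, which is why the impact depends so heavily on how fast the pages already are.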

The DBA's response at the time was that this would undermine consistency. And, of course, he's right. However, for most things websites don't actually need consistency (plus the web's request/response model makes it impossible anyway, since data on the client side is not part of any transaction). Prices at checkout, sure: there you want consistency. Everything else... do you really care if it takes 1-2 minutes for your web farm to be entirely in sync? (And if it's important enough, you can implement "clear this cache item now" requests too, which I did for user records after a complaint.)
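The split between data that must be exact and data that may lag can live in the cache layer itself. A minimal sketch, assuming a hypothetical key-prefix convention where `price:` keys are consistency-critical and always read from the database, plus an `invalidate` method that a "clear this cache item now" request could call; `TieredCache` and the prefix scheme are illustrative only.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TieredCache:
    """Serve stale-tolerant keys from local memory; always read critical keys from the source."""

    def __init__(self, load: Callable[[str], Any], max_staleness_s: float = 120.0,
                 critical_prefixes: Tuple[str, ...] = ("price:",),
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._max_staleness = max_staleness_s
        self._critical = critical_prefixes
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (loaded_at, value)

    def get(self, key: str) -> Any:
        if key.startswith(self._critical):
            return self._load(key)   # e.g. prices at checkout: never served from cache
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._max_staleness:
            return hit[1]
        value = self._load(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Handle a 'clear this cache item now' request: the next get() reloads the key."""
        self._entries.pop(key, None)
```

In a web farm each server holds its own copy, so an invalidation request has to reach every server, not just the one that received it.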

Databases are so ridiculously slow it's not funny. If your data fits in a 64-bit address space, an mmap'ed serialized data structure will blow its socks off, and it's fast for everything that doesn't absolutely require synced data. If your critical data is small enough, keeping it in memory, with a background thread that is signaled to dump it to disk after every update, may also work.
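The in-memory variant with a signaled background writer might look like this. A minimal sketch, assuming the critical data is a small JSON-serializable dict; `SnapshotStore` is hypothetical, and it writes to a temporary file and then calls `os.replace` so a crash mid-write cannot leave a truncated file. Updates that arrive during a dump are coalesced into the next one, so the file may briefly lag memory.

```python
import json
import os
import threading
from typing import Any, Dict


class SnapshotStore:
    """In-memory dict; a background thread rewrites the file after updates are signaled."""

    def __init__(self, path: str):
        self._path = path
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self._dirty = threading.Event()
        self._stop = False
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
        self._dirty.set()  # wake the writer; several updates may share one dump

    def _dump(self) -> None:
        with self._lock:
            payload = json.dumps(self._data)  # serialize a consistent copy under the lock
        tmp = self._path + ".tmp"
        with open(tmp, "w") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._path)  # atomically swap in the new snapshot

    def _run(self) -> None:
        while True:
            self._dirty.wait()
            self._dirty.clear()
            stopping = self._stop  # read after clear() so no signaled update is skipped
            self._dump()
            if stopping:
                return

    def close(self) -> None:
        """Write any pending updates and stop the writer thread."""
        self._stop = True
        self._dirty.set()
        self._writer.join()
```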

I once wrote a network-as-a-service infrastructure for an ISP (something to do with data on cell phones), where each machine would happily serve 1 Gbit/s of web traffic, directed by its 5 GB database (meaning it had to look up the user's account balance, and potentially update it, for every request), consistently and quickly (99.9% of requests took < 1 ms). Mmapped protos FTW!
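The per-request "look up the balance, maybe update it" pattern over a memory-mapped file can be sketched with fixed-size records. The poster used mmapped protocol buffers; this sketch swaps in plain `struct`-packed records (an 8-byte account id and an 8-byte signed balance) addressed by slot number, purely to stay dependency-free. `BalanceFile` and the record layout are hypothetical.

```python
import mmap
import os
import struct

RECORD = struct.Struct("<qq")  # (account_id, balance_in_cents): 16 bytes, little-endian


class BalanceFile:
    """Fixed-size account records in a memory-mapped file, addressed by slot number."""

    def __init__(self, path: str, slots: int):
        size = slots * RECORD.size
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        try:
            if os.fstat(fd).st_size < size:
                os.ftruncate(fd, size)      # grow the file; new bytes read as zero
            self._mm = mmap.mmap(fd, size)  # read-write mapping of the whole file
        finally:
            os.close(fd)                    # the mapping stays valid after closing fd
        self.slots = slots

    def read(self, slot: int) -> tuple:
        return RECORD.unpack_from(self._mm, slot * RECORD.size)

    def write(self, slot: int, account_id: int, balance: int) -> None:
        RECORD.pack_into(self._mm, slot * RECORD.size, account_id, balance)

    def charge(self, slot: int, amount: int) -> bool:
        """Deduct amount if the balance covers it. No locking: assumes one writer."""
        account_id, balance = self.read(slot)
        if balance < amount:
            return False
        self.write(slot, account_id, balance - amount)
        return True

    def close(self) -> None:
        self._mm.flush()  # push dirty pages to the file before unmapping
        self._mm.close()
```

Reads and updates are plain memory accesses with no query parsing or network hop; the trade-off is that concurrency control, durability timing, and schema changes all become your problem.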

Databases are a language plus a B+tree data structure for those who can't or won't implement one themselves. Generalized databases have zero hope of competing with specialized ones.