by csmuk 4576 days ago
I disagree. Dedicated DBAs are on their way out.

Databases are becoming pretty good at managing themselves, and the marginal performance gains from tuning are usually easily offset by throwing bigger kit at the problem or throwing more cash at the plan you are on.

Often the act of throwing "bigger kit" at the problem requires specialized tuning of the DB to be able to take advantage of it.

More memory? Increase the buffer pool size.

Faster HDD? Tweak the settings that determine how many disk operations are attempted every second.

Bigger CPU? Figure out the point of diminishing returns on the number of CPU cores for your DB, and start sharding onto multiple DBs to make sure you can use all of the cores.

SAN? But I thought you wanted performance. ;)

Plus, what gives you the best DB kit for the buck? I could probably tell you that (I am a DBA, and get paid to answer those questions), but do you know? Do you know where to find out?

Databases definitely require knowledgeable tuning. But so do most other complicated moving parts in a modern infrastructure - web servers, app servers, kernels, cache servers, etc. Sysadmins manage those quite successfully. In my experience, they also manage databases quite successfully. The idea that a database is a special beast that requires special keepers is a holdover from a dark age.
> The idea that a database is a special beast that requires special keepers is a holdover from a dark age.

As are most databases (which are 10-20 years old).

Compared to a webserver, a DB is significantly more complicated, and significantly more important to your average business. Nginx takes a nosedive or performs poorly, and nobody really cares. Your DB takes seconds to respond to basic queries, and your entire business suffers.

I'm guessing you've yet to be bitten by DB performance issues that couldn't be resolved by adding indexes or basic query profiling. I'm glad to hear it, because it's not fun to have to set aside your day job and dive beneath that.
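
For anyone unsure what "adding indexes or basic query profiling" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, the `idx_orders_customer` index and the `query_plan` helper are made up for illustration, and other databases have their own EXPLAIN syntax and output.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index on customer_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# After adding the index, the same query can search via the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan and the second should mention the index by name. This is the easy part; the problems described above are the ones that remain after this step is done.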

This is all trivial knowledge under the "application performance" banner. All of our development staff are capable of diagnosing these issues and providing suggestions.
> All of our development staff are capable of diagnosing these issues and providing suggestions.

I hope you're paying them well, because if they're all crack DBAs on top of being crack developers, they're the proverbial unicorn, and are probably worth $200-300 thousand apiece.

No, they just know how not to fuck up their queries to start with, so we don't need a DBA to optimise 300 join statements...
Knowing how to write good queries and indexes is admittedly 80% of having a good performing database. However, as with any other 80/20 split, the remaining 20% is the hard part, and you'll end up paying a specialist to take care of it for you.

Or your dataset and query volume will not grow to the point where you actually need a performant database, but that's a business problem, not a DB solution.

I wonder how much of your business is represented by startups and really capable development teams ... and how much is making hibernate perform better (or should I say "less horrible").

Development teams that go after speed will soon realize that re-serializing the data they need for a web page push and sending it over the network results in a 5-50x drop in performance compared with having that data in a local data structure. And God forbid you have even minor packet loss (like 0.0000001%) in the network between your webserver and database server. Your 99th-percentile tail latency will sprint for the 10s mark.

And as the DBAs here said: most databases are in the GB range, with a few in the tens or hundreds of GB. SQLite will beat the crap out of any other solution at those sizes.

People don't realize how much tail latency affects maximum QPS. Once you calculate how one affects the other, you see that tail latency is the enemy of performance in webservers. A webserver that can generate 100% of responses in 1ms can serve 1 million qps (i.e. it can saturate a 10Gbps link). With 99% at 1ms and 1% at 10ms (the very minimum to execute a single query against a database that isn't local to your machine), you're left with roughly 900k. More typical database figures would be 20ms average and 600ms for the 1%, which will leave you with about 30k QPS. In this example, using a database costs you 97% of your original performance.

That 97% figure is perfectly normal. So if you want decent performance, using a database server (in the serving path) is just not in the cards at all.
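
The arithmetic above can be sketched roughly as follows. This is a simplified model, not a benchmark: it assumes a fixed number of requests in flight (1000, which is what 1 million qps at 1ms implies via Little's law), and it reads "20ms average and 600ms for the 1%" as 99% of requests at 20ms and 1% at 600ms. The `max_qps` helper and the mix names are made up for illustration.

```python
def max_qps(latency_mix, concurrency=1000):
    """Estimate sustainable QPS from a latency distribution.

    latency_mix: list of (fraction_of_requests, latency_in_seconds) pairs.
    concurrency: assumed number of requests in flight at once (hypothetical).
    Uses Little's law: throughput = concurrency / mean latency.
    """
    mean_latency = sum(fraction * latency for fraction, latency in latency_mix)
    return concurrency / mean_latency


# Every response takes 1ms.
all_local = [(1.0, 0.001)]
# 99% at 1ms, 1% at 10ms (one remote query on the slow path).
best_case_remote = [(0.99, 0.001), (0.01, 0.010)]
# 99% at 20ms, 1% at 600ms (one reading of the "typical" figures).
typical_remote = [(0.99, 0.020), (0.01, 0.600)]

qps_local = max_qps(all_local)
qps_best = max_qps(best_case_remote)
qps_typical = max_qps(typical_remote)

for name, qps in [
    ("all local", qps_local),
    ("best-case remote", qps_best),
    ("typical remote", qps_typical),
]:
    print(f"{name}: {qps:,.0f} qps ({1 - qps / qps_local:.0%} lost)")
```

Under these assumptions the model gives about 917k qps and about 39k qps for the two database cases, i.e. a loss of roughly 96%, which is in the same ballpark as the rounded figures quoted above.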

MS SQL Server has always been good at managing itself. I can only guess it is getting better.

But Oracle needs the fine-tuning, and I can't see that changing anytime soon. With Oracle, some queries basically require that you use an IOT (index-organized table), others are better with partitioned storage, and so on.

So your premise is extremely dependent on what DB you use.

Is ETL considered a DBA task?

Not for us. Our developers built an API and tool chain to self service.