by falcolas 4576 days ago
I guess I had better tell my colleagues that our jobs are all dead! Wait, it's an absolute statement for a headline, so it's actually "absolute crap".

> These days, the technology decision maker is the dude with Sublime Text open and a cloud control panel up in Chrome.

And when he is successful and gets clients, and a few thousand rows in his database, he realizes that he needs someone to keep that database alive. He needs someone to figure out how to make the cartesian product queries he's written into efficient queries.

At first, he hires a consultant for a few one-off gigs. However, then he's paying someone $200[1] an hour, typically with 8-16 hour engagements. After getting sick of that cost, and still lacking anyone with any kind of long-term care for his product, he comes to our team, and hires us to be his DBA, albeit remotely.

Business as a DBA is booming. Nobody thinks they need a DBA, but the reality is that you really can't afford to not have a DBA. We have customers coming on board with no backups, no high availability plans, no disaster recovery plans, queries that are performing cartesian products (and thus taking minutes against very small datasets), and no monitoring. (And yes, a good portion of users come to us while using the "solution" proposed by the OP (like AWS RDS), for many of the same problems.)
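To make the cartesian product point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `orders` tables and their sizes are made up for illustration; the only thing it shows is how a missing join condition multiplies the row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
""")
# Hypothetical data: 200 users, 1000 orders spread evenly across them.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 201)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 200) + 1, 10.0) for i in range(1, 1001)])

# Missing join condition: every user is paired with every order.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM users, orders").fetchone()[0]

# Join condition present: each order is paired only with its own user.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM users JOIN orders ON orders.user_id = users.id"
).fetchone()[0]

print(cartesian_rows, joined_rows)
```

With only 200 and 1000 rows the first query already touches 200,000 combinations, which is why this kind of query can take minutes on data that looks tiny.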

We set them up with comprehensive backups, automated failover solutions, and 24x7 monitoring. Suddenly, their DB is no longer the primary source of downtime. They're no longer losing customer engagement because their frontend takes seconds to render. They're no longer in the position of losing their entire company because some junior developer accidentally dropped their users table in production.

In short, DBAs are a required part of your business, if you're using a database. You just haven't been burned badly enough by a poor database setup to realize it.

[1] Actual hourly rates for a planned engagement. Emergency rates are closer to $450 an hour. Why so much? You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job. There are single-digit thousands of us worldwide, and we're in high demand.

7 comments

Apparently you didn't make it to the conclusion.

  So perhaps the role of the DBA isn’t necessarily dead, it’s just moved
  to its new home at the datastore-as-a-service provider. The successful
  DBA will understand that this new world means handling petabytes of
  data and billions of operations on thousands of logical databases.
  They will cope with less mature database technologies in increasingly
  difficult workload environments. They will automate or die.

  Long live the DBA.
The whole post was making the exact same point as you in the end; the author actually runs a datastore-as-a-service business.
I did, except that:

> The successful DBA will understand that this new world means handling petabytes of data and billions of operations on thousands of logical databases.

is untrue. Most of our customers have DBs that are in the GB size. A few have TB size DBs, and none are on that scale.

Datastore-as-a-service doesn't replace DBAs - we make good money being remote DBAs for people who are using datastore-as-a-service providers, because they still run into the same problems as everyone else.

> He needs someone to figure out how to make the cartesian product queries he's written into efficient queries.

I wish our DBAs were like that. I am a developer with Sublime Text. I have to make the queries fast. I have to design good indexes.

They can complain if a query is slow, but they never actually help to fix it. They only have to make and restore backups when disks die.
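For anyone following along who hasn't done the "design good indexes" part before, a rough sketch of the workflow with Python's sqlite3 module: ask the database for its plan, add an index, ask again. The `orders` table and the index name are hypothetical, and other databases have their own EXPLAIN output, so treat this as an illustration of the loop rather than of any particular server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 500, float(i)) for i in range(10_000)])

QUERY = "SELECT * FROM orders WHERE customer_id = 42"

def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(conn, QUERY)   # full table scan: no usable index yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY)    # lookup through the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should report a scan of the table and the second a search using `idx_orders_customer`.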

Sounds like you have a sysadmin with mad interview skills. ;)

Sorry to hear it, either way.

Exactly. You could extrapolate this into any "The [whatever] is dead" statement that we see so often. And it's not getting any easier - e.g. are we capturing and processing less data these days? Is the data we store less valuable?

> You can't get a DBA from a college, from a technical school, or from any other form of formal education. Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job.

I can't say how much truth is in this statement. This stuff is learned organically by doing it on real world projects. It's scary at times to think that you just can't teach this stuff.

> This stuff is learned organically by doing it on real world projects. It's scary at times to think that you just can't teach this stuff.

Yup. And with the supply being so low, it's hard to get a DBA (you'll probably have to steal one from another business), so people are flocking more and more to DBaaS, and DBaaS providers are more than happy to propagate the fiction that "you don't need a DBA, you have us!".

It honestly doesn't bother me much that they make these statements; it's marketing.

On the other hand, believing those statements harms our customers; they spend time and money to migrate to these providers and find out the hard way that they still need someone who can handle their DBs for them. That does bother me.

I disagree. Dedicated DBAs are on their way out.

Databases are becoming pretty good at managing themselves and the marginal performance gains from tuning usually are easily offset by throwing bigger kit at the problem or throwing more cash at the plan you are on.

Often the act of throwing "bigger kit" at the problem requires specialized tuning of the DB to be able to take advantage of it.

More memory? Increase the buffer pool size.

Faster HDD? Tweak the settings that determine how many disk operations are attempted every second.

Bigger CPU? Figure out the point of diminishing returns on the number of CPU cores for your DB, and start sharding onto multiple DBs to make sure you can use all of the cores.

SAN? But I thought you wanted performance. ;)
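As an illustration of the "more memory" case, a minimal sketch assuming MySQL with InnoDB, where the relevant setting is `innodb_buffer_pool_size`. The 75% fraction and the 2 GiB reserve for the OS are rule-of-thumb assumptions for a dedicated database host, not figures from this thread; the helper names are made up.

```python
# Stock value in recent MySQL versions; extra RAM sits idle until this is raised.
DEFAULT_BUFFER_POOL_BYTES = 128 * 1024**2

def suggest_buffer_pool_bytes(total_ram_bytes, fraction=0.75,
                              reserve_bytes=2 * 1024**3):
    """Suggest a buffer pool size: a fraction of RAM, leaving a reserve for the OS."""
    candidate = int(total_ram_bytes * fraction)
    ceiling = total_ram_bytes - reserve_bytes
    return max(DEFAULT_BUFFER_POOL_BYTES, min(candidate, ceiling))

def config_line(total_ram_gib):
    """Format the suggestion as a my.cnf style line, in megabytes."""
    size = suggest_buffer_pool_bytes(total_ram_gib * 1024**3)
    return f"innodb_buffer_pool_size = {size // 1024**2}M"

for ram_gib in (8, 64):
    print(ram_gib, "GiB ->", config_line(ram_gib))
```

The right fraction depends on what else runs on the box and on the workload, which is rather the point: the bigger kit does nothing until someone who knows the setting goes and changes it.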

Plus, what gives you the best DB kit for the buck? I could probably tell you that (I am a DBA, and get paid to answer those questions), but do you know? Do you know where to find out?

Databases definitely require knowledgeable tuning. But so do most other complicated moving parts in a modern infrastructure - web servers, app servers, kernels, cache servers, etc. Sysadmins manage those quite successfully. In my experience, they also manage databases quite successfully. The idea that a database is a special beast that requires special keepers is a holdover from a dark age.
> The idea that a database is a special beast that requires special keepers is a holdover from a dark age.

As are most databases (which are 10-20 years old).

Compared to a webserver, a DB is significantly more complicated, and significantly more important to your average business. Nginx takes a nosedive or performs poorly, and nobody really cares. Your DB takes seconds to respond to basic queries, and your entire business suffers.

I'm guessing you've yet to be bitten by DB performance issues that couldn't be resolved by adding indexes or basic query profiling. I'm glad to hear it, because it's not fun to have to set aside your day job and dive beneath that.

This is all trivial knowledge under the "application performance" banner. All of our development staff are capable of diagnosing these issues and providing suggestions.
> All of our development staff are capable of diagnosing these issues and providing suggestions.

I hope you're paying them well, because if they're all crack DBAs on top of being crack developers, they're the proverbial unicorn, and are probably worth $2-3 hundred thousand apiece.

No, they just know how not to fuck up their queries to start with so we don't need a DBA to optimise 300 join statements...

Knowing how to write good queries and indexes is admittedly 80% of having a well-performing database. However, as with any other 80/20 split, the remaining 20% is the hard part, and you'll end up paying a specialist to take care of it for you.

Or your dataset and query volume will not grow to the point where you actually need a performant database, but that's a business problem, not a DB solution.

MS SQL Server has always been good at managing itself. I can only guess it is getting better.

But Oracle needs the fine-tuning, and I can't see that changing anytime soon. With Oracle, some queries basically require that you use an IOT, others are better with partitioned storage, and so on.

So your premise is extremely dependent on what DB you use.

> I disagree. dedicated DBAs are on their way out.

Is ETL considered a DBA task?

Not for us. Our developers built an API and tool chain to self service.
I normally associate DBA with efficiency and performance of data storage. However, your comment describes a blend of that and a security administrator whose job includes the creation and maintenance of high availability plans, disaster recovery plans (like backups), and monitoring.

Do you think the overlap speaks to a decline in the specialization of either profession?

No - DB software is highly dependent on the system it's on. So much of a DB's underlying performance is going to depend on matching the settings you create for your DB to the hardware you're on (and optimizing the system to run a DB).

In other words, efficiency and performance both depend heavily on the machine your DB is running on.

As such, a good DBA needs to be able to do sysadmin tasks. The business won't care that it was the sysadmin's fault for not realizing that a battery had gone dead on the RAID controller, and a DBA shouldn't care either. Their purview is the database, and everything that it entails.

What happens if you also have a “Web Server Administrator” with a similar holistic view? Won’t they step on each others’ toes? Is this not why you have separate roles, like the Dev/Ops division?
Well, your web server is not your database server. If it is, just know that your first scaling task will be to separate the two onto separate systems.

I was in a hurry when I wrote my earlier reply - system monitoring is not the only monitoring we perform. We watch the database itself for health and performance. We watch for abnormally long queries, abnormal locking, excessive deadlock resolution, replication health, and exploits.

Backups themselves require a bit of specialization - there are at least four different methods I'm aware of off the top of my head, and the proper choice for which to use is going to depend on the business (we typically set up three of the four - logical dumps, rolling binlogs, and binary dumps).

HA plans, particularly live failover, require a deep knowledge of the database, its replication solution, and your business needs. Safely failing over a live MySQL database in a way that results in no data loss is hard to do.

Quite a bit of specialized knowledge is required to work with databases, even for normal sysadmin tasks.
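To give a flavor of the "abnormally long queries" check mentioned above, here is a minimal sketch. It assumes rows shaped like the output of MySQL's SHOW PROCESSLIST (the `Id`, `User`, `Command`, `Time`, and `Info` columns); the snapshot data, the 30 second threshold, and the function name are all invented for the example, and a real monitor would fetch the rows from the server.

```python
def find_long_queries(processlist, max_seconds=30):
    """Return rows that are actively running a query longer than max_seconds."""
    return [
        row for row in processlist
        if row["Command"] == "Query" and row["Time"] > max_seconds
    ]

# Hypothetical snapshot of a process list.
snapshot = [
    {"Id": 11, "User": "app", "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 12, "User": "app", "Command": "Query", "Time": 2, "Info": "SELECT 1"},
    {"Id": 13, "User": "report", "Command": "Query", "Time": 412,
     "Info": "SELECT * FROM users, orders"},
]

offenders = find_long_queries(snapshot)
for row in offenders:
    print(f"ALERT id={row['Id']} user={row['User']} "
          f"running {row['Time']}s: {row['Info']}")
```

Note that the idle connection is ignored even though its `Time` is large; only connections in the `Query` state count. Deciding what "abnormal" means for a given workload is the part that takes experience.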

That's too bad since this is one operation that incurs a very significant local drop in performance. The best optimizations I've ever done in website servers is to introduce local caching where reasonable, ignoring updates. User records, item records, comments, sessions, all would be cached locally. You wouldn't believe the speedup.

Having your data on a different server than your website introduces, effectively, a much higher minimum latency. It adds 5-6 ms to every single page load. If I were writing the site that would be a factor of 5 drop at least. This is extra time that your webserver has to keep state for a request, and so effectively can mean a drop of a factor of 5 or more.
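The arithmetic behind that factor can be sketched as follows. The 1.25 ms baseline page time and the pool of 8 workers are assumptions picked so that 5 ms of added latency works out to exactly a factor of 5; a slower baseline gives a smaller factor. The model is the simplest possible one, where each worker handles one request at a time.

```python
def slowdown_factor(base_ms, added_ms):
    """How many times longer a request takes once added_ms of latency is paid."""
    return (base_ms + added_ms) / base_ms

def requests_per_second(workers, latency_ms):
    """Throughput of a pool where each worker handles one request at a time."""
    return workers * 1000.0 / latency_ms

# All three values below are assumptions chosen for illustration.
base_ms, added_ms, workers = 1.25, 5.0, 8

factor = slowdown_factor(base_ms, added_ms)
local_rps = requests_per_second(workers, base_ms)
remote_rps = requests_per_second(workers, base_ms + added_ms)
print(factor, local_rps, remote_rps)
```

Under these assumptions the same 8 workers go from 6400 to 1280 requests per second, because each one spends most of its time holding request state while it waits on the network.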

The DBA said then that this would undermine consistency. And, of course, he's right. However, for most things websites don't actually need consistency (plus the web request response model makes it impossible, as the data client-side is not included in transactions). The prices at checkout, sure, there you want consistency. Everything else ... do you really care if it takes 1-2 minutes for your webfarm to be entirely in sync? (and if it's important enough you can implement "clear this cache item now" requests too, which I did for user records after a complaint)
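A minimal sketch of that kind of local cache, assuming a per-process dictionary with a time-to-live and an explicit invalidate call for the "clear this cache item now" case. The class name, the 90 second TTL, and the `load_user` stand-in are hypothetical; this is not the code described above, just the general shape of the idea.

```python
import time

class TTLCache:
    """Per-process cache whose entries expire after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that fetches from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh enough: skip the database
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop one entry so the next get() reloads it."""
        self._entries.pop(key, None)


db_reads = []

def load_user(user_id):
    # Stand-in for a real database query.
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}

users = TTLCache(load_user, ttl_seconds=90.0)
users.get(7)
users.get(7)           # served locally, no database read
users.invalidate(7)
users.get(7)           # reloaded after explicit invalidation
print(len(db_reads))
```

Each web server keeps its own copy, so different machines can disagree for up to the TTL. That is exactly the consistency trade-off being argued about here.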

Databases are so ridiculously slow it's not funny. If your data fits in 64-bit address space, an mmap'ed serialized data structure will blow its socks off, and for everything that doesn't absolutely require synced data it's fast. If your critical data is small enough, keeping it in memory with a background thread getting signaled to dump it to disk after every update may also work.

I wrote a network-as-a-service infrastructure for an ISP once (something to do with data on cell phones), where each machine would happily do 1Gbit of web traffic, directed by its 5Gb database (meaning it would need to look up user account balance, potentially update it, for every request), consistently and quickly (99.9% of requests were < 1 ms). Mmapped protos FTW!
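For readers who haven't seen the technique, a toy version of an mmap'ed balance store in Python. It swaps the protos for fixed-size `struct` records to stay dependency-free, so it is not the system described above; the file name, record layout, and account count are assumptions, and there is no locking or crash safety.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")      # one signed 64-bit balance per account
NUM_ACCOUNTS = 1000               # hypothetical size

# Create a zero-filled backing file, one record per account.
path = os.path.join(tempfile.mkdtemp(), "balances.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (RECORD.size * NUM_ACCOUNTS))

backing = open(path, "r+b")
store = mmap.mmap(backing.fileno(), 0)   # map the whole file into memory

def get_balance(account_id):
    """Read one balance straight out of the mapped memory."""
    return RECORD.unpack_from(store, account_id * RECORD.size)[0]

def add_to_balance(account_id, delta):
    """Update one balance in place; the OS writes the page back later."""
    new_balance = get_balance(account_id) + delta
    RECORD.pack_into(store, account_id * RECORD.size, new_balance)
    return new_balance

add_to_balance(42, 500)
add_to_balance(42, -120)
final_balance = get_balance(42)
print(final_balance)

store.flush()                     # ask the OS to write dirty pages to the file
store.close()
backing.close()
```

A lookup is an offset calculation plus a memory read, which is where the speed comes from. What you give up is everything else a database does: transactions, concurrent writers, ad hoc queries, and recovery after a crash mid-update.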

Databases are a language + a B+tree data structure for those who can't, or won't implement it themselves. Generalized databases have zero hope of competing with specialized ones.

> Most DBAs these days are grown internally from developers or system administrators who decide to (or are forced to) specialize while on the job.

I agree with you on the above statement, though my conclusion is different from yours. I think in the future the lines between DBA/SysAdmin/Developers will become even more blurred: developers will be trained/required to take over more and more work from DBAs and sysadmins (DevOps, anyone?); consequently, the demand for dedicated roles such as DBA and sysadmin will diminish. Hope I am wrong though.

My role within our DBA group is that of devops exactly (after spending a year as a line DBA).

Given that background I still think that the topic of databases is just too deep for a generalist. I know a lot about the MySQL database (and a little about PostgreSQL), enough to write failover software, automate deployments, write guardian crons which pre-emptively slap down problematic queries, automate backups, do VIP failover and haproxy configuration... and I still have to go to my boss for most of the hard questions.

His knowledge encapsulates 12 years of working with and around MySQL, and it's proven invaluable to our customers. Knowing when to force certain optimizations, how to make subqueries run O(1) vs O(n), how to rebuild a complete database from binary logs, how to configure MySQL to work with SSD caches... these problems don't come up often, but when they do, not having a DBA available to you means contracting out to one at exorbitant rates.

It's the difference between a few minutes of downtime when the proverbial dung hits the fan, versus a few hours or days.

Could you provide any tips/advice/reading recommendations to a programmer who is interested in learning more dba stuff? It would be appreciated!
Sure - here's a list of what's good to know for MySQL. Other DBs are going to have different needs, though the indexing data is good to know regardless.

Start with the book High Performance MySQL. [1]

Follow up with the whitepaper "Causes of Downtime". [2]

Then find a copy of the IMDB dataset, put that in a database, and write an app against it. Make that app perform well, then simulate load against the app (pretend it hit the top of Reddit and Hacker News simultaneously), and keep it performing well.

After that, it's a matter of practical practice.

[1] http://www.amazon.com/High-Performance-MySQL-Optimization-Re...

[2] http://www.percona.com/redir/files/white-papers/causes-of-do...
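For the "simulate load against the app" step, a minimal sketch of a load generator using only the Python standard library. `handle_request` is a hypothetical stand-in that just sleeps for 2 ms; in practice it would be replaced with a real HTTP call to the app, and the request count and concurrency are arbitrary.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    """Hypothetical stand-in for one page load; replace with a real HTTP call."""
    time.sleep(0.002)
    return n

def timed_call(n):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    handle_request(n)
    return time.perf_counter() - start

def run_load(total_requests=200, concurrency=20):
    """Fire requests from a thread pool and summarize latency in milliseconds."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    return {
        "requests": len(latencies),
        "median_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1] * 1000,
        "max_ms": latencies[-1] * 1000,
    }

report = run_load()
print(report)
```

Watch the p99 and max rather than the median while raising `concurrency`; the slow tail is usually where an unindexed query or lock contention shows up first.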