by djrobstep 2686 days ago
My pet theory is that NoSQL took off purely because people were sick of having to manage schema changes.

Unfortunately, people reacted to being (justifiably) frustrated with schemas by throwing strict schemas out entirely, instead of making better schema management/migration tools.

Also, DBAs hate developers. Developers want to make changes to the database to support their classes, such as "I need to add a column," and the DBA response is "no" or, even worse, "fill out this ticket and it will get prioritized in the next scrum." Meanwhile the developer is at a standstill.

I interviewed at Southwest Airlines years ago. I don't remember how it came up, but we were talking about bottlenecks or something, and I brought up the fact that having to go to a DBA to get a column added to a table, no matter how trivial, is a great source of delay. The whole room just nodded and looked at the floor; it was obviously painful for them.

NoSQL took the DBA out of the loop; now the developers were in full control of what was persisted and what wasn't. If they needed a new field, they just made it so. On the flip side, DBAs got really freaked out and cried to whoever would listen.

In my experience you either have a DBA report to Developers or Developers report to a DBA. Never give them equal footing (even implied) because they'll just fight.

I think you also need to look at it from the DBA's standpoint. If they did whatever the developers wanted and the system went down or, more likely, other parts became slow, it is the DBA who gets the call.

In a large company like SW, the developer requesting some change for their app may have no idea how else the db is being used. What if their requested changes took down the db and prevented reservations from working?

My examples are extreme, but I have seen similar things in my years as both a developer and a DBA at times.

Took me a while to get back here but I do understand your point and it's totally valid. That door swings both ways.

That's why it's hard for the two camps to work side by side.

NoSQL gave power to the devs at the expense of the experience and wisdom of the database folks. I bet many, many applications and systems were completely screwed data-wise more than once because of devs and NoSQL.

To be clear, I've known DBAs who act exactly like you described in your original comment. Very annoying.

The best I've seen it work is to have a DBA on the team building the application.

Tell that to AWS. They've banned relational databases for specific workloads because Dynamo (nosql) provides more consistent performance, and is easier to operate.

Tons of conflation of Mongo's problems with those of nosql in this thread.

Did a project with Dynamo last year. Hope never to see it again.

Compared with RDBMS tooling, it looks like a high school project.

DynamoDB has had major improvements in the last few months: e.g. you get dynamic capacity provisioned tables (avoids read/write capacity exceeded exceptions because of capacity planning uncertainty), and transactions, to name two. However, even if you have a hosted RDBMS, it has an implicit read and write capacity throughput that you need to design for (e.g. hotspots in partitions); you just hit it a bit later in your project. The bounded latency at scale (throughput, and size of tables) is the main win for DynamoDB.
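To make the hotspot point concrete, here is a toy Python model. It is only a sketch under loud assumptions: the hashing scheme, the partition count, and the per-partition capacity are all made up for illustration and say nothing about how DynamoDB or any RDBMS actually places data. It just shows that total capacity can be sufficient while a single hot key still gets throttled.

```python
import hashlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a partition key to a partition using a stable hash (toy stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def throttled_writes(keys, num_partitions, capacity_per_partition):
    """Count the writes that exceed each partition's capacity for one interval."""
    load = Counter(partition_for(k, num_partitions) for k in keys)
    return sum(max(0, n - capacity_per_partition) for n in load.values())


# Hypothetical workload: 1,000 writes in one interval, 10 partitions,
# 150 writes of capacity per partition (1,500 in total, so "enough" on paper).
uniform = [f"user-{i}" for i in range(1000)]  # many distinct keys
hot = ["user-0"] * 1000                       # every write hits the same key

print("throttled with spread-out keys:", throttled_writes(uniform, 10, 150))
print("throttled with one hot key:", throttled_writes(hot, 10, 150))
```

With the single hot key, every write lands on one partition, so everything past that partition's 150 gets rejected even though the table as a whole has headroom.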
It couldn't possibly be because they're trying to push use of their own technology to force dogfooding.
That might make sense if they didn't also offer a plethora of their own built-in-house as well as managed OSS relational DBs.
None of those OSS relational DBs offer Amazon lock-in the way DynamoDB does - it's more reliable income if someone uses it, but it also takes more convincing for people to use it. What enterprise would use it if Amazon themselves don't?
This argument makes no sense at all. What does lock in have to do with Amazon dog fooding its services? They're... trying to lock themselves in? What?
Amazon owns DynamoDB and makes it available via AWS. If customers use it for non-trivial applications, they have some degree of lock-in, because they can't just migrate to another (self-)hosted instance of e.g. Postgres, MySQL, Oracle, Mongo, what have you.

So Dynamo is an opportunity for Amazon to generate lock-in through their own proprietary software.

But big and likely even medium-sized businesses are less likely (compared to tiny companies that barely go above the free threshold) to use a new technology without any big well-known users, publicly documented use cases, etc.

One big way companies can provide some confidence to potential customers about their technology is by dogfooding: they use the thing they're trying to "sell" (regardless of whether it's a licence, a service, whatever).

Your average dev is not making decisions based on what might work best for one particular problem Amazon has.
Do you think all of the companies that chose Cassandra and Dynamo were wrong to do so? There's no use case for NoSQL? There were no lessons learned, value adds from NoSQL?

How do you explain the 'NewSQL' approach, which seems to be so clearly born of what we've learned from NoSQL?

It should be obvious that NoSQL has value, regardless of the issues with one of the earlier NoSQL DBs.

I don't see any value other than fashion-driven development, especially when comparing the bare-bones browser GUI for Dynamo with something like SQL Server Management Studio, or that whole story with primary and secondary indexes, with prices being set by index usage.
The Cassandra design was always a bit of a frankenstein without clear upside to me, but the nosql craze started great conversations.

There is certainly merit beyond fashion to the Dynamo architecture, and there are workloads where (for example) HBase is simply the correct type of tool despite the lack of polish of its management interface.

I think it also has to do with the source of data. If you receive data from a third party it’s easy to insert the whole document and figure out what parts you need later. If your data comes from your own client interface it makes more sense to build up the data model over time.
You could just plonk the data in a JSONB, a BLOB, or just a plain old file on disk with a URL pointing to it while you figure it out, and not introduce another dependency that's super complex to support...
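Roughly what that looks like, as a minimal sketch in Python with the standard-library sqlite3 module standing in for whatever relational DB you already run. The table name, column names, and payload are hypothetical, and a plain TEXT column plays the role of JSONB here.

```python
import json
import sqlite3

# Hypothetical table for illustration: one row per raw third-party document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events ("
    "id INTEGER PRIMARY KEY, source TEXT NOT NULL, payload TEXT NOT NULL)"
)


def store_raw(conn, source, document):
    """Store the whole document untouched, serialized as JSON text."""
    cur = conn.execute(
        "INSERT INTO raw_events (source, payload) VALUES (?, ?)",
        (source, json.dumps(document)),
    )
    return cur.lastrowid


def extract_field(conn, row_id, key, default=None):
    """Later, once you know which parts matter, pull them out in app code."""
    row = conn.execute(
        "SELECT payload FROM raw_events WHERE id = ?", (row_id,)
    ).fetchone()
    if row is None:
        return default
    return json.loads(row[0]).get(key, default)


event_id = store_raw(
    conn, "partner-feed", {"order": 17, "customer": {"name": "Ada"}, "extra": [1, 2]}
)
print(extract_field(conn, event_id, "order"))  # prints 17
```

Once the useful fields are known, they can be promoted to real columns with an ordinary migration, and the raw payload stays around as a fallback.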
Schema management and automated migration generation frameworks alleviate a lot of that headache. As long as the schema definitions are well structured and can be easily analyzed against a live db to find diffs and generate migration scripts. Django does this very well. You don't even need to use Django for the application, you can use it purely to define schemas and perform migrations on the DB. I'm sure there are alternatives for other languages.

People who got tired of dealing with schemas are now realizing that having zero schema is way more of a headache and way more work than the up front work of creating the schema.
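For anyone who hasn't seen the "diff the definitions against a live DB and generate a migration" idea in action, here is a deliberately tiny Python sketch using only the standard-library sqlite3 module. This is not how Django (or any real framework) implements it; the `DESIRED` dict format and both helper functions are made up for illustration, and it only handles additive changes (new tables, new columns).

```python
import sqlite3

# Desired schema: table name -> {column name: SQL type}. Hypothetical example.
DESIRED = {
    "users": {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"},
    "orders": {"id": "INTEGER", "user_id": "INTEGER"},
}


def live_schema(conn):
    """Read table -> {column: type} from a live SQLite database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        # Table names come from the database itself, so they are trusted here.
        info = conn.execute(f"PRAGMA table_info({table})")
        schema[table] = {row[1]: row[2] for row in info}
    return schema


def generate_migration(desired, live):
    """Return additive DDL statements that move `live` toward `desired`."""
    statements = []
    for table, columns in desired.items():
        if table not in live:
            cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for name, sqltype in columns.items():
            if name not in live[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sqltype}")
    return statements


# A "live" database that is one column and one table behind the definitions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

migration = generate_migration(DESIRED, live_schema(conn))
for stmt in migration:
    print(stmt)
    conn.execute(stmt)
```

Real tools also deal with renames, type changes, constraints, and rollbacks, which is exactly the part you don't want to hand-roll.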

> As long as the schema definitions are well structured and can be easily analyzed against a live db to find diffs and generate migration scripts. Django does this very well.

In my experience, Alembic works better.