Hacker News new | ask | show | jobs
by stingraycharles 2062 days ago
Yeah I don’t get that obsession with “pluggable databases”. Abstractions have a real cost, and they typically complicate a lot: I would advice to just make sure you centralize the access to your database in a single module / class / whatever, but other than that, swapping out databases seems like a non-goal to me.

How often do people really migrate to a different database? Even when doing so, migrating the existing data always is a lot more tedious than migrating the code, in my experience.

6 comments

Yeah I don’t get that obsession with “pluggable databases”. Abstractions have a real cost, and they typically complicate a lot: I would advice to just make sure you centralize the access to your database in a single module / class / whatever, but other than that, swapping out databases seems like a non-goal to me.

I tend to agree, as long as the database you’re using is of the free and open source type so it’s a low-risk dependency.

The “pluggability principle” holds more strongly for higher-risk dependencies, IMHO. For example, if you’re working in an online environment and you can’t readily integrate a new payment processor or messaging service or whatever, your existing dependency starts to look like a single point of failure that is not under your control. Some careful abstraction of the essential features and avoiding dependencies on the peculiarities of any specific platform can be a valuable safety blanket if your external dependency catches fire one day.

It's an example of what economists call "The Hold-up Problem", as part of "Incomplete Contracts".

Basically, it's impossible to write a contract to cover all situations, so eventually any two parties will need to renegotiate. If one has leverage over the other, they can do stuff like raise prices or simply refuse service. One factor affecting that leverage is "asset specificity": given an asset that's covered by the contract, is it reusable in different contexts or is it too closely adapted to one purpose?

The literal textbook example is a steel foundry and a railroad. The steel company needs a spur line to its foundry to receive raw materials and send out finished steel. The railroad company can run a spur from its main line to the foundry.

In this case the asset (the spur line) is extremely specific. Without the spur line, the foundry is worthless. Without the foundry, the spur line is worthless. The balance of leverage then comes down to relative costs and gains. The railroad will hold the upper hand, because closing a spur line isn't existential, but it is for the steel foundry.

In the software context it shows up as a "contract" between your software and a database. You can use an ORM to try to insulate yourself, reducing asset specificity, but that decision has costs (eg, specific DB features you must forego). You can choose to increase asset specificity by adapting to only one DB and fully utilising its strengths, but you may face a future hold-up.

Note from the train example that hold-ups needn't be "pay me more". They can also be "I'm walking away". So a hold-up from Oracle will be "pay me more", whereas a hold-up from an obscure database with one developer could be that they simply quit, die or otherwise become incapable of working.

I agree, I see those patterns in the wild a lot. You have a Microsoft shop using .NET and SQL Server yet the devs still abstract the data access layer. I think those patterns just became common 'just in case' they were needed. I find refactoring tools like "extract interface" take care of that so there is not need to write the abstraction until it's needed.

On the other side, I worked on a web app that supported multiple db vendors, we did the classic DAO pattern which worked well. You still get to use custom SQL for each database if you need to.

We tried an ORM at one point which worked out well. It was the same web app and we moved moved some DAO code to the Java Persistence API. We could then build the data access code and include it into our desktop (Mac Windows) and plug it in to a local DB (Derby).

In that case, once JPA was working, the pluggable database was allowing us to save on development costs.

If you are an enterprise software vendor (non-saas), orgs expect you use _their_ db.

If they already have an Oracle license, or a MS-SQL DBA around, you either say goodbye to Postgres or give the contract to your competitor.

> How often do people really migrate to a different database?

Well, I have been part of a migration from MySQL to MariaDB, which was a lot more effort than one would expect given that they're meant to be more or less the same thing. It was a ton of effort and the abstracted ORM logic didn't actually help with this.

So if it doesn't really help that much for a simple case like that, then it doesn't seem like there's much point in my opinion, as porting is going to be work either way (in a non-trivial application with non-trivial data access patterns, at least).

I've done the same migration and I'm curious of what issues you had. For me, it was truly just plug'n'play with no additional work whatsoever.
I don't remember, it was ~4 years ago and for a different company. I believe there were a small number of incompatibilities somewhere (composite indexes maybe? I remember doing stuff with them but don't remember why, could have just been for query optimisation) and a few areas where the performance was degraded (again, I think around indexes, but I could be mistaken).

It wasn't a big deal over all, but it wasn't simple plug'n'play either and required some porting work.

It's more a concern when shipping library/platform code. Applications can be as tightly coupled as they like.
Should library code really be relying on a database? If its a database library, then ok, maybe it can stick to a subset of database features that works across all supported databases, but in that case, testing against these databases is no longer a "Forcing Function", but a core part of the libraries test suite to ensure correctness.

The same goes for platform code, really. If you are advertising your platform to work with databases X, Y and Z, then you really need to be testing against those and not just as a Forcing Function to see if any brittleness crept in, but as a core part of ensuring your platform does as advertised.

> Should library code really be relying on a database

Yes, many libraries need, or are enhanced by, persistence, whether that's a double-entry accounting module, a headless cms, or an e-commerce engine.

> you really need to be testing against those

Library authors need tools and guidance, not arbitrary constraints and high-watermark QA demands. Most of us have one or two DBs that we work with day-to-day but still want libraries we contribute to be broadly portable across any of the backends the framework they plug into supports. e.g. I can't maintain a test suite vs Oracle when I don't license it, but I still want to know that It'll Just Work if someone uses it in such combination, or that it's at least close enough they'll not have trouble making it work.

More broadly I think the myth of testing every supported combination leads to a brittle mindset of saying "not supported, won't help", especially when systematised in a commercial environment, and to me it's the antithesis of good engineering.

Well, if your library can use the "lowest denominator" of supported databases, then there's nothing wrong with doing that. I'd still argue that you don't really support something you haven't tested against, but I guess this is pretty off topic. My overall point was more that sometimes being locked into a single database is a conscious decision because you want to make use of everything a database has to offer.

In the context of the article, for "Forcing Functions", I absolutely agree that switching databases is a useful way to find weakness in your solution.

I've been considering moving away from azure postgres to azure SQL server coz azure postgres runs at a snails pace.
Azure is ridiculously slow unless you ask for decent IOs performance but then it's ridiculously expensive.
Is that true for azure sql server too? I suspected it was faster but I haven't checked.