Hacker News new | ask | show | jobs
by dkersten 2063 days ago
> Try to add support for a completely different database. Details of your current database that have leaked into your data layer abstractions will soon become obvious

Well, details of Postgres have leaked into my database queries and schemas, but it was a conscious decision to use Postgres and its features. Sure, it seems nice to be able to swap one database for another, but you lose out on a lot of what a database can do if you stick strictly for the lowest common denominator of features. I use Postgres partly because of its feature set, so I am going to use these features. This does mean that its unlikely I will ever run my software against a different SQL database, but I'm ok with this.

I guess an important note is that you should be aware of it and it should be a conscious decision, rather than something that crept in over time.

But I suppose that's a tangent and not the articles point. Forcing Functions is about unearthing the brittleness that has crept into a codebase over time and I absolutely agree that is a helpful and worthwhile exercise. I've seen plenty of codebases that were meant to be database agnostic, but porting them was still not painless.

4 comments

Yeah I don’t get that obsession with “pluggable databases”. Abstractions have a real cost, and they typically complicate a lot: I would advice to just make sure you centralize the access to your database in a single module / class / whatever, but other than that, swapping out databases seems like a non-goal to me.

How often do people really migrate to a different database? Even when doing so, migrating the existing data always is a lot more tedious than migrating the code, in my experience.

Yeah I don’t get that obsession with “pluggable databases”. Abstractions have a real cost, and they typically complicate a lot: I would advice to just make sure you centralize the access to your database in a single module / class / whatever, but other than that, swapping out databases seems like a non-goal to me.

I tend to agree, as long as the database you’re using is of the free and open source type so it’s a low-risk dependency.

The “pluggability principle” holds more strongly for higher-risk dependencies, IMHO. For example, if you’re working in an online environment and you can’t readily integrate a new payment processor or messaging service or whatever, your existing dependency starts to look like a single point of failure that is not under your control. Some careful abstraction of the essential features and avoiding dependencies on the peculiarities of any specific platform can be a valuable safety blanket if your external dependency catches fire one day.

It's an example of what economists call "The Hold-up Problem", as part of "Incomplete Contracts".

Basically, it's impossible to write a contract to cover all situations, so eventually any two parties will need to renegotiate. If one has leverage over the other, they can do stuff like raise prices or simply refuse service. One factor affecting that leverage is "asset specificity": given an asset that's covered by the contract, is it reusable in different contexts or is it too closely adapted to one purpose?

The literal textbook example is a steel foundry and a railroad. The steel company needs a spur line to its foundry to receive raw materials and send out finished steel. The railroad company can run a spur from its main line to the foundry.

In this case the asset (the spur line) is extremely specific. Without the spur line, the foundry is worthless. Without the foundry, the spur line is worthless. The balance of leverage then comes down to relative costs and gains. The railroad will hold the upper hand, because closing a spur line isn't existential, but it is for the steel foundry.

In the software context it shows up as a "contract" between your software and a database. You can use an ORM to try to insulate yourself, reducing asset specificity, but that decision has costs (eg, specific DB features you must forego). You can choose to increase asset specificity by adapting to only one DB and fully utilising its strengths, but you may face a future hold-up.

Note from the train example that hold-ups needn't be "pay me more". They can also be "I'm walking away". So a hold-up from Oracle will be "pay me more", whereas a hold-up from an obscure database with one developer could be that they simply quit, die or otherwise become incapable of working.

I agree, I see those patterns in the wild a lot. You have a Microsoft shop using .NET and SQL Server yet the devs still abstract the data access layer. I think those patterns just became common 'just in case' they were needed. I find refactoring tools like "extract interface" take care of that so there is not need to write the abstraction until it's needed.

On the other side, I worked on a web app that supported multiple db vendors, we did the classic DAO pattern which worked well. You still get to use custom SQL for each database if you need to.

We tried an ORM at one point which worked out well. It was the same web app and we moved moved some DAO code to the Java Persistence API. We could then build the data access code and include it into our desktop (Mac Windows) and plug it in to a local DB (Derby).

In that case, once JPA was working, the pluggable database was allowing us to save on development costs.

If you are an enterprise software vendor (non-saas), orgs expect you use _their_ db.

If they already have an Oracle license, or a MS-SQL DBA around, you either say goodbye to Postgres or give the contract to your competitor.

> How often do people really migrate to a different database?

Well, I have been part of a migration from MySQL to MariaDB, which was a lot more effort than one would expect given that they're meant to be more or less the same thing. It was a ton of effort and the abstracted ORM logic didn't actually help with this.

So if it doesn't really help that much for a simple case like that, then it doesn't seem like there's much point in my opinion, as porting is going to be work either way (in a non-trivial application with non-trivial data access patterns, at least).

I've done the same migration and I'm curious of what issues you had. For me, it was truly just plug'n'play with no additional work whatsoever.
I don't remember, it was ~4 years ago and for a different company. I believe there were a small number of incompatibilities somewhere (composite indexes maybe? I remember doing stuff with them but don't remember why, could have just been for query optimisation) and a few areas where the performance was degraded (again, I think around indexes, but I could be mistaken).

It wasn't a big deal over all, but it wasn't simple plug'n'play either and required some porting work.

It's more a concern when shipping library/platform code. Applications can be as tightly coupled as they like.
Should library code really be relying on a database? If its a database library, then ok, maybe it can stick to a subset of database features that works across all supported databases, but in that case, testing against these databases is no longer a "Forcing Function", but a core part of the libraries test suite to ensure correctness.

The same goes for platform code, really. If you are advertising your platform to work with databases X, Y and Z, then you really need to be testing against those and not just as a Forcing Function to see if any brittleness crept in, but as a core part of ensuring your platform does as advertised.

> Should library code really be relying on a database

Yes, many libraries need, or are enhanced by, persistence, whether that's a double-entry accounting module, a headless cms, or an e-commerce engine.

> you really need to be testing against those

Library authors need tools and guidance, not arbitrary constraints and high-watermark QA demands. Most of us have one or two DBs that we work with day-to-day but still want libraries we contribute to be broadly portable across any of the backends the framework they plug into supports. e.g. I can't maintain a test suite vs Oracle when I don't license it, but I still want to know that It'll Just Work if someone uses it in such combination, or that it's at least close enough they'll not have trouble making it work.

More broadly I think the myth of testing every supported combination leads to a brittle mindset of saying "not supported, won't help", especially when systematised in a commercial environment, and to me it's the antithesis of good engineering.

Well, if your library can use the "lowest denominator" of supported databases, then there's nothing wrong with doing that. I'd still argue that you don't really support something you haven't tested against, but I guess this is pretty off topic. My overall point was more that sometimes being locked into a single database is a conscious decision because you want to make use of everything a database has to offer.

In the context of the article, for "Forcing Functions", I absolutely agree that switching databases is a useful way to find weakness in your solution.

I've been considering moving away from azure postgres to azure SQL server coz azure postgres runs at a snails pace.
Azure is ridiculously slow unless you ask for decent IOs performance but then it's ridiculously expensive.
Is that true for azure sql server too? I suspected it was faster but I haven't checked.
I don't significantly disagree with you, but let me take the devil's advocate approach.

Adding support for a different database doesn't mean restricting yourself to the lowest common denominator. It means using different techniques, more appropriate for the different database, that may optimize other parts of your data access and modification path, while pessimizing stuff your current database does.

More importantly, it means extracting hard database dependencies like raw SQL or custom ORM fiddling from your business logic and entities, and pushing them behind a module or service boundary.

Raw leverage of the database, if it's dispersed throughout your application, will limit your ability to change your schema (e.g. denormalize an attribute, split or join tables, convert a parent-child relationship to embedded JSON or vice versa) and address performance problems as you scale up. It'll also stop you having a single point of data access where you can partition or duplicate your data into different stores with different capabilities more suited to their access and modification patterns. These kinds of things become really important when the database becomes a bottleneck in your system.

Just checking if I understood your point: for the purpose of Forcing Functions, running against a different database than the one designed for (and therefore where the tradeoffs may be different and things may run inefficiently) is still useful because it helps unearth design flaws, corner cases or brittleness?

If so, then, sure , I agree with you. Not all reliance on a target database is actual features that don't have an easy or direct way to port.

Almost.

More that being forced to separate church and state, forced to keep business logic separate from database manipulation, and forced to go through an abstraction interface, then lets you leverage the interface later if and when the database becomes a bottleneck.

It's a devil's advocate position. I actually think it's unrealistic and probably not worth it at the time. The forcing function is what's useful, not the database portability per se. And whether it's actually useful depends on being so successful that you have so much data that your regular RDBMS can't cope with some access or update patterns. That's a hefty bet early in anything, and probably doesn't make business sense, until it does, then you wish you did things differently.

(Not coincidentally, it's a position we're in at my workplace; the database has hit its limits and we're needing to use hybrid approaches to hit latency NFRs.)

Agreed. It makes little sense to try to write your SQL queries to run fine on multiple different DBMSs. Different SQL DBMSs should be treated as different languages.

Presumably the idea is based on an analogy to code portability: it can be good to ensure your C++ code compiles fine with multiple different compilers. Really though, it's more akin to writing code that compiles as both C# and Java; clearly madness.

I took this to mean be able to swap in and out your persistence layer.

So my app can make calls which will persist data to a SQL database, or I can swap that layer out with another that persists to NoSql storage in the cloud. Possible because the persistence layer exposes an API or interface that is agnostic to the actual implementation of the layer

???