

by lmm 1895 days ago

> This statement makes me think that you either do not prefer to use SQL RDBMS, don’t have to use them very often, believe they are some dusty piece of tech, or all of the above when my experience has me believing that RDBMS are absolutely the most common persistence layer I encounter in JVM, .NET, PHP, and Python codebases.

Popular doesn't mean good. I have to use SQL RDBMS a lot, and I respect the amount of low-level engineering work that has gone into them, but yeah I do hate using them.

> I don’t think I’ve ever heard a senior JVM based engineer proclaim that JPA/Hibernate are “the best way... to produce working applications”. It simply isn’t. For basic CRUD applications you will AT BEST barely write fewer lines of code with JPA than with native JDBC queries and ResultSet mapping and have all the lock-in and performance drawbacks of JPA.

I don't think seniority is a good metric, but I've got 10+ years of professional JVM experience for what that's worth. Using JPA means you'll write significantly less code, and the code you get to skip is the most tedious (and therefore rarely read or reviewed) part. Lock-in is significantly lower: you can seamlessly migrate between databases in a way that you can't with handwritten SQL, and you can migrate between different JPA implementations with minimal work (not that I think there's actually much value in doing that, but the capability is there). Performance for equivalent effort will be significantly better because you've got a caching layer that actually works already in place (unless you turn it off, but, uh, don't do that).

Of course if your CRUD application is performance-critical enough to justify hand-tuning every query and implementing a correct caching layer by hand then you'll do better without the framework. But realistically that's a vanishingly rare case.

> Lazy loading will inevitably wind up with Session scope problems with any kind of concurrency, forcing nasty internal list enumeration to force a faux eager fetch to work around the problems.

Depends on your application - a lot (not all, but a lot) of systems decompose naturally into a sequence of isolated steps that provide a natural session boundary. E.g. for a REST API or MVC-style webapp just put the session in the view and get on with your life - people have some philosophical objection to this but it works really well. (I actually don't think MVC is a great way to structure a webapp, but that's a separate fight).

> Fetching just the columns you need for a particular projection will have you writing either SQL or Hibernate “SQL” in annotations.

True. But, on the assumption that you've actually structured your entities to follow your domain, how often is that something you actually gain a significant amount of performance (or anything) from?

> If you have a mix of JDBC and JPA in a codebase you will inevitably wind up with enough consistency and visibility issues as to either ditch one of them or ditch the entire codebase.

This I completely agree with (at least for people who don't make any actual effort to address the problem), and I think it's where articles like the OP come from. I see a lot of people follow a pattern something like: their application needs some vaguely tricky query, and rather than spending 5 minutes looking up how to do it in the Hibernate documentation they decide to handwrite the SQL for it instead. Then they realise that this makes the Hibernate cache for the affected entity invalid, and rather than look up how to selectively invalidate the cache for the entities affected by their query they disable the cache globally. Then they complain that Hibernate is slow and decide the solution is to handwrite the SQL for other queries instead. JPA works great, but only if you're willing to actually try to use it.
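To make the "selectively invalidate" step concrete, here's a toy sketch in plain Java. It is not Hibernate's cache: `ToyEntityCache`, `get`, `evict` and `loadCount` are made-up names, and the loader function stands in for a real database read. In actual JPA the standard hook is `EntityManagerFactory.getCache()`, whose `Cache` interface has an `evict(Class, Object)` method for a single entity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy stand-in for a second-level entity cache (hypothetical, NOT Hibernate's implementation).
final class ToyEntityCache {
    private final Map<String, Object> entries = new HashMap<>();
    private int loads = 0;

    private static String key(Class<?> type, Object id) {
        return type.getName() + "#" + id;
    }

    // Return the cached entity, or call the loader (a pretend database read) on a miss.
    <T> T get(Class<T> type, Object id, Function<Object, T> loader) {
        String k = key(type, id);
        if (!entries.containsKey(k)) {
            loads++;
            entries.put(k, loader.apply(id));
        }
        return type.cast(entries.get(k));
    }

    // Selective invalidation: drop only the entry that a handwritten UPDATE touched.
    void evict(Class<?> type, Object id) {
        entries.remove(key(type, id));
    }

    // Number of times the loader was actually invoked, i.e. cache misses.
    int loadCount() {
        return loads;
    }
}
```

After a handwritten statement changes entity 1, calling `evict` for that one entity means only it gets reloaded; everything else stays cached, which is the whole point of not disabling the cache globally.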

> I expect every single one of my backend engineers (on any tech stack) to understand the fundamentals of SQL INSERT, UPDATE, and DELETE statements.

Ah, but that isn't actually enough. The people talking about getting better performance from handwriting your SQL are people who understand different types of indices, different join strategies, how the query planner chooses which one to use. And if you put the same amount of time and effort into understanding Hibernate, you can get great things out of it.


manyxcxi 1895 days ago

And on many of these points I agree...

I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.

As you mentioned previously, knowing the tool set and the domain is critical to either approach. At a certain point with technology the benefits and costs are weighted by subjective preference and project specific needs. I have weighted SQL higher than JPA by many factors because I can take my SQL knowledge to any backend project, and I’ve been a part of a lot of different tech stacks in my career.

Maybe my travels have led me to be surrounded by many more engineers who trust the database (and their knowledge of the database) to handle the persistence without too many layers in between.

I, personally, have never seen a JPA-based project that actually worked well with large-ish datasets, high concurrency, or when non-trivial ETL functions are part of the system, and this general domain has been the majority of my career, so I may have blinded myself to THE majority being confused for MY majority.

Thanks for the response and a good look at the topic from a different point of view.

lmm 1895 days ago

> I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.

I'd worry about using an RDBMS in that situation because it's fundamentally mutability-first. I prefer to regard the user's actions as the "gold copy" and the current-state-of-the-world as a transient derived thing (i.e. event sourcing), but that doesn't really play to the strengths of an RDBMS. You also have to make global decisions about transactionality (in particular, you can't easily commit a data write without committing updates to all your secondary indices), and the much-vaunted relational integrity can be a problem because you can only represent constraints for cases where the appropriate response to a constraint violation is dropping the write on the floor. And of course you can't safely allow the ad-hoc querying that SQL is designed for.
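A minimal sketch of what I mean by events as the "gold copy", in plain Java with no persistence at all. The account domain and the names `AccountEvent`, `EventLog` and `currentBalanceCents` are invented purely for illustration; a real system would also need durable storage, ordering guarantees and so on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical domain: one account whose balance is never stored, only derived.
final class AccountEvent {
    enum Kind { DEPOSITED, WITHDRAWN }

    final Kind kind;
    final long amountCents;

    AccountEvent(Kind kind, long amountCents) {
        this.kind = kind;
        this.amountCents = amountCents;
    }
}

final class EventLog {
    // Append-only: events are the source of truth and are never updated or deleted.
    private final List<AccountEvent> events = new ArrayList<>();

    void append(AccountEvent event) {
        events.add(event);
    }

    // Read-only view of the full history.
    List<AccountEvent> all() {
        return Collections.unmodifiableList(events);
    }

    // Current state is a fold over the history; it can be discarded and rebuilt at any time.
    long currentBalanceCents() {
        long balance = 0;
        for (AccountEvent e : events) {
            switch (e.kind) {
                case DEPOSITED: balance += e.amountCents; break;
                case WITHDRAWN: balance -= e.amountCents; break;
            }
        }
        return balance;
    }
}
```

Nothing here is ever mutated in place: the balance is just a view computed from the log, which is roughly the opposite of the update-the-row model an RDBMS is built around.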

I do think traditional RDBMS make some sense at the end of an ETL pipeline - where the secondary indices can be a big help for the ad-hoc querying/aggregation that you want to do in a reporting environment. But transactions don't make sense in that environment because it's essentially read-only (or at least single-writer), so you're still paying for a lot you're not using. I wouldn't use JPA for this, but I wouldn't really write code for this kind of environment at all - the point is to expose the data in a structured form for non-code tools.

Essentially I find mature systems outgrow SQL databases - the case where an RDBMS actually fits is the early stages where you want to run ad-hoc reports against your live datastore, you want to keep the current state of the world rather than worrying about history, having to manually fail over to a replica if master goes down is ok, updating all your indices synchronously is fine because write performance isn't an issue yet, and you can put constraints in the database because blowing up with an error page is an adequate response when the user breaks the business rules. Using JPA increases the rate at which you can iterate on the system, which is the priority for that kind of use case.