by kstrauser 2743 days ago
For that specific use case, maybe. Where it falls apart is in `searchUser`, where the methods (and performance characteristics) of digging through the respective databases are going to be radically different. In a newspaper's implementation, you're going to have to search by date, subject, keyword, body text string, reporter, and so on. In MongoDB, that generally involves creating an index on the combination of fields you'll be searching together. In SQL, that generally looks like adding a `where reporter = 'Jane Smith'` predicate. The MongoDB version may be faster if you have an enormous amount of data spread across a cluster. The PostgreSQL version will be more flexible when your boss wants to know how many reporters wrote stories about candied yams within three days of the publication of any story containing the word "Thanksgiving".

Being tasked to come up with an abstraction layer that supports the speed of precomputed, clustered indexes with the flexibility of SQL - if I were in a content creation business and not a database engine writing business - sounds like the kind of project that would make me quit my job and go on a one year silent meditation retreat.
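To make the "candied yams" question concrete, here is a rough sketch of what that ad hoc query might look like in SQL. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the `stories` table, its columns, and the sample rows are all made up for illustration, and a real PostgreSQL version would use its own date arithmetic rather than `julianday`.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    reporter TEXT,
    published TEXT,   -- ISO 8601 date
    body TEXT
);
INSERT INTO stories VALUES
    (1, 'Jane Smith', '2017-11-22', 'Recipe: candied yams for a crowd'),
    (2, 'Sam Lee',    '2017-11-23', 'Thanksgiving parade route announced'),
    (3, 'Ana Ruiz',   '2017-07-04', 'Candied yams at the summer fair'),
    (4, 'Sam Lee',    '2017-11-25', 'Leftover candied yams ideas');
""")

# Count reporters with a yams story published within three days
# of any story mentioning Thanksgiving.
QUERY = """
SELECT COUNT(DISTINCT y.reporter)
FROM stories AS y
WHERE y.body LIKE '%candied yams%'
  AND EXISTS (
      SELECT 1
      FROM stories AS t
      WHERE t.body LIKE '%Thanksgiving%'
        AND ABS(julianday(y.published) - julianday(t.published)) <= 3
  )
"""

yams_reporters = conn.execute(QUERY).fetchone()[0]
print(yams_reporters)
```

Nobody planned an index for this question in advance, which is the flexibility being described. How fast it runs on a large archive is a separate matter.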


That objection doesn't make sense. Queries are nothing more than a tree of predicates; how the backend uses those predicates is not relevant to the API for specifying them. Things like indexes, whether in Mongo or in SQL, are implementation details that can easily be hidden and need not infect the API. You can interpret the tree of predicates into a SQL where clause or into a Mongo index search.
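For anyone who wants to see the shape of that idea, here is a minimal sketch of a predicate tree with two interpreters, one producing a parameterized SQL where clause and one producing a MongoDB filter document. The `Eq`, `And`, and `Or` classes and the `to_sql` / `to_mongo` functions are hypothetical names invented for this example, not any library's API. It only covers the API side; it says nothing about whether the generated queries are the fastest form for either backend.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Eq:
    field: str
    value: Any


@dataclass(frozen=True)
class And:
    parts: tuple


@dataclass(frozen=True)
class Or:
    parts: tuple


def to_sql(p):
    """Return (where_clause, params) using ? placeholders.

    Field names are interpolated directly, so they must come from
    trusted code, never from user input.
    """
    if isinstance(p, Eq):
        return f"{p.field} = ?", [p.value]
    joiner = " AND " if isinstance(p, And) else " OR "
    clauses, params = [], []
    for part in p.parts:
        clause, part_params = to_sql(part)
        clauses.append(f"({clause})")
        params.extend(part_params)
    return joiner.join(clauses), params


def to_mongo(p):
    """Return a MongoDB filter document for the same tree."""
    if isinstance(p, Eq):
        return {p.field: p.value}
    op = "$and" if isinstance(p, And) else "$or"
    return {op: [to_mongo(part) for part in p.parts]}


query = And((
    Eq("reporter", "Jane Smith"),
    Or((Eq("subject", "food"), Eq("subject", "holidays"))),
))

sql_where, sql_params = to_sql(query)
mongo_filter = to_mongo(query)
print(sql_where, sql_params)
print(mongo_filter)
```

The domain code only ever builds `query`; which interpreter runs is a backend decision.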

The OP is correct: your app can speak to an internal API without the underlying database infecting your domain code. That in no way implies you can't take advantage of the best of each database.

This is all rosy in theory. In practice, the way you write the query matters quite a bit. Often even between different SQL implementations.

And it's not just queries. Transactions often have important semantic differences that will be visible at the application layer - again, even between different SQL implementations (e.g. MVCC vs locks).

> In practice, the way you write the query matters quite a bit.

Which is hidden in the query interpreter for said db implementation. Each implementation can break down that abstract query into whatever implementation specific query works best in that database.

There's always some abstract way to represent it that neither requires vendor-specific knowledge nor removes the ability to use vendor-specific features.

Look, I just don't agree with you, I agree with OP. Db specific stuff should be hidden from the domain layer by an abstract query representation and an abstract transaction representation to be plugged in at a later time.

Have you ever implemented an ORM? I did.

Stuff like "each implementation can break down that abstract query into whatever implementation specific query works best in that database" is wishful thinking. It's like saying that Java is faster than C++, in theory, because JIT can produce better code. And in theory, it can. In practice, we're not there yet. Same thing with high-level database abstractions - they're all either leaky in subtle ways, or they constrain you to extremely basic operations that can be automatically implemented efficiently on everything (but e.g. forget joins).

> Have you ever implemented an ORM? I did.

Several actually, which is why I know what I'm talking about; I've explored this area extensively. When Fowler first released PEAA I dug in, went nuts, and spent years coding up and exploring all the possible approaches, figuring out which ones I liked, which ones I didn't, and why.

> Same thing with high-level database abstractions - they're all either leaky in subtle ways, or they constrain you to extremely basic operations that can be automatically implemented efficiently on everything (but e.g. forget joins).

If you're doing joins in your ORM, frankly, you're doing it wrong. Most ORMs get this wrong: they try to replace what a db does best. The right way is to keep joins in the db. The role of an ORM, used properly, is to map tables and views into objects and to allow querying over those tables and views with an abstract query syntax. Joins belong in a view, not in code. It's called the object-relational impedance mismatch for a reason: you have to draw the line somewhere reasonable to get anything to work well, and putting joins into the ORM crosses that line, which is why most ORMs utterly suck. Joins aren't queries, they're projections. Put the queries in the code and the projections in the database; this works perfectly and lets each side do what it does best. Queries are easily abstracted, projections are not, so projections don't belong in the ORM.
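A small sketch of the "joins belong in a view" approach, using Python's built-in `sqlite3`. The tables, the `story_bylines` view, and the `find_bylines` helper are hypothetical names made up for this example. The join lives in the database as a view, and the application code only filters over it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reporters (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    reporter_id INTEGER REFERENCES reporters(id),
    headline TEXT
);

-- The projection (join) is defined once, in the database.
CREATE VIEW story_bylines AS
    SELECT s.id AS story_id, s.headline, r.name AS reporter
    FROM stories AS s
    JOIN reporters AS r ON r.id = s.reporter_id;

INSERT INTO reporters VALUES (1, 'Jane Smith'), (2, 'Sam Lee');
INSERT INTO stories VALUES
    (1, 1, 'Candied yams'),
    (2, 2, 'City budget'),
    (3, 1, 'Thanksgiving travel');
""")


def find_bylines(conn, reporter):
    """Query the view like a plain table; no join logic in app code."""
    cursor = conn.execute(
        "SELECT story_id, headline, reporter FROM story_bylines "
        "WHERE reporter = ? ORDER BY story_id",
        (reporter,),
    )
    return cursor.fetchall()


jane = find_bylines(conn, "Jane Smith")
print(jane)
```

The mapping layer then treats `story_bylines` as one more table-shaped thing to map to objects.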

Any language with named tuples has a type system that is sufficiently expressive to handle joins without any sort of impedance mismatch. So, the only reason to avoid them is exactly the one that I cited earlier - the underlying implementation difference between databases.
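As a rough illustration of the named-tuple point, here is a sketch in Python where each row of a join is given a name and typed fields via `typing.NamedTuple`. The `StoryWithReporter` type, the schema, and the data are assumptions made up for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3
from typing import NamedTuple


class StoryWithReporter(NamedTuple):
    """One row of the stories/reporters join, with named, typed fields."""
    headline: str
    reporter: str
    desk: str


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reporters (id INTEGER PRIMARY KEY, name TEXT, desk TEXT);
CREATE TABLE stories (id INTEGER PRIMARY KEY, reporter_id INTEGER, headline TEXT);
INSERT INTO reporters VALUES (1, 'Jane Smith', 'Food'), (2, 'Sam Lee', 'Metro');
INSERT INTO stories VALUES (1, 1, 'Candied yams'), (2, 2, 'City budget');
""")

# The join's result set maps directly onto the tuple type:
# no entity has to "own" the other side of the relationship.
rows = [
    StoryWithReporter(*row)
    for row in conn.execute(
        "SELECT s.headline, r.name, r.desk "
        "FROM stories AS s JOIN reporters AS r ON r.id = s.reporter_id "
        "ORDER BY s.id"
    )
]

for row in rows:
    print(row.headline, row.reporter, row.desk)
```

The result of the join is just another record type, which is the sense in which there is no mismatch on the type-system side.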