Hacker News new | ask | show | jobs
by JesseAldridge 2457 days ago
This was my position for a while. ORMs introduce a layer of magic which obscures what's actually going on under the hood. I decided I would just make raw SQL queries and handle mapping data explicitly.

I quickly ended up with a lot of duplicated code. So then I thought, "Well ok, I should add a bit of abstraction on top of this..." I started coding some simple functions to help map the tabular data to objects. One thing led to another and suddenly I looked at what I had done and said, "Wait a minute..."

21 comments

There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

As an example, the sqlalchemy docs[0] make this very clear: there's an ORM, but there's also just a core expression library that simply helps you connect to the database and construct queries.

[0]: https://docs.sqlalchemy.org/en/13/

Agreed. Helpers (and indeed types) can make working with SQL an actual pleasure. You do need to learn the SQL, though. (My TypeScript/Postgres solution, in this vein: https://github.com/jawj/mostly-ormless/blob/master/README.md).
I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

I love how it let's you use SQL, while taking full advantage of TypeScript's wonderful typing system to give you intellisense and compile-time checking. Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I really like the way the readme has been written too - it gives a real insight into the thought processes that led to the final result.

I've never been a fan of codegen, but I think I could get past it for this library - it looks great!

Good news. :)

Maybe we can agree that typegen smells less than codegen?

Reminds me a bit of the SQL type provider for F# (which I was amazed by when I first saw it in action).

I should look that up — sounds interesting.

I worked for a bit on a code gen based typescript postgres builder, but haven't had time lately to build it out - https://github.com/Sammons/morbid

I really think typescript would benefit from a good solution to this.

I do like the code-gen solution a lot. You can create code that is far less bloated than a generic framework.

I am old fashioned. I like to start with the database schema and generate code from that. I make a change in the schema I regen the code. Thanks for partial classes in C# I can persist customizations between code-gens if necessary.

Wow this is great! Very well written README.

What just blew me away is the thing with the `JOIN` and the `to_jsonb(authors)`, all with complete typing support for the nested author object. I was actually looking to use a classical, attribute driven query generator (with the sort of chaining API everyone is used to: `tableName.select(...coumns)` etc.) for my next project involving to maybe replace/wrap/rewrite a Rails app and its ORM with Typescript and Node. Maybe I'm trying this instead I'm already half sold. Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper.

I'm also awed by this!

> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper.

My current team is pretty junior, and I don't see any problem with this. Simple SQL queries are really easy to learn, and complex queries are harder to understand with ORMs than in raw SQL.

Moreover, knowing SQL is a useful, marketable skill that will stay relevant for many years to come. If there's some resistance, I can easily convince the team that going this route will benefit them personally.

Back to the README, there are two questions I'd like to see addressed:

1. Whether `Selectable[]` can be used to query for a subset of fields and how.

2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

I would love to see this move forward! I will definitely play with it and consider it for my next project.

I'm also awed by this!

:)

1. Whether `Selectable[]` can be used to query for a subset of fields and how.

Right — this is not (currently) supported. I guess if you had wide tables of large values, this could be an important optimisation, but it hasn't been a need for me as yet.

2. In the `to_jsonb(authors)` example, what would you get back in the `author` field if there were multiple authors with the same `author.id` value? An array of `author.Selectable` objects? This part is awesome but brittle, isn't it?

Multiple authors with the same id isn't going to happen, since id is intended as a primary key, so I'd argue that the example as given isn't brittle. On the other hand, there's a fair question about what happens for many-to-many joins, and since my use case hasn't yet required this I haven't given it much thought.

OK, I gave the one-to-many queries a bit more thought, and the converse join query (getting each author with all their books, rather than all books each with their author) works nicely with a GROUP BY:

    type authorBookSQL = s.authors.SQL | s.books.SQL;
    type authorBookSelectable = s.authors.Selectable & { books: s.books.Selectable };

    const
      query = db.sql<authorBookSQL>`
        SELECT ${"authors"}.*, jsonb_agg(${"books"}.*) AS ${"books"}
        FROM ${"books"} JOIN ${"authors"} 
          ON ${"authors"}.${"id"} = ${"books"}.${"authorId"}
        GROUP BY ${"authors"}.${"id"}`,

      authorBooks: authorBookSelectable[] = await query.run(db.pool);
This exploits the fact that selecting all fields is, logically enough, permitted when grouping by primary key (https://www.postgresql.org/docs/current/sql-select.html#SQL-... and https://dba.stackexchange.com/questions/158015/why-can-i-sel...)

I'll update demo.ts and README shortly.

Right, only querying a few fields seems not to be a builtin feature. Looks like you have to create the partial selectable type yourself and there is no support to typecheck that the correct columns in the select are included.

Your second case, if I recall this correctly (ActiveRecord made my SQL skills fade away), this plain JOIN would just return a row with the same book but a different author. `to_jsonb(authors.*)` is just operating on a single row. But what you want is possible (aggregating rows into a JSON object) by using `jsonb_agg`. Whether the lib supports inferring the correct typings for that is another question though.

> Just worried about forcing colleagues having to learn SQL instead of using a fancy wrapper

I'd argue that learning SQL is essential for any developer.

It's also a "reusable" skill that will stand them in good stead for decades - whereas learning how to use the fancy wrapper is only useful until the next new shiny comes along.

I’d add that it’s essential so you can understand how to optimise and debug a query. You lose a lot of power if you can’t open up a console to describe or explain things.

The long-standing ORMs do a pretty decent job of writing efficient queries these days though. You can go pretty far without knowing much and that’s not a bad thing either.

This looks great. It looks similar to JooQ.
I'm not even someone that has used multiple orm styles extensively, but it is disturbing/darkly humorous how many orm libs there are.

That lastpost you can map a half dozen Java frameworks to each of the acts.

Personally I never found an orm that tracked which attrs in objects were actually mutated so that only mutated columns would be updated/inserted, but again I never did a lot of orm.

I really like this. There is only one thing that bothered me. I'd rather pass the job to the pool than pass the pool to the job... something like...

const existingBooks = await pool.exe(select("books", { authorId }));

or

const existingBooks = await pool(select("books", { authorId }));

QueryDSL (http://www.querydsl.com/) does something like this for Java. It can generate classes from tables, but even with those, all queries / statements that hit the database are manually built using a query builder to avoid syntax and type errors. I.e. no caching or automatic database updates.
knex.js is another example of such a query builder library in the Node.js ecosystem
Fully agree! Some mapping code is _required_ in your application, otherwise it wouldn't be able to talk to your database at all.

However, I've never understood why people write this mapping code manually. I believe in code generation tooling as a potential solution for this (where types and maybe a full data access API is auto-generated based on the database schema).

Yeah, ORMs have grown to mean more than mapping relational data to objects. An example of "just" this mapping can be seen in PureORM[0].

[0]: https://github.com/craigmichaelmartin/pure-orm

> There is a big difference between just writing helper functions to construct SQL and convert data types, and OO-style magical auto-persisted objects. The latter is what I don't like about ORMs but the former is fine. I feel that this is an important distinction to make.

What's the big difference? Why do you like the former but not the latter? What are the characteristics of the former that makes it distinct from the latter?

Out of curiosity what platform and tech where you using? I am making the assumption of a predominately OO one based on the virtues of ORM. I have always found that when I try to solution back end or middleware based platforms with OO dominate languages (read Java, C#, et. al.) that there quickly becomes an impedance mismatch and any communication with the database becomes a monster of mapping OO philosophy to relational theory, whether that be via home rolled or an ORM.

That being said, I personally have found that I do not like OO languages for back end dev and I find that functional languages such as any variety of LISP marry extremely well to the transnational and process oriented nature of back-end systems as well as lend themselves to not having to jump thru hoops to contort the data into relational sets (Clojure's destructuring is an absolute life saver here). I find that there is little to no duplication of code in regards to transferring data to the db. You may want to give Clojure or F# (depending on your stack) a try for your back end and see if it does not alleviate a host of issues with trying to develop a process and transaction oriented system, which most back ends fit that definition.

I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.

If you are using OO languages to communicate and transfer data to the DB you may very well be trying to solution for the impedance mismatch that is easily solved by using a functional language.

I can recommend hugsql to anyone who wants to work with SQL in Clojure. You basically write pure SQL with some minor templating helpers and then you get the data in a map with Clojure data types. Very nice and minimal overhead: https://www.hugsql.org/
> I find the converse to be true for the front end. I find most attempts to deal with the UI in anything other than objects and components (read jQuery, React Hooks), turns to spaghetti rather quickly.

What alternatives have you tried?

ClojureScript, Reflex, Grapefruit, Seesaw and a host of others, It's my opinion (so take it with a grain of salt) and it very well may be the way my brain works but I just find functional to not marry well to UI development. For the service and process oriented stuff associated with the front end I think it is great, but when it comes to modeling components, I find objects and inheritance work far better.

This is one of the reasons I have long been a huge proponent of Javascript despite it warts, as it can be OO when I need it to be OO and functional when I need it to be functional.

I've used ClojureScript with re-frame professionally for over 2 years, and it's been the first time in over 20 years of development that I've enjoyed working on the front-end. Unidirectional data flow with a single app atom has been a dream to work with.
It was a Flask app using SqlAlchemy (so Python). I'm not sure functional programming would have changed the situation much. I imagine there would still be repeated patterns involving reading and writing to the database in slightly different ways, and it would still make sense to use some sort of library. But I haven't used functional languages much, so I can't say for sure either way.
Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app. So you don't need an ORM. You might still have a library to help you build dynamic SQL queries, but no object-relational mapping needed.
> Well, the difference is that in a data oriented language, you do not need to map Objects to relations. You can get the data back in the relational format and use it as is in your app.

You can do that in an OO language, too; relations (whether constant or variable, and whether there data is held locally by the program or remotely, as in an RDBMS) are perfectly valid objects.

Query builders != ORMs.

A query builder carefully preserves the underlying relational and RPC semantics and exposes all of that to the user in an easier-to-use form. That’s just good cautious modest abstraction.

An ORM believes it knows way better than those dumb RDBs how a database ought to behave by ingeniously pretending that everything you’re dealing with is just nice simple familiar arrays of local native class instances. Which, like all lies, ends up spawning more lies as each one starts to crack under inspection, until the whole rotten pile catastrophically collapses under the weight of its own total bullshit.

And of course it goes without saying which of these approaches our arrogant grandstanding consequence-shirking industry most likes to adopt.

But now you know it's not magic :)

Maybe the problem is not that you don't need a mapping layer, but because ORMs are obscure. And maybe they are obscure not because SQL is such a cursed spot, but because object-oriented programming ITSELF drift toward obscurity and magic. Don't you get the same feeling of obscurity about other libraries, e.g. web servers or clients? I often find the bare specs much clearer than (supposedly simplified) OO libraries that implement them.

Yes.

If you stick to using the ORM for what amounts to (mostly) just PODs, it's syntactic sugar that can really help readability.

Sometimes you just should live with duplicated code. It's OK.
No, it is not okay. If you need insert/update/select for every object/table that is way too much duplication. It becomes very irritative when the schema changes. There should be table meta data in such a case but one should also know what one is doing. Having no idea that under the water 14 joins are done is not a good situation either.
Well sort of. In my view, duplicate SQL chunks that have defined business logic should either be a new table/view or extremely well-documented with really rigid communication policies for changes.

For example, a company with many data analysts/scientists who may each be writing their own queries. As a basic example, the definition of some “very important” company metric changes, then there would need to be a large number of disperse queries to change.

But an ORM isn’t the answer for the above situation either.

It's relative, if the duplication is that large maybe you do need to abstract.

It also sounds like you would be well served using a service abstraction at that point to remove the data layer from client scope entirely.

The "model changes, now we have to change it every where" isn't going to be solved by abstraction, it's only limited by the amount you're willing to limit access to the underlying model, if you need that information, you need to share the model.

The best solution to this I've seen in practice is domain modelling, colocating shared code near other users. When things get too distant you start using anti corruption layers which allows more flexible model changing.

But at the end of the day this is essential complexity, orm, or any other solution is never going to be able to hide the fact that you need information elsewhere in the system to be useful.

Thank you! You just said something I very much agree to but never dare to say out loud.
I think that the way to say it without starting a war is to preface it with something like:

> Well, redundancy and dependency both have downsides, but in this case...

But you shouldn’t live with a hand-rolled pseudo ORM that stumbled into existence when there’s developed alternatives
Why not?

I've used hand rolled pseudo ORMs before.

I prefer just plain SQL but for the application I had there was a common access pattern that was worth abstracting out in a DRY sense.

That doesn't mean I want or need a complete ORM. Just a consistent access at certain table types.

I’ve never seen a homegrown ORM that was better than a third party one. Whenever there is an issue - and there are always issues - you have to dig into the code, because they are never documented well.

There is usually a feature that no one thought about and then you have to make modifications to the custom ORM and you get an even bigger mess.

Better is a subjective term.

I wouldn't say what I built was a better ORM. But I would 100% say it's a better solution to the problem I faced.

It didn't get in the way of writing SQL, but it did reduce the boilerplate and repetitive gruntwork.

"Issues" - no, it was a function call deep and incredibly clear and close to what was happening.

sort of agree but also: the good third party ORMs started life as homegrown ORMs.
Usually third party stuff has documentation. Home grown stuff often hasn't. And usually better tested.
Entity Framework didn’t.....
You just said "you know, if you wanted to have a shitty burger, you can get it right there for half the price of a national chain and it will be at least as good"
> I've used hand rolled pseudo ORMs before.

Is your pseudo ORM as well documented as a commonly used ORM? Can I google (or search your wiki) for common issues?

There's a middle ground.

Micro ORMs.

A micro ORM is just an ORM, written well and modularly. It isn't a middle ground - it's choosing to use a well written library.

A lot of people conflate ORM's leaking because of poor designed library with ORM being a bad abstraction in general.

One of the most popular Micro ORMs for C# is Dapper which is used by Stack Overflow.

There is no real abstraction. You write standard SQL and it maps your recordset to an object. You know exactly what code is running.

There are extensions that will take a POCO object and create an insert statement and I believe updates, but where ORMs usually get obtuse and do magic are Selects. It’s hard to generate a suboptimal Insert or Update.

So.. pattern I see emerging. Use orm for the common stuff and execute sql for complicated queries (like reports)
The informal definition I have of a micro ORM is an ORM without an identity map and without lazy fetching through proxy properties. Are there any more concrete definitions?
> There's a middle ground.

> Micro ORMs.

And there's the mystical fourth option of simply not bothering with objects in the first place. No objects, no need to do object-relational mapping.

So then you’re mapping to whatever data structure your app uses instead of objects. In OOP languages like Python, everything is some type of object anyway.
That’s just a semantic game. If your language returns the result of a query as a generic array of generic dictionaries (or whatever), that isn’t mapping, nor is it object oriented in principle.
also implicit row mappers so one can still do the queries manually without writing all the oo glue
An in house solution is almost always better than an external dependency
This is correct. An in-house solution is a solution developed in-house for your specific problem, which no one else has ever had exactly. The more specific the need, the more the benefit of the made-to-measure solution. The alternatives are something your organization didn't develop, which may be better, but you don't know how to use it, or may be worse, but you don't know that when you pick it, or may be slower, but you don't know that when you start using it, or may have vendor lock in, but you don't know that when they sell it to you as "open", or may have hidden pitfalls, but they aren't in the glossy brochure, or may be unmaintained by anyone except your org in ten years, but you can't know that until ten years from now, or may be full of security holes because it was developed by idiots, but you can't know that because you didn't see who wrote it, or might be full of solid security features and a great design cleverly compromised by a hidden flaw placed in a specification you haven't read by a nation state, but you don't see that because why would you, or... etc etc etc. <sarcasm>But don't worry, at least you didn't have to understand the problem space well enough to be able to sit down and solve it yourself, so you sure saved some effort there!</>
Isn't this an argument against using any library at all?
Yes!

Which bring us to the topic of tradeoffs and the synthesis of balance, by way of weighing competing advantages and costs fairly.

On the one hand, code you must write and understand. On the other, code someone else wrote, that you can just use. There is no clear winner here. It's always a tradeoff.

Haha, so enterprise-chat.

I rewrote our entire database layer in Hibernate for our (incredibly complex monolith) webapp. Then I was tasked with rewriting a major core piece of search functionality that builds a query from user selections / saved queries.

I was told that contrary to previous work, Hibernate Criteria would not be allowed, since it was deprecated. Hibernate's official replacement for programmatic queries is JPA Criteria, but Hibernate's support for this was not feature equivalent to Hibernate Criteria, so this was out too.

So what I got the green-light to go on was rewriting my own pseudo-ORM wrapper that generates HQL query strings and parameters. Hql is not deprecated, you see.

It's ended up working out moderately well, it's a thin layer and as long as you avoid the rough edges it actually works fairly well, as well as providing a convenient point to translate query language from the old kodo format into hibernate (cringe, code smell).

There have been times I've had to do some very awkward query shit that I've only managed to lever in via HQL. You have no idea, views on top of views.

No idea what'll happen after I leave, that's their problem!

Thanks for the job security, Hibernate team. Your incredibly-poorly-executed transition from a well-supported standard to the "new new" has been exquisitely great for my job security.

Bet there's Python3 devs who feel the same way!

>JPA Criteria

Feature parity aside, I also found this to perhaps be the most verbose API for query building I've ever used.

Hah... I’m probably falling for Poe’s law here, but anyways... there are certainly cases where in-house is better than external dependency - specifically when your team knows the tech domain better than anyone external can... but in general well-maintained (preferably open source with a community, or a well funded company) external dependencies are almost always better. They usually would have the years of fixing edge cases and features that you would inevitably run into if you were to roll ur own.
They’re also tailoring their solution to be as generic as possible.

Having written OSS and also having written enterprise applications, it seems plainly obvious to me why a homegrown solution is preferred. Code developed internally is understood by the team (you may not understand the underlying implementation of a dependency), and can be tailored exactly to suit your needs (ignoring edge cases that aren’t relevant, removing unneeded features). And you never have to worry about maintainers disappearing, breaking changes being introduced, or bugged releases that you can’t do anything about.

I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law? What’s so extreme about being a responsible developer? I didn’t say “every solution should be developed in house” (though I think most large projects would be better for it!) obviously there’s is a cost associated with in house solutions and you should gauge that cost to see if it’s worth it for your application. But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

To counter your arguments, I'm going to use a couple of typical examples of when an in-house versus open-source / external debate comes up. I'm not counting the infamous "leftpad" cases, those are usually trivial, and really don't matter in the grand scheme of things. If it's a one-liner, just implement it yourself.

1. A high level database or queue lib, or a custom / powerful serialization lib or, relevant to this topic) an ORM or other foundational/low-level part of your tech stack.

What you can expect to happen is a bunch of very good programmers early on build powerful abstractions using macros, metaprogramming, advanced type system concepts and build up a codebase adding up to a few thousands of lines. It just works, it's a good system - a few bugs are patched by the team every month, but that's fine. Fast forward a few years, the programmers have moved on, "onboarded" the rest of the team to the codebase during their respective last week, but given how complicated the codebase is no one is really capable of debugging it and fixing issues. And given that it's not open-source, it never got an opportunity to build a community of contributors. Your team is now SOL, and it's going to take _months_ to replace it with a more well-maintained open-source solution.

2. Building a A/B testing implementation in-house - again a couple of good programmers build a working, scalable, basic system in a weekend. It actually works and the code is good, simple, readable and well-tested. But then, your PM team or your Marketing team wants you do add graphs. Then export the data to RedShift. And then tweak the algorithms powering the backend. Then multi-arm bandit. And so on. Now, what was now a weekend project, turns into months of work - whereas there exist paid services that do this for you.

Sometimes, it's unavoidable, external alternatives are genuinely not good*. But I strongly think, you have to be very, very careful about building systems in-house when they are not your business.

> I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law.

Sorry for this. I do feel quite strongly against your original comment (at least the way it was written without context), and I think it's the _opposite_ of being a "responsible developer" in all but edge cases, and think you are wrong. But calling it an example of Poe's law was not right on my part, and was harsh.

> But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

I've done this, and have done both in-house and oss code, but in-house, very reluctantly - for example - when there's just one or two maintainers committing code, and there's no alternative. But even then, I have usually forked the code and used that or parts of that as the base, rather than starting from scratch

There is hardly ever a time where an in house solution is better than a third party one for cross cutting concerns. Most of the packages are open source.
Idunno man, my day job working Rails code uses a custom mailer and job queueing system and everytime I have to work with it I really wish they'd used ActiveMailer and ActiveJob
Like I said, “almost” always. Really the larger the application and the longer time you as a dev will work with it, the more meaningful it becomes to write your own solutions.

It’s really a balance, but I don’t think it’s a balance most devs consider and they really should.

While I agree with you, I'd like to point out there are interesting exceptions: software components that can never be "complete". Such components require permanent maintenance workforce, and you might not want to dedicate resources for this.

Such as:

- API abstraction layers (like SDL, Allegro, SFML, etc.): you want to support new operating systems / new APIs by default. And most of the time, you don't want to spent time learning about the specifics of X11 window creation or Win32 events, as this would be throw-away knowledge anyway.

- hardware abstraction layers: you want to support new hardware by default, this is why we use operating systems and drivers.

- Format/protocols abstraction layers: if your game engine only uses JPEG files directly coming from your in-house asset pipeline, it's perfectly fine to develop in-house loaders (from scratch or from stb_image). But if your picture processing command-line tool aims to support every file format (especially, the ones that don't exist yet), then you should rather go with an updeatable third-party library, which will allow you to get all new formats by default.

- all kind of optimizers, including compilers, code JIT-ters, audio/video encoders, etc. More generally, all code that uses some heuristic so try to solve a problem that's not completely solved/solvable. You might be ready to accept the performance of a specific version of, for example, libjit. But you might instead consider that in your case, not having state-of-the-art JIT performance might be detrimental to your business, in this case you want to get the performance enhancements by default.

Lack of testing, lack of documentation and lack of use would be reasons that your claim is usually untrue. You can't Stack Overflow a problem and see if anyone else has encountered it before.
This is a terrible reply, what are you even trying to say? You can’t stack overflow a problem so don’t write your own in-house solutions? Lack of testing? We write our own tests. We write our own documentation.

It’s crazy to me how many people on HN are ignorant to the costs of third party dependencies and the benefits of in house solutions when building large applications.

I am trying to say that most of the home grown solutions I have seen have been pretty poor quality, and lack documentation especially. Do enough maintenance programing and you will understand.

If you do test and document your own stuff properly you are in a small minority. Why not release it for others to use?

In house means more customized to the specific problem but with far less expertise in the general technology. I find the latter almost always outweighs the former when working at any cost center tech shop.
Otherwise known as NIH in steroids.
Opposite; an in-house solution is almost always worse than an external dependency when that dependency is something as important to get right as an ORM.
The best of both worlds is write your own universal preprocessor...
The problem in many cases is actually in the OO part, in my experience - in the vast majority of cases where databases and persistence is concerned, staying in the procedural/structured + relational world keeps things simple, whereas objects often obscure what is actually happen, and invoke opaque magic such as ORMs.

I wonder what your experiences had been if after dropping the ORM you had gone one step more and dropped the objects.

This.

After contemplating my distaste for ORMs more carefully, I've come to the realisation that my objections aren't so much to do with the concept of an ORM but rather object orientation itself—and the fetish of treating it as the perfect hammer for every nail.

For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it usually turns out to be a mistake; de-objectifying can often result in simpler, shorter code with fewer data bugs.

Ultimately, the right answer depends on the nature of your particular business logic, how data flows in your wider ecosystem, and pragmatically, the existing skills of your workforce.

Hahaha! ^_^

Seriously Jesse, isn't this the very same reason __some__ people end up implementing yet another programming language without realizing it?

First they start out of exasperation with X language they use, because they hit some obstacles or limitations, and before they know it they end up implementing a newly created language.

You know what's the fun part? In their attempt to fix the aforementioned language's issues, they end up introducing __the very same problems__ in their own language, only under different "cloak" so to speak.

It's a vicious cycle I'm afraid...

So are you suggesting that we can’t do better than assembly machine code?
SQL is great if you will have multiple applications looking at the same dataset. E.g. An employee management program and a payroll program. In this case you should design a sane schema and mold the app around it.

ORMs are terrible in this sort of world since they tightly couple the application to the data. But if you will only ever have one application anyway the abstraction of a separate schema is pointless.

A lot of people who believe only one app (or one language) accesses their org's datastore are mistaken. You have to take extreme measures to prevent ad hoc uses from popping up.
Yes, yes, yes.

Why is this the case?

1. If you are doing anything interesting, people are going to ask questions about what you are doing, and the best way to answer those questions is going to be by querying your database.

2. One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore: the legacy service and whatever you're replacing it with. I suspect any alternative to this arrangement will be an even worse idea, e.g. taking a deliberate outage to perform a likely-irreversible migration.

> One day you might want to rewrite some of your service/s, split them into microservice/s, etc. At that point, there will be a minimum of two services talking to your datastore.

You should not do this. It removes almost all of the benefits of extracting things into a separate service (services should own their data and the only means of accessing it should be via their APIs). That's not utopian; that's one of the main reasons you do a service extraction in the first place.

Right, so let’s suppose you already segmented the data to two different backing datastores, and your monolith is now connecting to both of them instead of just the one. Now you can do the service migration, at which point you still run into the situation I’m discussing.
Cutovers are hard, to be sure. Ideally they should also be short (the time time a service undergoing mitosis spends talking to the old and new locations should be measured in days or hours or less).

Don't choose general data access patterns for the infrequent occurrence of cutover. Cutover is when you break a few rules and then immediately stop doing so. Build for everyday access patterns instead (which should be through the API of whatever owns the data--SQL is a powerful language and a really shitty API).

The simple solution to 1 is to never allow direct database access. Api only.
Of course. But surely you don't let anyone access your API, and you put it behind another API, right? Just in case you need to change that first API without breaking all the users.
Never even tell you have one, else the founder will pat on the back of one of your most junior dev and ask if he can give access to the db to that other team who needs to make money :D
So you do all your analytics by running a series of service calls and then writing a script to collate them into the needed results? Seriously?
I'm not the GP, but yes, absolutely. There are plenty of things that make this less than awful:

- The existence of tools that allow structured access to multiple APIs (GraphQL is a nice middle ground between "YOLO any queries you want" and "you only get row-by-row access exposed by the web APIs").

- The existence of data on multiple internal data stores. Analytics folks usually are not prepared to engage with the complexity of data being stored across handfuls or more of different stores with different schemas. The owner of the application knows how to join that stuff better than they do.

- Building intermediate/denormalized stores isn't frowned upon just because analytics shouldn't run ad hoc queries on the main production DBs. Expose change streams or bulk ("too much" data) endpoints and make it easy to load their results into a reporting system, which can be raw SQL. It's not redundant; if you don't do this, the following conversation starts to happen often: Q: "I'm running raw analytics queries on production and it's not quite working, can we just make $substantial_schema_change so my report works/is fast?" A: "No, we explicitly chose not to structure the DB/index/whatever like that because it seriously fucks up a real user access pattern."

What do you mean by this? What other way , other then digging right into the data is there to access the database? Isn't it all through APIs?
I like Clojure’s HugSQL[1] for this reason: you can simply write raw SQL, but when you start duplicating code, you can start factoring those bits out into composable “snippets”. The best of both worlds: composability and reuse, while still writing raw SQL.

[1] https://www.hugsql.org/

I have the opposite view. I find ORMs annoying and obscure, and I think they introduce duplicated code.

If you need to run a certain query in multiple places, you need to repeat the same ORM expression or refractor it into a function. I find much better to have a module with all my SQL queries as strings. That way whenever I need to run a query I reference it from there. Of course it helps to use meaningful names.

This approach has a lot of advantages over ORMs: * you know exactly what gets executed * automatic DRY code * the names of the SQL queries in the code are self-explanatory and the reader doesn't have to parse the ORM expression every time

Schema definitions are in standalone SQL files, as well as my migrations.

The only disadvantage is that it may be difficult to switch to a different database system, but that is not a problem for us.

Your argument does not stand up. I can have a file full of ORM sql fragments the same as you file of strings. And I can compose mine together safely and more flexibly than strings.
Have you tried Python's SQLAlchemy, the ORM parent posts are praising? The `sqlalchemy.sql` module is awesome and pretty much maps 1:1 to raw SQL.

Composing SQL expressions using this library instead of using string interpolation/concatenation has several advantages:

* DRY and composition * safety * portability (if you have switch the underlying DBMS)

Often the result is as good or better than my raw SQL. The fact that Python has an amazing REPL makes the process pretty much like testing queries in the database prompt but with less cognitive switch between languages.

In the end it is a matter of taste, but I have to agree with parent posts, SQLAlchemy raises the bar for other ORMs.

> I find much better to have a module with all my SQL queries as strings.

But you can't compose them, so there is a lot of duplication. Also, how would you handle dynamic filters and columns? Concatenating strings? That seems error prone. At least a nice query builder would be useful, but then the whole just write sql thing falls apart.

Hahaha, wow! That’s just about the most awful thing I’ve ever heard. If you ever find yourself sitting across a table from an interviewer, I would definitely recommend not including this little tidbit in the conversation.
ORMs seem to be a typical example of over-engineering. Often you don't need all that complexity they come with and when you do, you are probably better of understanding exactly what you are doing.

So maybe building a minimal API, wrapping your SQL queries isn't such a bad idea after all.

The things I've found positive about ORMs are exactly that mapping of results to business objects. The things I've found "not worth it" are the query-building APIs baked into the objects. These principles can be seen in a lightweight ORM I made, PureORM [1].

[1] https://github.com/craigmichaelmartin/pure-orm

> and handle mapping data explicitly.

Here's your problem

There is a lot of ancillary complexity in database connection libraries that we could attack before replacing the standard structured query language by some poorly considered mapping of objects to and from relation(s), inspired by poorly understood bad old OOP, which is generally what all ORMs boil down to.
you could consider something like slick onstead of an ORM:

http://slick.lightbend.com/doc/3.2.0/orm-to-slick.html

This is exactly what MyBATIS is for.

You throw in SQL, provide a simple mapper, done. IMHO it's far superior to ORMs when your database is or may become complicated.

It is striking the balance between your own queries and ORM.

My rule of thumb is that I always go with ORMs for MVPs and small apps. Optimizing for speed usually means going deeper and building a system or queries for yourself. Until that point I usually stick to less verbose code and more to business rules.

Views are another way to gain reuse.
As are stored procs, user-defined functions, triggers, scheduled jobs,.. SQL databases are programming environments, not pure datastores.
Most are exceptionally bad at being programming environments, though.
> I started coding some simple functions to help map the tabular data to objects.

Maybe you should not do that? Can you give us an idea of the domain problem you were trying to solve that made you feel the need for that?

I use jdbi.org in Java all the time because it does just that for me.
MyBatis