by vbezhenar 2457 days ago
Sometimes you just should live with duplicated code. It's OK.
5 comments

No, it is not okay. If you need insert/update/select for every object/table, that is way too much duplication. It becomes very irritating when the schema changes. There should be table metadata in such a case, but one should also know what one is doing. Having no idea that 14 joins are being done under the hood is not a good situation either.
Well sort of. In my view, duplicate SQL chunks that have defined business logic should either be a new table/view or extremely well-documented with really rigid communication policies for changes.

For example, take a company with many data analysts/scientists who may each be writing their own queries. If the definition of some “very important” company metric changes, a large number of scattered queries would need to change.

But an ORM isn’t the answer for the above situation either.
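A minimal sketch of the "make it a view" option, using Python's built-in `sqlite3`. The `orders` table, the `net_revenue` view, and the "only paid orders count" rule are all made up for illustration; the point is only that the metric's definition sits in one place that every query reads from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
    INSERT INTO orders (amount, status)
    VALUES (100.0, 'paid'), (50.0, 'refunded'), (25.0, 'paid');

    -- Hypothetical metric: the business definition lives in exactly one place.
    CREATE VIEW net_revenue AS
        SELECT SUM(amount) AS total FROM orders WHERE status = 'paid';
""")


def net_revenue(connection):
    """Read the metric from the view instead of re-deriving it per query."""
    return connection.execute("SELECT total FROM net_revenue").fetchone()[0]


print(net_revenue(conn))
```

If the definition changes, only the view is redefined; the analysts' queries that select from it stay as they are.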

It's relative, if the duplication is that large maybe you do need to abstract.

It also sounds like you would be well served using a service abstraction at that point to remove the data layer from client scope entirely.

The "model changes, now we have to change it everywhere" problem isn't going to be solved by abstraction. It's only bounded by how much you're willing to limit access to the underlying model: if you need that information, you need to share the model.

The best solution to this I've seen in practice is domain modelling, colocating shared code near its other users. When things get too distant you start using anti-corruption layers, which allow more flexible model changes.

But at the end of the day this is essential complexity. An ORM, or any other solution, is never going to be able to hide the fact that you need information elsewhere in the system to be useful.

Thank you! You just said something I very much agree with but never dared to say out loud.
I think that the way to say it without starting a war is to preface it with something like:

> Well, redundancy and dependency both have downsides, but in this case...

But you shouldn’t live with a hand-rolled pseudo ORM that stumbled into existence when there are well-developed alternatives.
Why not?

I've used hand rolled pseudo ORMs before.

I prefer just plain SQL but for the application I had there was a common access pattern that was worth abstracting out in a DRY sense.

That doesn't mean I want or need a complete ORM. Just consistent access to certain table types.
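For a sense of scale, a helper of this kind can be a single function. This is only a sketch of the general idea, not the commenter's code: the `TABLES` metadata, the `users` table, and `insert_row` are hypothetical names.

```python
import sqlite3

# Hypothetical table metadata: the only place column names are listed.
TABLES = {
    "users": ("name", "email"),
}


def insert_row(conn, table, values):
    """Build and run a parameterized INSERT from the table metadata."""
    # Identifiers come from TABLES only (KeyError on unknown tables);
    # values always travel as bound parameters.
    columns = TABLES[table]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cursor = conn.execute(sql, [values[c] for c in columns])
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
new_id = insert_row(conn, "users", {"name": "Ada", "email": "ada@example.com"})
```

Everything else stays plain SQL; a schema change touches `TABLES` rather than every insert statement.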

I’ve never seen a homegrown ORM that was better than a third party one. Whenever there is an issue - and there are always issues - you have to dig into the code, because they are never documented well.

There is usually a feature that no one thought about and then you have to make modifications to the custom ORM and you get an even bigger mess.

Better is a subjective term.

I wouldn't say what I built was a better ORM. But I would 100% say it's a better solution to the problem I faced.

It didn't get in the way of writing SQL, but it did reduce the boilerplate and repetitive gruntwork.

"Issues" - no, it was a function call deep and incredibly clear and close to what was happening.

sort of agree but also: the good third party ORMs started life as homegrown ORMs.
Usually third party stuff has documentation. Home grown stuff often doesn't. And third party stuff is usually better tested.
Entity Framework didn’t.....
You just said "you know, if you wanted to have a shitty burger, you can get it right there for half the price of a national chain and it will be at least as good"
> I've used hand rolled pseudo ORMs before.

Is your pseudo ORM as well documented as a commonly used ORM? Can I google (or search your wiki) for common issues?

There's a middle ground.

Micro ORMs.

A micro ORM is just an ORM, written well and modularly. It isn't a middle ground - it's choosing to use a well written library.

A lot of people conflate ORMs leaking because of a poorly designed library with ORMs being a bad abstraction in general.

One of the most popular Micro ORMs for C# is Dapper which is used by Stack Overflow.

There is no real abstraction. You write standard SQL and it maps your recordset to an object. You know exactly what code is running.

There are extensions that will take a POCO object and create an insert statement and I believe updates, but where ORMs usually get obtuse and do magic are Selects. It’s hard to generate a suboptimal Insert or Update.
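To illustrate the "you write the SQL, it maps the recordset" idea, here is a rough Python analogue using `sqlite3` and a dataclass. It is not Dapper's API; the `query` helper and the `User` class are names invented for this sketch.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str


def query(conn, cls, sql, params=()):
    """Run hand-written SQL and map each row onto `cls` by column name."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    wanted = {f.name for f in fields(cls)}
    # Columns the class does not declare are silently ignored.
    return [
        cls(**{c: v for c, v in zip(columns, row) if c in wanted})
        for row in cursor
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com')")
users = query(conn, User, "SELECT id, name FROM users WHERE name = ?", ("Ada",))
```

The SQL that runs is exactly the string passed in, which is the property being praised above.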

So the pattern I see emerging: use an ORM for the common stuff and execute SQL for complicated queries (like reports).
Ruby on Rails’ ActiveRecord, for all its heft, is excellent at this. You can use raw SQL any time you like. It was an explicit design goal from day 1.

There are times I dislike things about it, and it can be quite heavy, but it’s very easy to mix and match ActiveRecord ORM code and raw SQL even within a single model class.

That’s how I’ve done it on my last two projects. We used TypeORM for the standard repeated simple queries, and then wrote custom SQL for our complicated queries that the ORM failed at and then just executed them with the ORM. It was really nice and made for easier table refactors because we didn’t have to go through and audit every query that was calling that table.
Actually, DDD recommends that domain objects in general should be loaded by id only, and that a complicated query for a grid should load a projection.
And for ETL and data loads. An ORM usually isn’t going to do a multi-insert statement or an update that involves more than one table.
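A small sketch of what that looks like in raw SQL, again with `sqlite3`. The `customers`/`orders` schema and the discount rule are invented; `executemany` stands in here for a batched load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, tier TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        discount REAL DEFAULT 0
    );
    INSERT INTO customers (id, tier) VALUES (1, 'gold'), (2, 'basic');
""")

# Bulk load: one statement, many parameter sets.
rows = [(1,), (1,), (2,)]
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)", rows)

# Set-based update driven by a second table, with no per-row round trips.
conn.execute("""
    UPDATE orders
    SET discount = 0.1
    WHERE customer_id IN (SELECT id FROM customers WHERE tier = 'gold')
""")

discounted = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE discount > 0"
).fetchone()[0]
```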
The informal definition I have of a micro ORM is an ORM without an identity map and without lazy fetching through proxy properties. Are there any more concrete definitions?
> There's a middle ground.

> Micro ORMs.

And there's the mystical fourth option of simply not bothering with objects in the first place. No objects, no need to do object-relational mapping.

So then you’re mapping to whatever data structure your app uses instead of objects. In OOP languages like Python, everything is some type of object anyway.
That’s just a semantic game. If your language returns the result of a query as a generic array of generic dictionaries (or whatever), that isn’t mapping, nor is it object oriented in principle.
But then your generic array of generic dictionaries needs to be mapped to whatever data structures make sense for your application. ORMs save you that step.
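For concreteness, this is roughly what the "generic array of generic dictionaries" approach looks like in Python with `sqlite3`; the `users` table is a made-up example. Whether the last line counts as "mapping" is the disagreement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite3.Row makes each row addressable by column name.
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada'), ('Grace')")

# No classes involved: the result is a plain list of plain dicts.
rows = [dict(r) for r in conn.execute("SELECT id, name FROM users ORDER BY id")]

# Any reshaping the application needs happens in ordinary code.
names = [r["name"] for r in rows]
```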
also implicit row mappers so one can still do the queries manually without writing all the oo glue
An in house solution is almost always better than an external dependency
This is correct. An in-house solution is a solution developed in-house for your specific problem, which no one else has ever had exactly. The more specific the need, the more the benefit of the made-to-measure solution. The alternatives are something your organization didn't develop, which may be better, but you don't know how to use it, or may be worse, but you don't know that when you pick it, or may be slower, but you don't know that when you start using it, or may have vendor lock in, but you don't know that when they sell it to you as "open", or may have hidden pitfalls, but they aren't in the glossy brochure, or may be unmaintained by anyone except your org in ten years, but you can't know that until ten years from now, or may be full of security holes because it was developed by idiots, but you can't know that because you didn't see who wrote it, or might be full of solid security features and a great design cleverly compromised by a hidden flaw placed in a specification you haven't read by a nation state, but you don't see that because why would you, or... etc etc etc. <sarcasm>But don't worry, at least you didn't have to understand the problem space well enough to be able to sit down and solve it yourself, so you sure saved some effort there!</>
Isn't this an argument against using any library at all?
Yes!

Which brings us to the topic of tradeoffs and the synthesis of balance, by way of weighing competing advantages and costs fairly.

On the one hand, code you must write and understand. On the other, code someone else wrote, that you can just use. There is no clear winner here. It's always a tradeoff.

Haha, so enterprise-chat.

I rewrote our entire database layer in Hibernate for our (incredibly complex monolith) webapp. Then I was tasked with rewriting a major core piece of search functionality that builds a query from user selections / saved queries.

I was told that contrary to previous work, Hibernate Criteria would not be allowed, since it was deprecated. Hibernate's official replacement for programmatic queries is JPA Criteria, but Hibernate's support for this was not feature equivalent to Hibernate Criteria, so this was out too.

So what I got the green light for was writing my own pseudo-ORM wrapper that generates HQL query strings and parameters. HQL is not deprecated, you see.

It's ended up working out moderately well. It's a thin layer, and as long as you avoid the rough edges it works fairly well. It also provides a convenient point to translate the query language from the old Kodo format into Hibernate (cringe, code smell).
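The general shape of such a layer, sketched in Python rather than Java and not taken from the actual codebase: user selections go in, an HQL-style query string plus named parameters come out. `build_query`, the `Ticket` entity, and the allowed field names are all hypothetical.

```python
def build_query(entity, filters):
    """Return (query_string, params) for a dict of {field: value} selections.

    Field names are checked against an allow-list; values only ever
    travel as named parameters, never as part of the query string.
    """
    allowed = {"name", "status", "owner"}  # hypothetical searchable fields
    clauses, params = [], {}
    for field, value in sorted(filters.items()):
        if field not in allowed:
            raise ValueError(f"unsupported filter: {field}")
        clauses.append(f"e.{field} = :{field}")
        params[field] = value
    where = f" where {' and '.join(clauses)}" if clauses else ""
    return f"from {entity} e{where}", params


query, params = build_query("Ticket", {"status": "open", "owner": "ada"})
```

Sorting the filters keeps the generated string stable for the same selections, which makes the output easier to test and cache.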

There have been times I've had to do some very awkward query shit that I've only managed to lever in via HQL. You have no idea, views on top of views.

No idea what'll happen after I leave, that's their problem!

Thanks for the job security, Hibernate team. Your incredibly-poorly-executed transition from a well-supported standard to the "new new" has been exquisitely great for my job security.

Bet there's Python3 devs who feel the same way!

>JPA Criteria

Feature parity aside, I also found this to perhaps be the most verbose API for query building I've ever used.

Hah... I’m probably falling for Poe’s law here, but anyways... there are certainly cases where in-house is better than an external dependency, specifically when your team knows the tech domain better than anyone external can... but in general, well-maintained external dependencies (preferably open source with a community, or backed by a well-funded company) are almost always better. They usually have behind them the years of edge-case fixes and features that you would inevitably need if you were to roll your own.
They’re also tailoring their solution to be as generic as possible.

Having written OSS and also having written enterprise applications, it seems plainly obvious to me why a homegrown solution is preferred. Code developed internally is understood by the team (you may not understand the underlying implementation of a dependency), and can be tailored exactly to suit your needs (ignoring edge cases that aren’t relevant, removing unneeded features). And you never have to worry about maintainers disappearing, breaking changes being introduced, or bugged releases that you can’t do anything about.

I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law? What’s so extreme about being a responsible developer? I didn’t say “every solution should be developed in house” (though I think most large projects would be better for it!). Obviously there is a cost associated with in-house solutions, and you should gauge that cost to see if it’s worth it for your application. But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

To counter your arguments, I'm going to use a couple of typical examples of when an in-house versus open-source / external debate comes up. I'm not counting the infamous "leftpad" cases, those are usually trivial, and really don't matter in the grand scheme of things. If it's a one-liner, just implement it yourself.

1. A high-level database or queue lib, a custom / powerful serialization lib, or (relevant to this topic) an ORM or other foundational/low-level part of your tech stack.

What you can expect to happen is that a bunch of very good programmers early on build powerful abstractions using macros, metaprogramming and advanced type system concepts, and build up a codebase adding up to a few thousand lines. It just works, it's a good system. A few bugs are patched by the team every month, but that's fine. Fast forward a few years: the programmers have moved on, having "onboarded" the rest of the team to the codebase during their respective last weeks, but given how complicated the codebase is, no one is really capable of debugging it and fixing issues. And given that it's not open-source, it never got an opportunity to build a community of contributors. Your team is now SOL, and it's going to take _months_ to replace it with a better-maintained open-source solution.

2. Building an A/B testing implementation in-house: again, a couple of good programmers build a working, scalable, basic system in a weekend. It actually works and the code is good, simple, readable and well-tested. But then your PM team or your marketing team wants you to add graphs. Then export the data to Redshift. And then tweak the algorithms powering the backend. Then multi-armed bandit. And so on. Now what was once a weekend project turns into months of work, whereas there exist paid services that do this for you.

Sometimes, it's unavoidable, external alternatives are genuinely not good*. But I strongly think, you have to be very, very careful about building systems in-house when they are not your business.

> I don’t mean to sound crass but how on earth could you think this is an example of Poe’s law.

Sorry for this. I do feel quite strongly against your original comment (at least the way it was written without context), and I think it's the _opposite_ of being a "responsible developer" in all but edge cases, and think you are wrong. But calling it an example of Poe's law was not right on my part, and was harsh.

> But if you’re going to be working with that application for years and years to come then I highly recommend trying to write your own code instead of relying on libraries.

I've done this, and have done both in-house and oss code, but in-house, very reluctantly - for example - when there's just one or two maintainers committing code, and there's no alternative. But even then, I have usually forked the code and used that or parts of that as the base, rather than starting from scratch

There is hardly ever a time where an in house solution is better than a third party one for cross cutting concerns. Most of the packages are open source.
Idunno man, my day job working on Rails code uses a custom mailer and job queueing system, and every time I have to work with it I really wish they'd used ActionMailer and ActiveJob
Like I said, “almost” always. Really the larger the application and the longer time you as a dev will work with it, the more meaningful it becomes to write your own solutions.

It’s really a balance, but I don’t think it’s a balance most devs consider and they really should.

While I agree with you, I'd like to point out there are interesting exceptions: software components that can never be "complete". Such components require permanent maintenance workforce, and you might not want to dedicate resources for this.

Such as:

- API abstraction layers (like SDL, Allegro, SFML, etc.): you want to support new operating systems / new APIs by default. And most of the time, you don't want to spend time learning about the specifics of X11 window creation or Win32 events, as this would be throw-away knowledge anyway.

- hardware abstraction layers: you want to support new hardware by default, this is why we use operating systems and drivers.

- Format/protocol abstraction layers: if your game engine only uses JPEG files directly coming from your in-house asset pipeline, it's perfectly fine to develop in-house loaders (from scratch or from stb_image). But if your picture processing command-line tool aims to support every file format (especially the ones that don't exist yet), then you should rather go with an updatable third-party library, which will allow you to get all new formats by default.

- all kinds of optimizers, including compilers, code JIT-ters, audio/video encoders, etc. More generally, all code that uses some heuristic to try to solve a problem that's not completely solved/solvable. You might be ready to accept the performance of a specific version of, for example, libjit. But you might instead consider that, in your case, not having state-of-the-art JIT performance would be detrimental to your business; in that case you want to get the performance enhancements by default.

Lack of testing, lack of documentation and lack of use would be reasons that your claim is usually untrue. You can't Stack Overflow a problem and see if anyone else has encountered it before.
This is a terrible reply, what are you even trying to say? You can’t stack overflow a problem so don’t write your own in-house solutions? Lack of testing? We write our own tests. We write our own documentation.

It’s crazy to me how many people on HN are ignorant to the costs of third party dependencies and the benefits of in house solutions when building large applications.

I am trying to say that most of the home grown solutions I have seen have been pretty poor quality, and lack documentation especially. Do enough maintenance programming and you will understand.

If you do test and document your own stuff properly you are in a small minority. Why not release it for others to use?

In house means more customized to the specific problem but with far less expertise in the general technology. I find the latter almost always outweighs the former when working at any cost center tech shop.
Otherwise known as NIH on steroids.
Opposite; an in-house solution is almost always worse than an external dependency when that dependency is something as important to get right as an ORM.
The best of both worlds is write your own universal preprocessor...