by jbandela1 1053 days ago
> Yes, I had failed to see the proper solution: a class generator — so that I didn't have to manually copy code again

No, please don't. This is jumping from the frying pan into the fire. If you think abstract base classes can be clever and hard to understand, code generators can be even more so. In addition, because code generation is a one-way conversion, the generated code evolves independently while the code generator itself also evolves independently, and you end up with a bunch of things that are subtly different yet somewhat related, in a way that you will need an advanced degree in evolutionary biology to understand.

18 comments

For a blog post titled "Don't be clever", ending with the author's answer being an even clever-er solution, rather than a discussion of the hidden fight against complexity in software, was an odd closing note.
"Don't be clever, now let me tell you about the non-clever solution of writing a bespoke meta programming language to solve the problem of copy/paste"
And with this inconsistency the author really pushed you into rejecting complexity.

Maybe by looking like a fool he fooled us all?

I think it’s just a great metaphor, and an experience I seem to relive over and over. Every time I think I have something figured out, hindsight always shows me I didn’t. This probably leads to me living in a state of perpetual overconfidence. Because even if I don’t fully know what I’m doing, look at past me! That dude had no clue what he didn’t know!
To me the most ironic part was calling CRUDController clever in the first place. There could have been clever solutions to reduce the code repetition, but using inheritance to force a naive abstraction on every future programmer isn't it.
I don't know if it's because I'm reading HN more often these days instead of spacing it out, but there's definitely a sizeable number of articles with a "<hard statement title>" and a body "so short it seems from Twitter" that provide little insight or are absolutely misguided.

I guess the debate is instructional in itself and it's why the comments+article combo is the real power of HN.

It's not a clever-er solution? As an example, `php artisan make:controller` and potentially some custom derivatives is going to solve like 95% of what he was trying to abstract.
Until something goes wrong, and you didn't write it. At that point, have fun.
I didn't find either all that clever. It was premature abstraction, followed by abstraction-lite.
My understanding is that in this case, the code generator is merely a boilerplate generator that isn’t meant to keep the code in sync, but just do the initial copy/paste work.

I think code generators are a perfectly acceptable solution in cases like this, when the starting point all looks the same, and it needs to diverge from there. Especially if the generation logic is fairly straightforward.
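A minimal sketch of the "boilerplate generator" reading, assuming a Python codebase: the template is filled in once and the result is handed to the developer, with no attempt to keep files in sync later. The template contents and the `scaffold_controller` name are hypothetical, not any framework's actual scaffolding.

```python
from string import Template

# Hypothetical controller template; placeholders are illustrative only.
CONTROLLER_TEMPLATE = Template('''\
class ${name}Controller:
    """Scaffolded controller for ${name}. Edit freely; it is not regenerated."""

    def index(self):
        raise NotImplementedError

    def show(self, ${var}_id):
        raise NotImplementedError
''')


def scaffold_controller(name: str) -> str:
    """Return source for a new controller: a one-shot copy/paste, nothing more."""
    return CONTROLLER_TEMPLATE.substitute(name=name, var=name.lower())


if __name__ == "__main__":
    print(scaffold_controller("Invoice"))
```

Because the output is meant to be edited, running the scaffold again produces a fresh copy rather than updating the old one; that is the trade-off the rest of the thread argues about.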

There are a lot of IDEs that have boilerplate generators out of the box.

This is sometimes also called scaffolding, which I think is a better term. Code generation often means (compile-time or otherwise) generation of code from something else (like .proto definitions). Code that is not supposed to be modified by the developer and will be overwritten automatically.
I've also seen "scaffolding" used to generate code that shouldn't be manually modified.

E.g.: https://learn.microsoft.com/en-us/ef/core/managing-schemas/s...

That's someone who doesn't know what words mean.

When a building is built, the scaffolding isn't an immutable part of the final product.

Well if we're being anal about the metaphor, in construction and renovation, the scaffolding eventually goes away. Scaffolded code rarely disappears entirely; some of it usually sticks around.
To be perhaps even more anal, after construction scaffolding is taken down, stored, transported to a new project and then used again.
Then that makes it an inappropriate word for both uses.
It’s a little ambiguous with Entity Framework. There is no rule that says you cannot change the resulting code. If you want to continue with Code First, that is a totally valid approach, there’s even a subsection on that.
If scaffolding generates more than an SMS's worth of code per file, it will drive you crazy, because you'll still want to keep it in sync.
Correct.

The number of comments which latch onto "code generator" without understanding it are disappointing.

Yes, this is what the author meant, no doubt.

To be clear, I'm not being sarcastic. Many frameworks have code generators that write boilerplate for you.

This is also something the Rust language does well with macros. It’s a pretty standard approach.
So much about programming in a larger sense is just getting abstractions correct. Too loose and they don't standardize/remove enough boilerplate. Too strict and they break or multiply when changes are needed.

I am curious, since my own backend experience is limited (obviously there will be many opinions on this) but it seems to me like his mistake was using the inheritance of classes.

Suppose he had simply had a CRUD layer that only standardized the actions of creating, reading, updating, or deleting a record (or records) in a database. That would accomplish standardizing the payload, as well as possibly doing an RBAC/authorization check.

He could have just written individual controllers that invoke that layer with custom business logic. He would even have more space to add comments indicating why the business logic is implemented this way. When someone needed to make a change, they could just go straight to the specific controller.
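One way the CRUD-layer-plus-individual-controllers idea could look, sketched in Python under loud assumptions: an in-memory dict stands in for the database, the RBAC check is a simple per-user permission set, and `CrudLayer`, `Forbidden`, and `create_invoice` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any


class Forbidden(Exception):
    """Raised when the caller lacks the required permission."""


@dataclass
class CrudLayer:
    """Hypothetical shared layer: standardized record access plus an authz check.

    An in-memory dict stands in for a real database table.
    """
    permissions: dict[str, set[str]]  # user -> allowed actions
    rows: dict[int, dict[str, Any]] = field(default_factory=dict)
    next_id: int = 1

    def _authorize(self, user: str, action: str) -> None:
        if action not in self.permissions.get(user, set()):
            raise Forbidden(f"{user} may not {action}")

    def create(self, user: str, data: dict[str, Any]) -> int:
        self._authorize(user, "create")
        row_id = self.next_id
        self.rows[row_id] = dict(data)
        self.next_id += 1
        return row_id

    def read(self, user: str, row_id: int) -> dict[str, Any]:
        self._authorize(user, "read")
        return dict(self.rows[row_id])

    def update(self, user: str, row_id: int, data: dict[str, Any]) -> None:
        self._authorize(user, "update")
        self.rows[row_id].update(data)

    def delete(self, user: str, row_id: int) -> None:
        self._authorize(user, "delete")
        del self.rows[row_id]


def create_invoice(crud: CrudLayer, user: str, payload: dict[str, Any]) -> int:
    """An individual controller: its own business logic, then the shared layer."""
    # The business rule lives here, next to a comment explaining why it exists.
    if payload.get("amount", 0) <= 0:
        raise ValueError("invoice amount must be positive")
    return crud.create(user, payload)
```

The shared layer only knows about records and permissions; each controller stays a plain function, so a one-off rule touches a single place instead of a common parent class.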

Putting the impracticality of having to override specific methods aside, isn't it also a waste to instantiate a class full of methods that essentially do the same thing when you can just point to existing ones? Well I guess it is pointing to existing ones via inheritance, but it seems harder to reason about. But this is coming from someone who doesn't write traditional object oriented code very often.

Genuinely curious to hear from backend people about this, it's not my area of expertise but I've learned a lot this year.

In my view, there's duplication that could use an abstraction and duplication that is merely coincidental. (See The Wrong Abstraction by Sandi Metz [1])

Unless something screams "this should be the single source of truth about this", forget abstracting altogether and just copy and move on.

The problem with trying to create 1 "CRUD controller" is that there are always going to be hairy things that make parts one offs. Perhaps someone needs to add location headers because the underlying calculation takes too long. Perhaps they want custom status codes when things go wrong. Maybe they need to use web sockets or server sent events. As soon as any sort of needed customization comes into play you start finding yourself closer and closer to the framework you are likely using until you reach a point of "Why am I trying to wrap the entire framework? Why can't I use it directly?"

And if you've made the mistake of pulling that abstraction into a library, heaven help you when you need to update things. What happens if the underlying framework library makes a breaking change? Or if you need to make a breaking change to support some feature? It all gets really messy really fast and now instead of just impacting the 1 application you are impacting 100.

Updating shared code is never as easy as you might think.

But, on the flip side, I can't think of anything easier, even if it's mostly boilerplate, than writing a controller that calls some business logic that works with a DB, regardless of the language or framework. The hard part of such applications is always the business logic, not the actual controller wiring.

[1] https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

Thank you for that link. I'd never heard of it, but it's definitely going to be one of the new things I meditate on quite a bit, especially combined with the concept that "all abstractions are leaky".

If you're designing a backend service that has multiple "heads" (say, a web application and a mobile application), then it makes sense that the service should be code that manipulates the database(s) via business logic, and at the very least it should be in one place.

But broadly, programmers do see value in abstracting interacting with the DB at least somewhat. It's why ORMs exist.

I don't know if there's a real answer. An abstraction can be right for a while and then become wrong when you add a new requirement right? So is it pointless to use abstractions at all? That definitely feels like the wrong takeaway. I guess instead, it's don't immediately abstract out anything that feels abstractable when writing new code and don't feel obliged to use existing abstractions in a codebase until you're sure they fit what you're doing.

> I don't know if there's a real answer. An abstraction can be right for a while and then become wrong when you add a new requirement right? So is it pointless to use abstractions at all? That definitely feels like the wrong takeaway. I guess instead, it's don't immediately abstract out anything that feels abstractable when writing new code and don't feel obliged to use existing abstractions in a codebase until you're sure they fit what you're doing.

The real answer is that, as much as some would like it to be otherwise, there aren't hard and fast rules in programming. Determining when it makes sense to abstract and when it doesn't is ultimately something that will be guided by experience.

That said, DRY is dangerous. It's too easy to follow blindly and has disastrous effects when the wrong abstractions get made. It's far better to duplicate first and DRY when it becomes a pain.

Generating code is fine, if the generated code strictly never evolves independently of what it is generated from. For instance generating libraries from .proto files (or other declarative schema definition solutions) works really well. If the schema changes, you throw away the old generated code and generate brand new code, no problem.

But if you want to make even a single tiny modification to one of the generated files, you're busted, you need a different solution.
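A minimal sketch of the "throw it away and regenerate" discipline, assuming a toy dict schema in place of a real .proto file; `SCHEMA`, `generate`, and `regenerate` are hypothetical. The generator is a pure function of the schema, so every build writes identical output and any hand edit is simply overwritten.

```python
from pathlib import Path

# Hypothetical declarative schema standing in for a .proto-style definition.
SCHEMA: dict[str, dict[str, str]] = {"User": {"id": "int", "email": "str"}}


def generate(schema: dict[str, dict[str, str]]) -> str:
    """Deterministically turn the schema into Python dataclass source."""
    lines = ["from dataclasses import dataclass", ""]
    for name, fields in sorted(schema.items()):
        lines += ["", "@dataclass", f"class {name}:"]
        lines += [f"    {fname}: {ftype}" for fname, ftype in fields.items()]
    return "\n".join(lines) + "\n"


def regenerate(schema: dict[str, dict[str, str]], out: Path) -> None:
    """Overwrite the output wholesale on every build; never edit it by hand."""
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(generate(schema))
```

If the schema changes, you change `SCHEMA` and rerun `regenerate`; the moment someone edits the output file directly, this guarantee is gone.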

Generated code is fine if it's newly generated on every build. If you're going to have to maintain the generated code, it's not generated code anymore, but duplicated code.
Sure. I consider this a restatement of what I said, and thus inarguably right :)
Seconded. How many in this thread have found generated code in source control? My trophy case includes artifacts produced by: flex, bison, gperf, swig, and one particularly nasty CORBA stub generator.
No Perl?
Yes and the original article is about how duplicated code is ok. The discussion finally went the full circle.
The original article isn't very convincing though. I mean, I fully believe the single abstract super controller was a bad idea, but there are far better options than that and duplicate code. He's just comparing two of the worst ways to do it.
> But if you want to make even a single tiny modification to one of the generated files, you're busted, you need a different solution.

Not totally true, if you can robustly express your tiny change as a `sed` or `awk` script, you can just append to the generator pipeline. Speaking from experience, do not condone, etc.

I think GP means "make a tiny change [after generation, outside of the generator, and persist that change independent of the generator code]", which is where all the demons are waiting

Modifying the generator itself to do something different every time, and doing GP's stated "regenerate and throw away the old stuff" is in line

It's not modifying the generator. The generator may be a proprietary black box. It's wrapping the generator in a bash script that pipes the result through AWK, etc.
As other commenters have noted, if the awk script is just a pure function of the output of the black-box generator to a new output, then I would consider this a modification to the generator, and no problemo.

However, if your awk script requires the current state of the generated code as input in addition to the output of the black-box generator, and tries to reconcile a diff between the two things, then yep, I consider that busted.
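The distinction can be sketched in Python, with a pure string transform standing in for the sed/awk wrapper. `black_box_generator`, `fix_version`, and `pipeline` are hypothetical. `fix_version` reads only the fresh generator output, so the composed `build` is itself just a generator and can be rerun at will; the "busted" variant would additionally read the previously generated file and try to reconcile the two.

```python
import re
from typing import Callable


def black_box_generator(schema: str) -> str:
    """Stand-in for an opaque third-party generator (hypothetical)."""
    return f"class {schema}Client:\n    VERSION = 'unknown'\n"


def fix_version(generated: str) -> str:
    """Pure post-processing step, the Python analogue of a sed/awk filter.

    It depends only on the generator's output, never on previously
    generated files, so the whole pipeline stays a pure function.
    """
    return re.sub(r"VERSION = 'unknown'", "VERSION = '1.0'", generated)


def pipeline(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose steps left to right into a single generator."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


build = pipeline(black_box_generator, fix_version)
```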

> It's wrapping the generator in a bash script that pipes the result through AWK, etc.

Which is itself a generator

Sure, that's orthogonal. If you wrap the generator in your build system and still always regenerate, it's effectively the same. And also, I think, not what GP was talking about
Pedantic. There's a world of difference between grokking a new code generation DSL+codebase and a shell one-liner that fixes a string that is obviously invalid.

Since the issue is the maintenance of such systems, it is absolutely relevant.

No thank you! I don't enjoy fighting dragons :)
> For instance generating libraries from .proto files (or other declarative schema definition solutions) works really well.

...does it? Generated ones always feel mismatched with the language's paradigms. Maybe that's just my nightmares of dealing with the MS Graph generated vomit hose of a library...

Sure, that's true, I'm a heavy user of the standard protobuf library in python, and you certainly won't catch me singing its praises for its style.

But that's a different (and less important) kind of problem. It does not exhibit the huge issue with generated-and-then-modified code where you have to maintain all the generated code rather than just the source from which it was generated.

It trades the time a few developers would spend writing a client by hand for the time wasted by tens of thousands of developers who use a generated client that doesn't fit the language well.

It is, IMO, a very bad tradeoff.

As a Lisp guy I find this entire discussion weird
Ha, yeah, though I would say that the Lisp solution has a different downside: it's really nice to be able to see what the post-generation code looks like. None of the Lisps I've used have made that as easy for their macro expansions as I would like.
There was an editor (for cmucl maybe?) that would macroexpand in a tooltip on hover and macroexpand-1 on right click (or maybe the opposite) on an s-expression. I'm surprised something like that didn't make it into slime, though you can I think macroexpand to the minibuffer. But, yeah, that's why it rewards doing macros in small pieces.
I absolutely prefer code generation over macros. It is a general solution that works for all languages, databases, protocols etc. And you can easily inspect the code generated.
You can wrap a code generator by a macro.
Why would you want to do that? That would be adding unnecessary compile time overhead. And (again) code generation works for any language/framework/OS/… Not just for Lisp.
You can handle any language with a read-time parser, then work with ASTs, pretty-print the result in another language. In between, it's just Lisp.
The funny thing is the last thing I did code generation for just generated classes which inherited abstract base classes. Then we had to inherit the generated code to extend it. It massively increased the complexity.

Then someone came along and invented partial classes in C#, which made this problem go away. Well, it would have if anyone had wanted to do all the maintenance legwork, which they didn't, so half of it's a 5-class inheritance tree, some of it's partial classes, and someone got really fed up with this shit and just arbitrarily stuffed Dapper in there one afternoon.

I think one lesson newer programmers fail to learn is that sometimes copy/paste is not only OK, but preferred.

One time this is good is when helping others get up to speed. Other people can copy/paste your examples and be functioning right now.

But if you have a function that you put in to eliminate some duplication, they have to understand this new abstraction layer before they can start working.

Better to have no abstraction than the wrong abstraction!
To paraphrase that old joke "I had a problem and I used a class generator, now I have a problemFactory".
One good thing about a code generation approach is that we can actually inspect the generated code and see what's going to happen relatively easily. It also usually works nicely with static analysis. With an abstract base class or similar, that's much harder to do.

Of course, a code generator (or even a compiler) is harder to maintain, especially when you want the output to be human readable. So it's always a trade-off. Think about how many lines of code it's going to generate. If it's on the order of 100k, it's usually worth the cost. 10k might be good to go. 1k, probably not.

Code generators can work if you .gitignore the target folders, so everything generated must be used as-is or augmented by a separate handwritten file like a subclass. And to integrate them tightly into your build process so the generator tool runs transparently.

But even then they're a nightmare because your errors don't match your source files.

"generate and edit" is suicide. "generate and treat the outputs as intermediate compilation objects" is merely an incredibly painful tool of last resort.
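A sketch of the "augmented by a separate handwritten file like a subclass" pattern, assuming Python and a generator whose output lands in a scratch directory (in a real repo, a .gitignored folder). `GENERATED_SRC`, `load_generated`, `UserBase`, and `User` are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

# Output of a hypothetical generator; in a repo it would live in a .gitignored folder.
GENERATED_SRC = '''\
class UserBase:
    """GENERATED: do not edit. Regenerated on every build."""

    def __init__(self, name):
        self.name = name

    def to_dict(self):
        return {"name": self.name}
'''


def load_generated(src: str, module_name: str):
    """Write generated source to a scratch dir and import it as a module."""
    path = Path(tempfile.mkdtemp()) / f"{module_name}.py"
    path.write_text(src)
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


generated = load_generated(GENERATED_SRC, "user_base")


# Handwritten and checked in: extends the generated class without editing it.
class User(generated.UserBase):
    def greeting(self) -> str:
        return f"Hello, {self.name}"
```

A traceback raised inside `to_dict` would point at the generated file in the scratch directory, not at anything in source control, which is the error-mapping pain described above.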

Yeah surely a better approach is... rather than inheriting from an all powerful class, just compose controllers out of modules. Some used in almost every controller. Some pretty specific to one or two edge cases.

Like the problem was having an all-or-nothing inheritance model for shared functionality, which also ended up very long and intimidating.

I also don't really understand why emerging edge cases caused him to complicate his 90%-perfect parent class instead of just... overriding specific functions in the children. (The specifics of these overrides will probably make you want to refactor your parent class so that the overrides have clean joints but you shouldn't just dump the new child-specific logic into your parent)
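In Python, one way to "compose controllers out of modules" is with small mixins, sketched below; the commenter may have meant object composition instead, and every class name here is hypothetical. The edge case in `ArticleController` is handled by overriding one hook rather than by adding child-specific branches to a shared parent.

```python
from typing import Any


class ListMixin:
    """Module used by almost every controller."""
    def index(self) -> list[dict[str, Any]]:
        return [self.serialize(r) for r in self.records]


class SoftDeleteMixin:
    """Module that only a couple of controllers need."""
    def delete(self, position: int) -> None:
        self.records[position]["deleted"] = True


class BaseController:
    """Small shared core: storage plus a default serialization hook."""
    def __init__(self, records: list[dict[str, Any]]):
        self.records = records

    def serialize(self, record: dict[str, Any]) -> dict[str, Any]:
        return dict(record)


class ArticleController(ListMixin, SoftDeleteMixin, BaseController):
    # Edge case: override one hook instead of growing the shared parent.
    def serialize(self, record: dict[str, Any]) -> dict[str, Any]:
        return {"title": record["title"].upper()}


class TagController(ListMixin, BaseController):
    """Opts into listing only; it never gains a delete() it doesn't need."""
```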

IMHO, I don’t think there are enough code generators; however, a good code generator’s output should be a library, not editable source. I’ve had a lot of good experience using IDLs that could generate libraries for multiple languages that other code could then use (e.g. define your REST URL scheme and supported operations, and a base class gets generated that lets you fill in the logic).

At least in the Java world, annotations seem to have taken over, but mixing code and interface definition pushes you into the language and allows so much flexibility that it's really no different from writing registration code. What I'd prefer is a nice declarative language without business logic sneaking in.
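As a rough illustration of a generator whose output is a library rather than editable source, the sketch below treats a Python dict as the declarative spec and builds an abstract base class at runtime, one abstract method per operation; the handwritten subclass only fills in logic. This is not any particular IDL tool, and `SPEC`, `generate_base`, `OrderBase`, and `OrderController` are hypothetical.

```python
from abc import ABCMeta, abstractmethod

# Hypothetical declarative spec: URL scheme plus supported operations, no logic.
SPEC = {
    "name": "Order",
    "routes": {
        ("GET", "/orders/{order_id}"): "get_order",
        ("POST", "/orders"): "create_order",
    },
}


def generate_base(spec: dict) -> type:
    """Build an abstract base class with one abstract method per operation."""
    def make_stub(op_name: str):
        @abstractmethod
        def stub(self, **params):
            raise NotImplementedError(op_name)
        stub.__name__ = op_name
        return stub

    namespace = {op: make_stub(op) for op in spec["routes"].values()}
    namespace["ROUTES"] = dict(spec["routes"])

    def dispatch(self, method: str, path_template: str, **params):
        # Look up the operation name for this route and call the subclass's method.
        return getattr(self, self.ROUTES[(method, path_template)])(**params)

    namespace["dispatch"] = dispatch
    return ABCMeta(f"{spec['name']}Base", (), namespace)


OrderBase = generate_base(SPEC)


class OrderController(OrderBase):
    """Handwritten: fills in business logic only."""
    def get_order(self, order_id):
        return {"order_id": order_id}

    def create_order(self, **fields):
        return {"created": fields}
```

Forgetting to implement an operation fails at instantiation with a `TypeError`, so the spec and the handwritten logic cannot silently drift apart.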

This whole article is about whether to do something or not, and I dig the lesson here. The author built one monstrosity they hate, and they propose perhaps creating another. How bold & brave!

Of course lots and lots of people love telling us why not to do things. This is the cosmic-brain point of Steve Yegge's "Notes from the Mystery Machine Bus", of what we let guide us. Is it trying and hope and doing? When and where do we allow reservation in? https://gist.github.com/cornchz/3313150

I use code generators all the time and it has saved me a huge amount of work. I routinely generate 80% of the code needed to implement typical business applications.

Having said that, code generators are not all the same. Some are awesome, others are downright awful. There is an art to writing good code generators. Not surprising really. The same can be said for most other software.

Is that what he's talking about? I thought he was talking about some pattern using first class classes rather than ever programmatically touching source?
What's wrong with macros? I use them every day. They're fine.
"Write code that writes code" -- Tip 29, The Pragmatic Programmer[1]

I think we need to be a little more nuanced, here.

Don't be clever in your code generation, to be sure. But let's not act like you need advanced degrees for all meta-programming.

What do you think phased compilers do? They generally transform the AST produced by your code into expanded versions or rewrites of that AST that are more amenable to machine interpretation or transformation into assembly.

In a very real sense, nearly all production code goes through some kind of codegen process. It isn't clever, it isn't new, and it generally isn't complicated (if this then that else that else ... when you get down to it). It is a repeatable, reproducible, and reliable enough method that the whole economic system of software is built on top of it. Most of the problems come from recursive applications of codegen and strange interactions between different codegen subsystems.

Here's a tip: Don't do recursive codegen in your user land metaprogramming.

User code is _easier_, simply because it isn't as reusable as the language primitives themselves. But if you understand a problem well enough to essentially reduce it to copy, paste, and replace one or two items, codegen is superior because a bug fix during maintenance need only happen in a single place.

My primary language recently added generic derivation to the language itself (as opposed to in macro libraries) and it is totally worth it as I almost never have to write a ser/de by hand. I never have to test the ser/de round trip. I don't have to pepper my code with second-class annotation primitives or keywords. I never misspell `asJson`, etc.
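The commenter doesn't name their language, so here is only a rough Python analogue: a generic serializer and deserializer derived from dataclass field metadata, so no per-class ser/de is written by hand. Unlike compile-time derivation, this works by runtime reflection, and the sketch only handles scalar and nested-dataclass fields (no lists, `Optional`, or string annotations). `to_json`, `from_json`, `Address`, and `Customer` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")


def to_json(obj: Any) -> str:
    """Generic serializer: field names and values come from the dataclass itself."""
    return json.dumps(asdict(obj), sort_keys=True)


def from_json(cls: Type[T], text: str) -> T:
    """Generic deserializer: rebuilds nested dataclasses from declared field types."""
    return _build(cls, json.loads(text))


def _build(cls: Any, data: Any) -> Any:
    if is_dataclass(cls):
        return cls(**{f.name: _build(f.type, data[f.name]) for f in fields(cls)})
    return data  # scalars pass through unchanged


@dataclass
class Address:
    city: str


@dataclass
class Customer:
    name: str
    age: int
    address: Address
```

Adding a field to `Customer` requires no change to the ser/de code, which is the point the comment makes about never writing or testing the round trip by hand.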

Not everything should be done with codegen in user land code (code outside the compiler and language tooling). But if I see thousands of lines of json/avro/protobuf/test example copypasta, that's a code smell that needs to be addressed or it will lead to production bugs and take hours instead of minutes to fix.

And when you write these things, you come to find out that they are just like any other data transformation task. They turn `A`s into `B`s, systematically.

My mentor, when I became a professional coder, taught me that if you do something more than three times, it's probably worth taking the time to generalize it and do it once before you push the changes into the version repository. It's been 18 years now since I began my career, and that simple principle has rarely steered me wrong across probably a dozen languages that I've written production code in.

1. https://www.goodreads.com/book/show/4099.The_Pragmatic_Progr...

Have you heard about JHipster? It's a code generator "written" in JavaScript. It generates Java/Spring for the backend and Angular for the front end.
I've read that paragraph with a /s at the end of it.