Hacker News
by tomrod 2348 days ago
This is a topic that really interests me, but I couldn't read the article -- either a paywall, ad-wall, or some other reader-hostile blocker incongruent with the foundation of the Internet prevents usability. Ah well. I'll join the conversation regardless.

For all the programmers out there -- _how do we do this?_ I came into programming through Matlab and Python in Economics and Data Science. I don't have formal training in software engineering. I know some C, some Fortran, and have a journeyman's understanding of how my tools interact with the hardware they run on.

Where can I learn how to be extremely efficient and always treat my operating environment as resource constrained? Am I correct in seeing that the rise of point-and-click cloud configuration hell-sites like AWS is masking the problem by distributing inefficiently? (Sorry if unrelated; I spent hours debugging Amazon Glue code last night and it struck me as related.)

In other words -- how can we tell what is the path forward?

4 comments

The days of everything being hand-optimized assembly are behind us. It still has its niche, but for anything outside hot inner loops or extremely frequently called functions (like malloc), straightforward C++ will be just as fast.

Meaning there's no point in optimizing an expensive function if 99% of your program's memory and run time is spent in a different function.

This means the absolute most important skill for writing efficient software is not assembly language skills, but profiling, so you know where to focus your efforts in the first place.
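To make the profiling point concrete, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `cheap_step`, `hot_step`, and `run_workload` are made up for illustration; the only claim is that the report should point at the function where the time actually goes.

```python
import cProfile
import io
import pstats


def cheap_step(n):
    # Looks like a candidate for tuning, but does very little work.
    return sum(range(n))


def hot_step(n):
    # The real hotspot: quadratic work behind an innocent name.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def run_workload():
    # Hypothetical program: one cheap step, one expensive step.
    cheap_step(1_000)
    return hot_step(300)


def profile_workload():
    """Profile run_workload and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_workload()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_workload())
```

Reading the report top-down shows which function dominates, which is where any optimization effort should go first.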

> Meaning there's no point in optimizing an expensive function if 99% of your program's memory and run time is spent in a different function.

Maybe there's no business point in optimizing those. But I feel this line of thinking got us into the current mess to begin with. Everybody is either like "we can't afford to optimize" (blatant lie at least 80% of the time btw) or "nah, not my effing job".

Plus that philosophy only really works when your business is fighting for survival in its initial stages. After you stabilize a little and have some runway you absolutely definitely should invest in technical excellence because it also lends itself pretty well to preventing laggy and/or buggy user experience (and those can bleed your subscriber numbers).

Honestly? I used to jabber on about this with regard to the still distant future of actual nanotechnology ... we need to find the guys who wrote videogames for arcades in the 1980s and press them for their secrets before so many brilliant tricks are lost to time. They did so much with so little!

My guess is that we will slowly approach this wall and spend a lot of time trying for incremental gains, trying to avoid the inevitable, which would be the design of new chipsets with new instructions, sets of new languages explicitly designed to take advantage of the new hardware, and then tons of advances in compiler theory and technology. On top of it, very tight protocols designed for specific use.

I think we have layers upon layers of inefficiency, each using what was at hand. All reasonable things to do, in the short-term, based on the pressures of business. But at the end of the day we're still transmitting video over HTTP, of all things. Sure, we did it! But you can't tell me that it is efficient or even within the original scope of the protocol's concept.

Naturally, I think the whole thing would run about a trillion dollars and take armies of geniuses, but it would at least be feasible, just ... it would require a lot of will. And money.

The secrets of 1980s video game programmers?

1) hardware that doesn't change. One C64 is just like every other C64 out there. You knew what the hardware was and since it doesn't change, you can start exploiting undefined behavior because it will just work [1].

2) The problem domain doesn't change---once a program is done, it's done and any bugs left in are usually not fixed [2]. The problem domain was fixed because:

3) The software was limited in scope. When you only have 64K RAM (at best---a lot of machines had less, 48K, 32K, 16K were common sizes) you couldn't have complex software and a lot of what we take for granted these days wasn't possible. A program like Rogue, which originally ran on minicomputers (with way more resources than the 8-bit computers of the 1980s) was still simple compared to what is possible today (it eventually became NetHack, which wouldn't even run on the minicomputers of the 1980s, and it's still a text based game).

4) The entire program is nothing but optimizations, which make the resulting source code both hard to follow and reuse. There are techniques that no longer make sense (embedding instructions inside instructions to save a byte) or can make the code slower (self-modifying code causing the instruction cache to be flushed) and make it hard to debug.

5) Almost forgot---you're writing everything in assembly. It's not hard, just tedious. That's because at the time, compilers weren't good enough on 8-bit computers, and depending upon the CPU, a high level language might not even be a good match (thinking of C on the 6502---horrible idea).

[1] Of course, except when it doesn't. A game that hits the C64 hard on a PAL-based machine may not work properly on an NTSC-based machine because the timing is different.

[2] Bug fixes for video games started happening in the 1990s with the rise of PC gaming. Of course, PCs didn't have fixed hardware.

EDIT: Add point #5.

I have this theory that a lot of corporations know this but they don't want to be the pioneers who volunteer their money and man-hours, only for their competitors to then reap the fruits of their labour for free.

I can't prove it but I intuitively feel there's a lot of spite out there. Many people are unhappy with the status quo but are also unhappy with the idea of sacrificing their resources for everybody else -- who will likely not only be ungrateful, but might also try to pull an Oracle or Amazon and sue the creators over the rights to their own labour.

Things really do seem stuck in this giant tug of war game lately.

The path forward is to be economical with hardware resources. I always try to imagine a physical character performing the task that I'm trying to code. How far does the imaginary character need to travel? How many trips do they need to make? Is everything they do absolutely necessary? If they delegate work, is their sub-contractor efficient?
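One way to picture the "how many trips" question is a toy sketch like the one below. The `Warehouse` class and its methods are hypothetical stand-ins for any slow resource (disk, database, network); the counter just records how often the imaginary character has to walk there.

```python
class Warehouse:
    """Hypothetical slow resource; `trips` counts how often it is visited."""

    def __init__(self, stock):
        self._stock = dict(stock)
        self.trips = 0

    def fetch_one(self, key):
        # One walk to the warehouse for a single item.
        self.trips += 1
        return self._stock[key]

    def fetch_many(self, keys):
        # One walk to the warehouse for a whole shopping list.
        self.trips += 1
        return [self._stock[k] for k in keys]


def total_one_by_one(warehouse, keys):
    # The character goes back and forth once per item.
    return sum(warehouse.fetch_one(k) for k in keys)


def total_batched(warehouse, keys):
    # Same answer, but the character makes a single trip.
    return sum(warehouse.fetch_many(keys))


if __name__ == "__main__":
    keys = ["a", "b", "c"]
    slow = Warehouse({"a": 1, "b": 2, "c": 3})
    fast = Warehouse({"a": 1, "b": 2, "c": 3})
    print(total_one_by_one(slow, keys), "in", slow.trips, "trips")
    print(total_batched(fast, keys), "in", fast.trips, "trips")
```

Both functions compute the same total; only the number of trips differs, which is usually what dominates once each trip is expensive.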

There isn't a single place to learn how to be efficient; it is better to start being extremely curious about how things actually work. A scary number of people I've met do not even attempt to learn how the library functions they use actually work.

Not sure what you are getting at here.

> I always try to imagine a physical character performing the task that I'm trying to code. How far does the imaginary character need to travel? How many trips do they need to make?

Dude, that's why we have optimising compilers. Functional programming is demonstrably less efficient on our imperative/mutable CPU architectures but a lot of compilers are extremely smart and turn those higher-level FP languages into very decently efficient machine code that's not much worse than what GCC for C++ produces. Compilers like those of OCaml and Haskell are especially famous for this. They shrank the gap between FP and the languages that are closer to the metal. They shrank that gap by a lot and even if they are not 100% there, I'm seeing results that make me think they are 75% - 85% there.

We need languages that rid us of endlessly thinking about minutiae and we must start assembling bigger LEGO constructs in our heads if we want anything in IT to actually get unstuck and start progressing again. (Of course, this paragraph doesn't apply to kernel and driver authors. They have to micro-optimise everything they can on the lowest level they can. That's a given.)

> A scary number of people I've met do not even attempt to learn how the library functions they use actually work.

I couldn't care less. How a library function works is an implementation detail. I only need to know what it does. That's why it's a 3rd party library after all. The creator might notice a hot path during stress tests and replace that implementation detail with an entirely different algorithm and/or data structure. And boom, your code that optimises for an implementation quirk you weren't supposed to look at in the first place is now slow or even buggy.
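A small sketch of how that can go wrong, with everything in it invented for illustration: `unique_v1` and `unique_v2` are two versions of a hypothetical library function whose assumed documented contract is only "return each distinct item once, in unspecified order".

```python
def unique_v1(items):
    # Hypothetical library, version 1: happens to keep first-seen order.
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def unique_v2(items):
    # Version 2: different algorithm, same assumed contract, different order.
    return sorted(set(items))


def first_unique_fragile(unique, items):
    # Relies on the ordering quirk of version 1, which was never promised.
    return unique(items)[0]


def count_unique_robust(unique, items):
    # Relies only on the contract: each distinct item appears once.
    return len(unique(items))


if __name__ == "__main__":
    data = [3, 1, 3, 2]
    print(first_unique_fragile(unique_v1, data), first_unique_fragile(unique_v2, data))
    print(count_unique_robust(unique_v1, data), count_unique_robust(unique_v2, data))
```

The fragile caller changes its answer when the library swaps implementations, while the caller that sticks to the contract keeps working.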

The fundamental tradeoff is between control and abstraction. Better control typically means going closer to machine/operational semantics, better abstraction typically means going to denotational semantics.

Compilers are what mediate between these two domains, but tend to become more bloated as they have to accommodate both more diverse hardware and more numerous languages.

This helps the working programmer ignore the problem of writing good code but only for so long. It only delays the inevitable as the returns from clever compilation can't go on forever, and in fact these returns become more volatile as hardware architectures become more complex (typically through more cores or extra caches, incurring synchronization costs). Thus for maximum performance through binaries one would have to practice tweaking compiler settings which just creates another layer of abstraction and defeats the point of having this step automated for you.

Programmer training in particular needs to become both more comprehensive and more specialized. More comprehensive means knowing how each layer of abstraction gets built up from the most common machines (like x86). More specialized means filtering out a lot of people who were trained-for-the-tool and facilitating more cross collaboration between those that can program in a domain but not program for performance. This might mean better methodologies for prototyping across domains or experimentation with organizational structures to complement such methodologies.

Functional algebraic programming as a paradigm still seems somewhat underrated to me as a way of cross-cutting conceptual boundaries and getting programmers refocused on how their code is interpreted from the point of denotation. But it comes at great risk from continuing the trend towards more redundant abstraction which is responsible for bloatware.

At that point it seems that knowing how these problems are solved without classes, types, and libraries, or at least how classes, types, and libraries resolve the complexities of just doing it using the native capabilities of the operating environment (and recursing down to the point of maximal control), might be a big improvement, as it means reversing the greater-abstraction trend.

Under these discretions languages like OCaml and Rust seem to make the cut. A lot of good ideas from these languages seem to seep into the design of others. But the white whale is browser programming/web programming, as the browser has become the de facto endpoint for universal application deployment. WASM may or may not fix this. But then we just get to compilers again.

This talk did the most for developing my point of view here: https://www.youtube.com/watch?v=443UNeGrFoM Choice quotes include "If you're going to program, really program, and learn to implement everything yourself" and "At first you want ease, but in the end, all you'll want is control."

Or just take up another field. We probably need more farmers and doctors than programmers now.

> Under these discretions languages like OCaml and Rust seem to make the cut.

I absolutely agree! I am gradually learning both and I am just getting so angry that I didn't know about OCaml like 10-15 years ago. :( I was just so damn busy surviving and being depressed for a heckton of [dumb] reasons for 15 years. And then I woke up.

Now I am just a regular web CRUD idiot dev who, even though he was very clever and inventive and creative in the past, nowadays seems to get pissed at small details like configuring web frameworks (even though I am still much better than a lot of others, I dare say -- proven with practice... or so I like to think). And now I have to work against the negative inertia of my last 15 years and learn the truly valuable concepts and how they are implemented in those two extremely efficient, if a bit quirky in syntax, languages.

But it seems every time somebody says "let's just keep these N languages and kill everything else", no discussion is possible... And I feel we really must only keep a few languages/runtimes around and scrap everything else.