by buserror 2369 days ago
I'm rarely keen on posting negatives on articles that clearly took a lot of time to make, but I think this requires a bit of correction.

I think this article is very, very simplistic. All of it relates to an 8-bit CPU that is 40+ years old.

I switched to an HLL as soon as I could get my hands on a compiler, namely, UCSD Pascal at the time! Then on to Pascal, then to C and then a myriad of other languages. I covered 6502, Z80, 68k (all of them, to 68040), PowerPC (all of them from 601 prototypes to G5s), ARMs (more than I can count) and x86s (same).

Truth be told, the assembly language I started with /helped a LOT/ with me becoming an efficient developer; a developer who understands what 'code' is being generated when he writes an expression, a statement, a loop, and one who understands what the runtime implications are for most of the 'sugar coating' an HLL gives.

However, starting (a bit) with the 68k, then even more so with the PowerPC, it became pretty much impossible to write /from scratch/ an assembly equivalent that was QUICKER than the compiler-generated code. That was 20+ years ago. DRAM latency happened, pipelining happened and SIMD happened.

Today, hand writing assembly is pretty much stupid on modern CPUs. Given the register files, timings, shadow registers, bus latencies etc etc the compiler will ALWAYS be better because there are so many criteria to think about when generating code...

I'm not saying that having the knowledge is not useful; the best use of assembly is to write some code in an HLL, one that is supposed to be super-mega-critical-quick, then disassemble it and see how it looks. More often than not, you can't make it better than it is in situ -- most of what you will gain comes from preparing your data better, aligning it better etc etc -- basically, 'hinting' the compiler to do a better job. You can do serious code butchery like that, without a hint of assembler [0].
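
To make the 'hinting' point a bit more concrete, here is a minimal sketch in C. It is not the code from [0]; the function name float_to_s16, the clamping and the scale factor are assumptions made purely for illustration. The restrict qualifiers promise the compiler that the two buffers never overlap, which is usually what it needs to know before it will reorder or vectorize a loop like this without any assembler from you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from the gist in [0]): apply a gain to
 * float samples, clamp them, and convert them to signed 16-bit integers.
 * 'restrict' tells the compiler that dst and src do not alias. */
void float_to_s16(int16_t *restrict dst, const float *restrict src,
                  size_t n, float gain)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        float v = src[i] * gain * 32767.0f;
        if (v > 32767.0f)
            v = 32767.0f;       /* clamp positive overflow */
        if (v < -32768.0f)
            v = -32768.0f;      /* clamp negative overflow */
        dst[i] = (int16_t)v;    /* conversion truncates toward zero */
    }
}
```

A caller could additionally declare its buffers as, say, `_Alignas(16) float in[256];` (C11) so the data sits on a vector-friendly boundary. Compiling with something like `gcc -O3 -S` and reading the output is the "disassemble it and see how it looks" step; whether the loop actually gets vectorized depends on the compiler and target, so treat this as a starting point and check the generated code.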

But really, I haven't written any assembly for /performance reasons/ in 15 years, and that was Altivec on PowerPC.

On 8-bit CPUs it's all smooth as butter, but the article also doesn't take into account the massive progress in compilers; I'm the author of SimAVR [1] and I've seen my load of generated code for that CPU, and the GCC toolchain is /very hard to beat/ by hand these days.

[0]: critical audio loop in one of my old PCI card drivers, converting float<->int, applying gain etc while using the register file to the max, and making the most of the G4's pipelining (at the time) https://gist.github.com/buserror/0a3a69cca927b8da6c9c7ee1605... -- note, the inner loop was generated by a script that was doing the cycle calculations (!)

[1]: https://github.com/buserror/simavr

4 comments

> Today, hand writing assembly is pretty much stupid on modern CPUs

Yup. Explains all that neat hand-written AVX asm code in your video decoder, strcmp() implementation, lzma decompressor, utf8 parser, and the base64 decode logic in your browser.

A lot of people put in a lot of hard work so that you can have the cute thought that there is no more reason to write assembly. Many of them wrote your compilers, some of them wrote some of the logic I mentioned above. Quite sure that none of them appreciate being called "pretty much stupid".

You haven't written any Assembly, because the guys and girls that implemented the compilers you use have done it for you.

Someone has to create those tools.

This article is specific to the 6502, where the commonly used CC65 C compiler the author references produces much, much worse code speed-wise than what you can write in pure assembly. In that regard, the article is not simplistic in the least. Coincidentally, I messaged the author just yesterday about the for loop example to point out that it was generated without optimization. Even with optimization enabled, the code is still about 3 times slower than handwritten assembly. I know this may not be typical for other architectures like AVR but it certainly is for 6502.
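
For what it's worth, how the C is written matters a lot on this target too. Below is a rough sketch, assuming a cc65-style compiler where int is 16 bits; both function names are made up for illustration and this is not the loop from the article. The second version keeps the counter in a single byte and counts down to zero, which is the kind of shape an 8-bit CPU with 8-bit index registers tends to handle more cheaply.

```c
#include <stdint.h>

/* Straightforward version: 'int' counter, counting up.
 * On a 6502 a 16-bit counter means multi-byte increments and compares. */
uint8_t sum_naive(const uint8_t *buf, int len)
{
    uint8_t sum = 0;
    int i;
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* 8-bit-friendly version (hypothetical rewrite): the counter fits in one
 * byte and the loop test is a comparison against zero.
 * Handles at most 255 elements, since len is a uint8_t. */
uint8_t sum_8bit(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0;
    uint8_t i = len;
    while (i != 0) {
        --i;
        sum += buf[i];
    }
    return sum;
}
```

Both functions return the same wrapped 8-bit sum for buffers of up to 255 bytes. I haven't measured the two, so take the speed argument as a hypothesis to check against the compiler's assembly listing, and I would still expect handwritten assembly to come out ahead.
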

I used to do my production code exclusively in 6502 assembler, with some tools in P-system Pascal. As I would read in magazines about the C language I would try to imagine what the C compiler would generate for certain constructs, and I couldn't imagine it being efficient compared to other 8-bit processors. Then we decided to experiment (at the company) with C and got a compiler. I was right, the code was awful. It used exactly the idioms I thought I would use if I had to do it. I can picture a really top-notch compiler doing better (because I'm more familiar with optimization in compilers now), but sooner or later some of the quirks (like 8-bit index registers and only page 0 can be used for pointers) will catch you.

In my experience, you can easily beat a compiler for size, which may or may not correspond to speed.