by 0xd34df00d 2331 days ago
Those are fairly trivial and well-known optimizations that I did (and I am by no means an expert in writing high-performance code), so all the honors go to the GHC authors.
2 comments

Thanks for replying. I want to tell you that I feel deep regret for my impulse to publicly shame you - even though I've gotten a lot of points for this comment, and did not really receive criticism for it.

Hey - if you make performance optimizations and compare implementations, it's probably best not to jump to quick conclusions. I would advise brushing up on C to get a feel for performance. Or, in times where it's popular to shit on C and claim that it's not really a low-level language anymore, I would advise writing some assembler (which I've barely done). In practice it's unlikely that you find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C. The only situation I can see where that can happen is when the program does something so complicated, yet seemingly so arbitrary, that you simply can't bring yourself to invest more time than it takes to hack together a quick Python script, and what you would be willing to write in C would not even be the asymptotically best approach that you can see.

For a simple program like wc, that situation does not apply, and if you find surprising results, it's best to first check for other possible reasons than "thousands of graybeards were wrong".

No worries, that's a natural reaction!

> In practice it's unlikely that you find yourself in a situation where you can write code in a high-level language that runs considerably faster than what you could realistically write in C.

I do way more C++ (in fact, I don't do pure C at all), and aliasing has bitten me and my code's performance more often than I'd like. While there are workarounds, I'd consider spending time and effort on them rather unrealistic, in a sense. So it surely doesn't contradict my world model if a language with a stricter type system (Haskell? Rust? ATS anyone?) achieves better results on at least some of the tasks with less effort and less dependence on implementation details.
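To make the aliasing point concrete, here is a minimal C sketch (an illustration only, not code from the post or from the commenter's projects; both function names are hypothetical). In the first function the compiler has to assume that a store to `dst[i]` might modify `*scale`, so it generally cannot keep `*scale` in a register across iterations. The second uses the C99 `restrict` qualifier, one of the usual workarounds, to promise that the two pointers do not overlap.

```c
#include <stddef.h>

/* Hypothetical example: dst and scale may point into the same memory,
   so the compiler must assume each store to dst[i] can change *scale
   and typically reloads it on every iteration. */
void scale_may_alias(float *dst, const float *scale, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] *= *scale;
    }
}

/* Same loop, but `restrict` is a promise by the caller that dst and
   scale do not overlap. The compiler is then free to load *scale once
   and vectorize the loop. Breaking the promise is undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict scale, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] *= *scale;
    }
}
```

Both versions give the same result whenever the pointers really do not overlap; whether the second one is measurably faster depends on the compiler and flags. Standard C++ has no `restrict`; compilers offer `__restrict` as an extension, which is part of why such workarounds depend on implementation details.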

Although, ironically, I'm going to write something low-level for the Haskell bytestrings library this evening, in C with intrinsics (so almost assembly, modulo stuff like register allocation).

The GitHub repo description is equally distasteful:

> wc implemented in Haskell (significantly faster than GNU coreutils version — oops I did it again

For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".

Also, I suggest you try running the GNU wc with unicode turned off because unicode is computationally expensive and you're deliberately disabling unicode support in your own code anyway. I appreciate you said you'd add in the edge cases that GNU handles in your next blog post, but disabling unicode in GNU for this benchmark would show good faith that you're at least trying to compare like for like. And if GHC still outperforms, then you can at least legitimately say:

> My code outperforms GNU for non-unicode strings

Which currently you cannot because your claim is based on incorrect benchmarks.
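For readers unsure what "unicode turned off" means for the counting itself, here is a rough C sketch of a byte-oriented count, where every byte is treated as one character and nothing is decoded. It is an illustration under that assumption, not GNU wc's actual implementation; `wc_counts` and `count_bytes` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    size_t lines;
    size_t words;
    size_t bytes;
} wc_counts;

/* Count newlines, whitespace-separated words and bytes in buf[0..len).
   Each byte is classified on its own with isspace(); there is no
   multibyte decoding, which is the work a Unicode-aware count adds. */
wc_counts count_bytes(const unsigned char *buf, size_t len) {
    wc_counts c = {0, 0, len};
    int in_word = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\n') {
            ++c.lines;
        }
        if (isspace(buf[i])) {
            in_word = 0;          /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first non-space byte starts a word */
            ++c.words;
        }
    }
    return c;
}
```

A like-for-like benchmark would compare against a count of this kind on both sides, rather than against a locale-aware one.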

> For reference, I'm referring to the "oops I did it again" part. It's really hard to take that comment as "honours go to GHC authors".

I was overly excited when I created the repo after obtaining the first results. Childish indeed; thanks for the reminder. Fixed.

> Also, I suggest you try running the GNU wc with unicode turned off because unicode is computationally expensive and you're deliberately disabling unicode support in your own code anyway.

I tried running wc as `LC_ALL=C wc file.txt`, and (surprisingly to me) it resulted in a worse run time for wc (my default locale is ru_RU.UTF-8, for comparison). This reproduced on two of my machines and also on the machine of a friend who gave my code a shot.

My bad for omitting this in the post, I'll update it accordingly.

wc running slower seems wrong. I wonder if it's doing additional sanitising because your LANG differs from your LC_ALL. I'd need to read through the code to get a handle on what it does and doesn't expect, but I definitely wouldn't expect wc to run slower.