by _pvxk 2322 days ago
* isSpace handles tabs, but because it looks at a single byte at a time, it won't handle the multibyte space characters that exist in Unicode. If you read further down, they rip out the remains of the Unicode handling for further speed improvements.

* Looking at a single byte at a time, it presumably only handles the "C" locale :) They don't say what locale GNU wc was tested with (if it's not LANG=C, that benchmark should be re-run)

* --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on (can't find the invocation in the blog post though)

* data State { ws, bs, ls } keeps count of words, bytes (more honest than calling it characters) and lines.
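
To make the byte-at-a-time point concrete, here is a rough sketch of what such a counter could look like. This is not the blog post's code: the `prevSpace` field, `isSpaceByte`, `step` and `countBytes` are names I made up for illustration, and only `ws`, `bs` and `ls` come from the State record mentioned above.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Hypothetical reconstruction of the State record: words, bytes and
-- lines seen so far. prevSpace is an assumed extra field that records
-- whether the previous byte was whitespace, so word starts can be detected.
data State = State
  { ws        :: !Int
  , bs        :: !Int
  , ls        :: !Int
  , prevSpace :: !Bool
  } deriving (Eq, Show)

-- ASCII-only whitespace test on one byte: space, \t, \n, \v, \f, \r.
-- Multibyte Unicode spaces can never match, since no single byte of
-- their UTF-8 encoding falls in these ranges.
isSpaceByte :: Word8 -> Bool
isSpaceByte b = b == 32 || (b >= 9 && b <= 13)

-- Fold step: every byte bumps the byte count, a newline byte bumps the
-- line count, and a non-space byte following a space starts a new word.
step :: State -> Word8 -> State
step (State w b l prev) byte = State w' (b + 1) l' sp
  where
    sp = isSpaceByte byte
    w' = if prev && not sp then w + 1 else w
    l' = if byte == 10 then l + 1 else l

-- Strict left fold over the raw bytes, starting "after a space" so the
-- first non-space byte counts as the start of a word.
countBytes :: BS.ByteString -> State
countBytes = BS.foldl' step (State 0 0 0 True)
```

With this sketch, `countBytes (BS.pack [97, 32, 98, 10])` (the bytes of "a b\n") gives 2 words, 4 bytes and 1 line, while `countBytes (BS.pack [97, 0xE2, 0x80, 0x83, 98, 10])` (the same text with a UTF-8 encoded EM SPACE, U+2003, as the separator) gives 1 word, 6 bytes and 1 line. That is the Unicode gap from the first bullet.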

1 comments

Thanks for the reply ...

> ... further down, they rip out the remains of unicode handling ...

Ah. Well, that makes it a little unfair, surely.

> Looking at a single byte at a time, it presumably only handles the "C" locale ...

Again, that seems unfair.

> --max-line-length? no. But I'm guessing GNU wc isn't benchmarked with that option on

I wonder if wc does the work anyway, and only reports it if asked, or if it actually changes the code path if it's not needed.

So this entire post feels ... intellectually dishonest. Personally, I'm all in favour of Haskell, and I wish I had the chance to use it "in anger" rather than just doing the occasional toy thingie that I do. But this post doesn't do it or its community any favours.

Disappointing.

Yes. "Destroying C". The whole post feels like youthful bravado untempered by experience.