Hacker News new | ask | show | jobs
by shoo 961 days ago
we could re-frame by distinguishing problem statements from implementations

Problem A: read a stream of bytes, parse it as JSON

Problem B: read a stream of bytes, count how many bytes match a JSON whitespace character

Problem B should require fewer resources* to solve than problem A. So in that sense problem B is a relaxation of problem A, and a highly efficient implementation of problem B should be able to process bytes much more efficiently than an "optimal" implementation of problem A.

So in this sense, we can probably all agree with the author that counting whitespace bytes is an easier problem than the full parsing problem.

We're agreed that the author's implementation (half a page of go code that fits on a talk slide) to solve problem B isn't the most efficient way to solve problem B.

I remember reading somewhere the advice that to set a really solid target for benchmarking, you should avoid measuring the performance of implementations and instead try to estimate a theoretical upper bound on performance, based on say a simplified model of how the hardware works and a simplification of the problem -- that hopefully still captures the essence of what the bottleneck is. Then you can compare any implementation to that (unreachable) theoretical upper bound, to get more of an idea of how much performance is still left on the table.

* for reasonably boring choices of target platform, e.g. amd64 + ram, not some hypothetical hardware platform with surprisingly fast dedicated support for JSON parsing and bad support for anything else.

1 comments

Everything you said is totally reasonable. I'm a big fan of napkin math and theoretical upper bounds on performance.

simdjson (https://github.com/simdjson/simdjson) claims to fully parse JSON on the order of 3 GB/sec. Which is faster than OP's Go whitespace parsing! These tests are running on different hardware so it's not apples-to-apples.

The phrase "cannot go faster than this" is just begging for a "well ackshully". Which I hate to do. But the fact that there is an existence proof of Problem A running faster in C++ SIMD than OP's Probably B scalar Go is quite interesting and worth calling out imho. But I admit it doesn't change the rest of the post.