

by cb321 805 days ago

I think of what you describe as really just the first step on the path to the parsing state machine in c2tsv.nim (or /c2dsv.c in the same folder), which I mentioned above. Both have comments in their source code.

I think it helps to frame the problem more like: "How do I translate a buffered input stream with complex syntax, which most of the time just needs ',' turned into '\t', into a buffered output stream that is 'almost' as fast as a Unix `tr , \\t`?" If there were no escaping/quoting, the output buffer could literally be the same memory as the input, just with the delimiter bytes changed.
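To make the no-quoting case concrete, here is a minimal Python sketch (not code from c2tsv.nim; `translate_unquoted` and `bufsize` are names made up for illustration). It reads into one reusable buffer, overwrites the delimiter bytes in place, and writes that same memory back out:

```python
import io

def translate_unquoted(src, dst, bufsize=1 << 16):
    """Copy src to dst, turning every ',' into a tab (assumes no quoting)."""
    buf = bytearray(bufsize)      # the one buffer, reused for every read
    view = memoryview(buf)        # lets us write a slice without copying it
    while True:
        n = src.readinto(buf)     # fill the buffer from the input stream
        if not n:
            break
        i = buf.find(b",", 0, n)
        while i != -1:
            buf[i] = 0x09         # overwrite the delimiter byte in place
            i = buf.find(b",", i + 1, n)
        dst.write(view[:n])       # output is the same memory as the input

src = io.BytesIO(b"a,b,c\n1,2,3\n")
dst = io.BytesIO()
translate_unquoted(src, dst)
```

This is essentially what `tr` does, which is why it sets the speed target.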

The next step is realizing that you can still just do this byte translation if you "flush" the IO buffer opportunistically at syntactically relevant times. That gets you the "almost" performance. (Scare quotes on "almost" since you might do a few more IO system calls with certain kinds of dense syntax, but unlike your "reassemble" approach there won't be any allocations. Various trade-offs, but a nifty design.)
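A rough Python sketch of that idea, under some loud assumptions: this is my illustration, not the actual c2tsv.nim state machine; `csv_to_tsv`, the state names, and `bufsize` are hypothetical; quoting is assumed to be the usual double-quote style with `""` as an escaped quote; and tabs or newlines inside quoted fields are passed through untouched, which a real tool would have to deal with. The point is only that quote bytes become flush points, so the pending run of the input buffer is written out as-is and nothing gets reassembled:

```python
import io

COMMA, TAB, QUOTE = 0x2C, 0x09, 0x22
UNQUOTED, QUOTED, QUOTE_SEEN = range(3)

def csv_to_tsv(src, dst, bufsize=1 << 16):
    """Translate CSV to TSV in place; returns the number of writes issued."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    state = UNQUOTED              # parser state survives across buffer refills
    writes = 0

    def flush(lo, hi):
        # Write the pending run buf[lo:hi] without copying; skip empty runs.
        if hi > lo:
            dst.write(view[lo:hi])
            return 1
        return 0

    while True:
        n = src.readinto(buf)
        if not n:
            break
        start = 0                 # start of the run not yet written
        for i in range(n):
            b = buf[i]
            if state == QUOTE_SEEN:
                if b == QUOTE:    # doubled quote: keep this byte as a literal
                    state = QUOTED
                    continue
                state = UNQUOTED  # the earlier quote closed the field
            if state == UNQUOTED:
                if b == COMMA:
                    buf[i] = TAB  # the common case: plain byte translation
                elif b == QUOTE:  # opening quote: flush, then skip it
                    writes += flush(start, i)
                    start = i + 1
                    state = QUOTED
            elif b == QUOTE:      # state == QUOTED: closing or first of ""
                writes += flush(start, i)
                start = i + 1
                state = QUOTE_SEEN
        writes += flush(start, n)
    return writes

out = io.BytesIO()
csv_to_tsv(io.BytesIO(b'a,"b,c",d\n"say ""hi""",2\n'), out)
```

Input with no quotes costs one write per buffer, same as the `tr` case. Each quote adds at most one extra write, which is the "few more IO system calls" trade-off.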

There are other nice aspects to the "partitioned program design" mentioned in a sibling-ish comment, but, all together, I think it is a pretty tidy solution.