by jstimpfle 1063 days ago
UTF-8 was designed so that you don't have to worry about it. Supporting UTF-8 in a parser is trivial: basically just parse as if it were ASCII, but don't barf on bytes >= 128.

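As long as all your delimiter chars are ASCII, it just works: every byte of a multi-byte UTF-8 sequence is >= 128, so an ASCII delimiter byte can never show up in the middle of another character.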
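
To make that concrete, here is a minimal sketch of a byte-wise field splitter. The name split_fields and its signature are made up for illustration, not taken from any library. It never decodes anything; it only compares bytes against an ASCII delimiter and passes everything >= 128 through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on the ASCII byte `delim`.
   Pointers to at most `max` fields are stored in `out`; any fields beyond
   that are ignored. Returns the number of fields stored.
   The input is treated as plain bytes, so UTF-8 passes through unchanged. */
static size_t split_fields(char *line, char delim, char **out, size_t max)
{
    size_t n = 0;

    assert(line != NULL && out != NULL);
    assert((unsigned char)delim < 0x80);   /* the delimiter must be ASCII */

    while (n < max) {
        char *end = line;

        /* Scan to the delimiter or the terminating NUL. Bytes >= 0x80
           can never compare equal to an ASCII delimiter. */
        while (*end != '\0' && *end != delim)
            end++;

        out[n++] = line;
        if (*end == '\0')
            break;

        *end = '\0';        /* terminate this field */
        line = end + 1;     /* the next field starts after the delimiter */
    }
    return n;
}
```

Feeding it a comma-separated line whose fields contain accented or CJK characters gives back the fields byte-for-byte intact. Note that this only splits; it does not check that the input is valid UTF-8.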

Errors in C usually come from missing abstractions or the wrong approach. C gives you data layout, flow control, and functions; you can go a long, long way with just that.

> unbounded lookaheads

If you want to require that, you get what you deserve. But implementing it is just a matter of putting a queue of tokens in front of your parser that supports look(n) separately from consume().
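
A rough sketch of such a queue, under some assumptions of mine: the Token type, the NextTokenFn callback, and all tq_* names are hypothetical, and the lexer is assumed to keep returning some end-of-input token once it runs dry. Tokens are pulled from the lexer only when look(n) asks for more than is buffered, and the buffer grows as needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int kind; } Token;          /* hypothetical token type */
typedef Token (*NextTokenFn)(void *ctx);     /* hypothetical lexer callback */

typedef struct {
    Token *buf;
    size_t head, len, cap;   /* buffered tokens live in buf[head .. len) */
    NextTokenFn next;
    void *ctx;
} TokenQueue;

static void tq_init(TokenQueue *q, NextTokenFn next, void *ctx)
{
    assert(next != NULL);
    q->buf = NULL;
    q->head = q->len = q->cap = 0;
    q->next = next;
    q->ctx = ctx;
}

static void tq_free(TokenQueue *q)
{
    free(q->buf);
    q->buf = NULL;
    q->head = q->len = q->cap = 0;
}

/* Return the token n positions ahead (n == 0 is the next one)
   without consuming anything. */
static Token tq_look(TokenQueue *q, size_t n)
{
    while (q->len - q->head <= n) {
        if (q->len == q->cap) {
            if (q->head > 0) {
                /* Reclaim the space of already consumed tokens. */
                memmove(q->buf, q->buf + q->head,
                        (q->len - q->head) * sizeof *q->buf);
                q->len -= q->head;
                q->head = 0;
            } else {
                /* Buffer is really full: double it. */
                size_t cap = q->cap ? q->cap * 2 : 8;
                Token *p = realloc(q->buf, cap * sizeof *p);
                if (p == NULL)
                    abort();
                q->buf = p;
                q->cap = cap;
            }
        }
        q->buf[q->len++] = q->next(q->ctx);
    }
    return q->buf[q->head + n];
}

/* Remove and return the next token. */
static Token tq_consume(TokenQueue *q)
{
    Token t = tq_look(q, 0);
    q->head++;
    if (q->head == q->len)
        q->head = q->len = 0;   /* queue drained, reuse from the start */
    return t;
}

/* Stand-in lexer for demonstration: yields kinds 0, 1, 2, ... */
static Token counting_lexer(void *ctx)
{
    int *counter = ctx;
    Token t = { (*counter)++ };
    return t;
}
```

The parser only ever talks to tq_look and tq_consume, so how far it peeks is its own business; the lexer stays a dumb "give me the next token" function.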