| Cute, now do it with UTF-8 support. > People are terrified of parsers and parsing And rightfully so. People who aren't afraid of them generally fail to understand all of the ways in which parsing can show fractal complexity, and will mostly stick to toy examples like this INI parser to justify their positions. If you're gonna argue that parsing is simple, the bare minimum I'd want to see implemented is a context-sensitive grammar with unbounded lookaheads (or at the very least, one that is capable of handling more than one token of lookahead), with proper support for Unicode, and actual error resilience (not what this article calls error resilience). If you manage to do all that and can still call what you did "simple" without having completely deluded yourself, congratulations, I hope to be on your level some day. PS1: I won't even go into the plethora of security issues originating from crappy parsers, especially those written in C. PS2: Let's also leave aside any matters related to correctness and validation of parsers, which are notoriously not by any means "simple". PS3: Or generating decent errors for that matter. |
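As long as all your delimiter chars are ASCII, it just works: in UTF-8, every byte of a multi-byte character has its high bit set, so a byte-wise parser can never mistake part of a non-ASCII character for an ASCII delimiter.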
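A minimal sketch of that point, assuming an INI-style `key=value` line and a NUL-terminated UTF-8 input. The `Slice` type and the `split_key_value` and `slice_eq` helpers are hypothetical names made up for this example, not anything from the article. The parser only ever looks at bytes, and the non-ASCII text passes through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A byte range inside the input; not NUL-terminated. (Hypothetical type.) */
typedef struct {
    const char *ptr;
    size_t len;
} Slice;

/* Split `line` at the first '=' byte. Returns 1 on success, 0 if there is
 * no '='. In UTF-8, every byte of a multi-byte sequence is >= 0x80, so a
 * byte equal to '=' (0x3D) is always a real '=' and never part of another
 * character. */
static int split_key_value(const char *line, Slice *key, Slice *value)
{
    assert(line != NULL);
    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    key->ptr = line;
    key->len = (size_t)(eq - line);
    value->ptr = eq + 1;
    value->len = strlen(eq + 1);
    return 1;
}

/* Compare a slice against a NUL-terminated string, byte for byte. */
static int slice_eq(Slice s, const char *text)
{
    size_t n = strlen(text);
    return s.len == n && memcmp(s.ptr, text, n) == 0;
}
```

Lengths here are byte counts, not character counts. That is fine for splitting and copying; it only starts to matter if you need to count or classify characters, which this sketch does not attempt.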
Errors in C usually come from missing abstractions or the wrong approach. C gives you data layout, flow control, and functions; you can go a long, long way with just that.
> unbounded lookaheads
If you want to require that, you get what you deserve. But implementing it is just a matter of putting a queue of tokens in front of your parser, one that supports look(n) separately from consume().
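Roughly what I mean, as a sketch rather than a finished design. Only `look(n)` and `consume()` come from the comment above; `Token`, `TokenQueue`, `queue_init`, `lex_one` and `QUEUE_CAP` are names invented for the example, and the lexer is a toy that splits on whitespace. The ring buffer is fixed-size here, so lookahead is capped at `QUEUE_CAP`; swapping in a growable buffer would lift that limit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { QUEUE_CAP = 16 };  /* maximum lookahead depth; an arbitrary choice */

typedef struct {
    const char *start;    /* first byte of the token, NULL at end of input */
    size_t len;           /* token length in bytes */
} Token;

typedef struct {
    const char *src;        /* input that has not been lexed yet */
    Token buf[QUEUE_CAP];   /* ring buffer of tokens lexed but not consumed */
    size_t head;            /* index of the next token to consume */
    size_t count;           /* number of buffered tokens */
} TokenQueue;

/* Toy lexer: whitespace-separated words. Keeps returning an end token
 * (start == NULL) once the input is exhausted. */
static Token lex_one(const char **src)
{
    const char *p = *src;
    Token t = { NULL, 0 };
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p != '\0') {
        t.start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        t.len = (size_t)(p - t.start);
    }
    *src = p;
    return t;
}

static void queue_init(TokenQueue *q, const char *src)
{
    q->src = src;
    q->head = 0;
    q->count = 0;
}

/* Peek at the token n positions ahead (0 = next) without consuming it.
 * Lexes more input on demand until the queue holds at least n + 1 tokens. */
static Token look(TokenQueue *q, size_t n)
{
    assert(n < QUEUE_CAP);
    while (q->count <= n) {
        q->buf[(q->head + q->count) % QUEUE_CAP] = lex_one(&q->src);
        q->count++;
    }
    return q->buf[(q->head + n) % QUEUE_CAP];
}

/* Remove and return the next token. */
static Token consume(TokenQueue *q)
{
    Token t = look(q, 0);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return t;
}
```

The parser only ever talks to `look` and `consume`, so it does not care how far ahead the lexer has actually run. Peeking three tokens ahead costs the same code path as peeking one.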