by zerokernel
3069 days ago
> Byte-by-byte parsing is a valid way to do parsing but not the only way. Byte-by-byte parsers tend to be slow and -- arguably, more importantly -- overly complex and rigid. It is, for example, usually very hard to do "random access" with a byte-by-byte parser, because allowing out-of-order parsing tends to blow the code complexity through the roof.

I have to agree here, from past experience. If the format in question has a chance of being performance sensitive, don't use FSM-based encodings [1]. It is inordinately difficult to optimize parsing of these encodings even if you only have to handle tiny subsets, and it still won't be fast. A format like msgpack, which prides itself on being very fast, may be fast compared to JSON and other ways to express essentially arbitrary structures, but is DEAD SLOW compared to any direct encoding (be it a dedicated encoding you developed in literally a few hours or something like capnproto).

[1] Obviously, considering an encoding more complex than FSM means that you're an idiot and your application will almost certainly have security vulnerabilities related to the format in the future.
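To make the random-access point concrete, here is a rough sketch, not a benchmark and not how msgpack or capnproto actually lay out data. The record layout (`<IHd`) and all function names are made up for illustration. With a fixed-layout "direct" encoding, the offset of record N is just arithmetic; with a byte-by-byte encoding (LEB128-style varints here, as a stand-in), you have to walk every preceding value to find it.

```python
import struct

# Hypothetical fixed layout: u32 id, u16 flags, f64 value, little-endian, no padding.
RECORD = struct.Struct("<IHd")


def encode_direct(records):
    """Pack (id, flags, value) tuples back to back."""
    return b"".join(RECORD.pack(*r) for r in records)


def read_direct(buf, index):
    """Random access: compute the offset, decode only that record."""
    return RECORD.unpack_from(buf, index * RECORD.size)


def encode_varint(n):
    """LEB128-style unsigned varint: 7 payload bits per byte, high bit = 'more'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint_at(buf, index):
    """Byte-by-byte: every value before `index` must be scanned to find it."""
    pos = 0
    value = 0
    for _ in range(index + 1):
        value = 0
        shift = 0
        while True:
            b = buf[pos]
            pos += 1
            value |= (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                break
    return value


direct_buf = encode_direct([(1, 2, 3.5), (70000, 9, -1.25)])
varint_buf = b"".join(encode_varint(n) for n in (1, 300, 70000))
second_record = read_direct(direct_buf, 1)    # one offset computation
third_varint = read_varint_at(varint_buf, 2)  # scans the first two values first
```

`read_direct` costs the same no matter which index you ask for, while `read_varint_at` grows with the index, and that is before you add nesting, type tags, or length prefixes to the byte-by-byte format.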