Hacker News
by mtdewcmu 4594 days ago
I was interested in yours specifically because it's the only one I've seen that doesn't buffer (almost). You might as well drop the last little buffer and be completely bufferless. Then it will never cause an error or block, and you'll have no dependencies. If you want to support validation, make a higher-level interface to this and do it there. It will be just as easy.

Hmm? The stack buffer in yxml will never cause parsing to block, and there are no added dependencies on... anything? I don't think that buffer causes any problems even on a memory-constrained microcontroller that doesn't have malloc(). As long as you can spare ~512 bytes, you can parse a lot of files.

The only situation in which that buffer would cause an error is when the application provides a buffer that is too small, or when the document is far too deeply nested or has extremely long element/property names. Both the maximum nesting depth and the maximum name length should, IMO, be limited by the parser to protect against malicious documents. Most parsers have separate settings for these; yxml simplifies things by letting the application control the size of a single buffer.
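A minimal sketch of how a single caller-owned buffer can bound both nesting depth and name length, assuming open element names are kept as consecutive zero-terminated strings. This is not yxml's actual API or layout; `name_stack`, `ns_push`, `ns_pop` and the `NS_*` codes are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch (not yxml's API): open element names are stored as
 * consecutive zero-terminated strings in a buffer the application owns.
 * No malloc() is involved, so the buffer size is the only limit. */
typedef struct {
    char  *buf;   /* caller-provided storage */
    size_t cap;   /* size of buf in bytes */
    size_t top;   /* bytes currently in use */
    size_t depth; /* number of open elements */
} name_stack;

typedef enum { NS_OK = 0, NS_ENOMEM, NS_EMISMATCH, NS_EEMPTY } ns_ret;

void ns_init(name_stack *s, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    s->buf = buf;
    s->cap = cap;
    s->top = 0;
    s->depth = 0;
}

/* Called on a start tag. Fails if the name plus its terminator does not
 * fit, which caps nesting depth and name length with one number. */
ns_ret ns_push(name_stack *s, const char *name) {
    size_t need = strlen(name) + 1;
    if (need > s->cap - s->top)
        return NS_ENOMEM;
    memcpy(s->buf + s->top, name, need);
    s->top += need;
    s->depth++;
    return NS_OK;
}

/* Innermost open element name as a plain zero-terminated C string. */
const char *ns_current(const name_stack *s) {
    if (s->depth == 0)
        return NULL;
    /* Walk back from the last terminator to the start of that name. */
    size_t i = s->top - 1;
    while (i > 0 && s->buf[i - 1] != '\0')
        i--;
    return s->buf + i;
}

/* Called on an end tag: it must match the innermost open element. */
ns_ret ns_pop(name_stack *s, const char *name) {
    const char *cur = ns_current(s);
    if (cur == NULL)
        return NS_EEMPTY;
    if (strcmp(cur, name) != 0)
        return NS_EMISMATCH;
    s->top = (size_t)(cur - s->buf);
    s->depth--;
    return NS_OK;
}
```

Under this layout, a 512-byte buffer with 15-character names (16 bytes each including the terminator) allows 32 levels of nesting; the 33rd push returns `NS_ENOMEM` instead of allocating more memory.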

The stack buffer in yxml also makes the API a bit easier to use. With the buffer, I can pass element/property names to the application as a single zero-terminated C string; without it, I would have to use the same mechanism as for attribute values and element contents, which isn't all that easy to use. (This is the one case where I chose convenience over simplicity, but I kinda wanted the validation anyway, so that wasn't really a problem.)
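To illustrate the inconvenience, here is a hypothetical application-side collector, assuming the parser hands over an attribute value in small zero-terminated pieces between a start and an end notification. The names `attr_collector`, `attr_start` and `attr_chunk` are invented for this sketch and are not yxml's interface; the point is only that the application must glue the pieces back together (or process them incrementally) itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical collector: the parser delivers a value piece by piece,
 * so the application keeps its own buffer to reassemble it. */
typedef struct {
    char   value[64];
    size_t len;
    int    truncated;  /* set once the value no longer fits */
} attr_collector;

/* Call when a new attribute value begins. */
void attr_start(attr_collector *c) {
    c->len = 0;
    c->value[0] = '\0';
    c->truncated = 0;
}

/* Call for each piece of the value; keeps value zero-terminated.
 * Note: truncation here may split a multi-byte UTF-8 sequence. */
void attr_chunk(attr_collector *c, const char *piece) {
    assert(piece != NULL);
    size_t n = strlen(piece);
    size_t room = sizeof c->value - 1 - c->len;
    if (n > room) {
        n = room;
        c->truncated = 1;
    }
    memcpy(c->value + c->len, piece, n);
    c->len += n;
    c->value[c->len] = '\0';
}
```

Every application that wants whole values ends up writing something like this, which is why handing names over as one ready-made string is the friendlier choice.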

Ah. I haven't yet looked too far into how yxml works. I never came up with a perfectly satisfactory solution to the zero-buffer problem myself, but you've hit on a lot of the things that make it a hassle either way. It's almost impossible to write a usable XML parser under the assumption that it won't buffer and is never guaranteed access to more than one character at a time. I started developing one like that, based on a goto-driven state machine, but I stopped because the interface was going to be too inconvenient.

What I meant about blocking was blocking on malloc. It sounds like you're expecting the caller to take care of allocation?