| It's new, yes. I wrote it about two months ago because I needed a parser and found the existing solutions too bloated. > It doesn't allocate or buffer, so when it says that it verifies proper nesting, I assume that doesn't mean verifying that start tags and end tags have matching names. I think that would be impossible. It does match start tags and end tags, and in order to do so it does, indeed, need a small buffer. However, the only thing that needs to be stored in this buffer is a stack of element names, and that rarely needs much space - a simple fixed-size buffer is sufficient. It can be safely allocated on the stack if the application wants to. See the documentation for the yxml_init() for more details. The "mostly-bufferless" part of the homepage refers to the lack of a complicated input buffer and the avoidance of complex buffer management for output where possible. > It looks like you pass the parse function a character at a time. To be fast, you'd want to somehow encourage that to get inlined. The function is quite large. You can probably get it inlined when put inside a header file and when it's only called in a single place, and you may indeed see a performance boost. I haven't measured how significant such an improvement may be. However, if performance is the primary goal (it wasn't for yxml), a completely different design would probably be better. Yxml needs to manage an explicit state object for each input character. For each new character it has to look at the state to know how it's handled, handle it, and then update the state for the next character. I was honestly surprised to see that yxml performed quite well in my benchmarks. A more efficient design is probably the goto-based state machine approach that, e.g. ragel[1] can output. The reason I didn't go for that approach is that it complicates buffering. You either need to have the full file mapped into memory (not possible when reading from a socket or from a compressed stream), or you need to implement your own buffering and provide hooks so that the application can fill the buffer when it is depleted. The latter approach is used by most other XML parsers. [1] http://www.complang.org/ragel/ |