by crazygringo 469 days ago
If you're streaming something row-based like a CSV, or a zipped CSV, then that's usually easy.

But when you get to hierarchical data structures like JSON/protobuf there very often simply isn't a streaming library available. There's a library function to decode the whole thing into an object in memory, and that's all.

Nothing prevents streaming in theory, it's just far more complicated to write that library.


protobuf sure, but streaming libraries for json (and xml, as in the parent) are extremely common. not harder (maybe even easier) than non-streaming to write, tho more cumbersome to use, so something you'd reach for only if you specifically need it ('cuz of memory constraints)

e.g. standard go json library https://pkg.go.dev/encoding/json#example-Decoder.Decode-Stre...
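To make the point above concrete, here is a minimal sketch of streaming a large top-level JSON array with Go's standard `encoding/json` decoder, so only one element is in memory at a time. The `Record` type, its fields, and the helper names are hypothetical and exist only for illustration; this is not a copy of the linked example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical element type used only for this illustration.
type Record struct {
	Name  string `json:"name"`
	Value int    `json:"value"`
}

// sumValues streams a top-level JSON array from r, decoding one element at a
// time so that only a single Record is held in memory at once.
func sumValues(r io.Reader) (count, total int, err error) {
	dec := json.NewDecoder(r)

	// Consume the opening '[' of the array.
	tok, err := dec.Token()
	if err != nil {
		return 0, 0, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return 0, 0, fmt.Errorf("expected '[', got %v", tok)
	}

	// Decode elements one by one until the array is exhausted.
	for dec.More() {
		var rec Record
		if err := dec.Decode(&rec); err != nil {
			return count, total, err
		}
		count++
		total += rec.Value
	}

	// Consume the closing ']'.
	if _, err := dec.Token(); err != nil {
		return count, total, err
	}
	return count, total, nil
}

// sumValuesString is a convenience wrapper for in-memory input.
func sumValuesString(s string) (int, int, error) {
	return sumValues(strings.NewReader(s))
}

func main() {
	count, total, err := sumValuesString(
		`[{"name":"a","value":1},{"name":"b","value":2},{"name":"c","value":3}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(count, total)
}
```

This shows the "more cumbersome to use" part: the caller has to walk the delimiters by hand instead of getting one finished object back.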

Yup. I don't remember streaming JSON being common in the early days but now it is. But the absence of streaming protobuf is what has killed me, when dealing with gigantic protobuf files from government agencies (ugh).

Heh, yeah. The protobuf people's expectation was if you had a really large dataset you'd wrap it up in your own mini-protocol of "sequence of protobuf messages". But of course that's way more friction, so in practice it will end up not getting done when it should be (plus also, it requires a certain amount of ability to predict the future).
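For anyone wondering what such a mini-protocol might look like, here is a rough sketch of one common shape: each message is written with a varint length prefix, so a reader can pull messages off the stream one at a time. This is an assumption about the framing, not something the protobuf project prescribes here. The payloads are opaque bytes standing in for serialized messages, and `writeFrame`, `readFrame`, `roundTrip`, and the `maxFrame` limit are all hypothetical names and values.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrame is an arbitrary illustrative limit that guards against a corrupt
// length prefix causing a huge allocation.
const maxFrame = 1 << 20

// writeFrame writes one payload prefixed by its length as an unsigned varint.
func writeFrame(w io.Writer, payload []byte) error {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(payload)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads the next frame. It returns io.EOF when the stream ends
// exactly at a frame boundary.
func readFrame(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	if size > maxFrame {
		return nil, fmt.Errorf("frame of %d bytes exceeds limit", size)
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip frames each message into a buffer, then reads them back one by one.
func roundTrip(msgs []string) ([]string, error) {
	var buf bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&buf, []byte(m)); err != nil {
			return nil, err
		}
	}
	r := bufio.NewReader(&buf)
	var out []string
	for {
		p, err := readFrame(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, string(p))
	}
}

func main() {
	out, err := roundTrip([]string{"first", "second", "third"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In real use each payload would be one serialized protobuf message, and the reader would unmarshal it inside the loop instead of collecting strings. The friction mentioned above is visible here: the producer has to decide on this framing up front.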

Lesson for technologists: if you want to make the world a better place arrange your tech such that the lowest-friction path is also the correct path.

(Another example: disastrous multi-byte UTF encodings [correct solution was more friction] vs basically successful UTF8 [correct solution was less friction].)

I don't know if you're still dealing w/ this particular problem for protobufs, but based on my experience with thrift, a very similar library, there are probably some not too terrible ways you can kinda hack up the client-side parsing to be more streaming-ish...

nanopb is designed around streaming. It's limited in a few ways[1] but is designed for use on low-memory systems (microcontrollers) where the whole protobuf message won't necessarily fit into memory at once. Might not help for your use cases though, since it's a C library without a stable ABI.

[1] https://jpa.kapsi.fi/nanopb/docs/#features-and-limitations

In programming languages suitable for enterprise software development there are blessed streaming parsers for XML, because it's a rather common task.

It's very common that other programming languages have basic SAX parsers.

Which languages have you encountered that don't?
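As one data point, here is a minimal sketch of SAX-style XML processing using Go's standard `encoding/xml` token loop, which reads the document piece by piece rather than building a tree. The `countElements` helper and the sample document are hypothetical, made up for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// countElements walks the XML in r token by token and counts start tags by
// local name, without ever holding the whole document in memory.
func countElements(r io.Reader) (map[string]int, error) {
	dec := xml.NewDecoder(r)
	counts := make(map[string]int)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return counts, nil // end of input
		}
		if err != nil {
			return nil, err
		}
		// Only start tags are of interest here; other token kinds
		// (end tags, character data, comments) are skipped.
		if start, ok := tok.(xml.StartElement); ok {
			counts[start.Name.Local]++
		}
	}
}

// countElementsString is a convenience wrapper for in-memory input.
func countElementsString(s string) (map[string]int, error) {
	return countElements(strings.NewReader(s))
}

func main() {
	counts, err := countElementsString(`<feed><item/><item>x</item></feed>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["feed"], counts["item"])
}
```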