Some do, some don't. JSON is a special case: a valid JSON file has to be a single top-level value (in practice one array or object literal), so event-driven (SAX-style) parsing ends up being a bit of a hack (like jq's --stream mode). In theory json_streamer or yajl should help, but I couldn't get any combination to return a proper lazy iterator.
With the file as NDJSON it was easier, if a little sparsely documented (Zlib::GzipReader.new or .wrap?):
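require "zlib"
require "json"
# some_ndfile is an already-open binary IO, e.g. File.open(path, "rb")
gz = Zlib::GzipReader.wrap(some_ndfile)
# each_line without a block returns an Enumerator; .lazy defers the parsing
obs = gz.each_line.lazy.map do |line|
  JSON.parse(line)
end.first(4)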
When we can get one line at a time, parsing (unmarshalling) a whole line at once isn't an issue.
My issue is more that it is tricky to nest Ruby IO objects and return a lazy iterator, especially when nesting custom filters along the way. At least, it is trickier than it should be.
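For what it's worth, the nesting can be sketched with only the standard library. This is a minimal sketch under assumptions, not a definitive approach: lazy_ndjson is a hypothetical helper name, the "id" field is made up, and the in-memory payload built with Zlib.gzip stands in for a real file or socket.

```ruby
require "zlib"
require "json"
require "stringio"

# Hypothetical helper: takes any readable IO holding gzipped NDJSON and
# returns a lazy enumerator of parsed records. Nothing is read until a
# consumer asks for elements.
def lazy_ndjson(io)
  Zlib::GzipReader.new(io).each_line.lazy.map { |line| JSON.parse(line) }
end

# Stand-in input: ten small records, one JSON object per line, gzipped in memory.
lines = (0...10).map { |i| JSON.generate("id" => i) }.join("\n") + "\n"
payload = Zlib.gzip(lines)

records = lazy_ndjson(StringIO.new(payload))

# Custom filters chain on like any other lazy step.
even_ids = records.select { |rec| rec["id"].even? }.map { |rec| rec["id"] }

# first(3) forces evaluation and stops pulling lines once it has three results.
first_three = even_ids.first(3)
puts first_three.inspect
```

The same helper should accept a File opened in binary mode in place of the StringIO. Chaining filters works fine here; the awkward part is still wiring up each IO layer by hand.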
Apparently there's a third-party framework that does seem promising:
https://iostreams.rocketjob.io/tutorial
Or manual lifting:
https://dev.to/bajena/streaming-gzipped-csv-files-from-ftp-i...
Or:
https://medium.com/smartly-io/streaming-data-with-ruby-enume...
https://github.com/lautis/piperator
I think something more like this should probably be built in and readily available (for gzip, HTTP, files, etc.). Maybe I'm greedy.
Btw the shell pipeline to convert a file would be something like this, and is fully streaming: