| Some do, some don't. JSON is a special case, since a valid JSON file needs to be a single array or object literal, so event-driven (SAX-style) parsing ends up being a hack (like jq's stream mode). In theory json_streamer or yajl should help, but I couldn't get a combination to return a proper lazy iterator. With the file as ndjson it was easier, if a little sparsely documented (Zlib::GzipReader.new or .wrap?):
require 'zlib'
require 'json'
# some_ndfile is an already-open IO, e.g. File.open(path, 'rb')
my_it = Zlib::GzipReader.wrap(some_ndfile)
# call each_line on the reader itself, then make the line enumerator lazy
obs = my_it.each_line.lazy.map do |line|
  JSON.parse line
end.first(4)
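To make the snippet above concrete, here is a minimal self-contained sketch with one custom filter stage chained on. It assumes the data is gzipped ndjson; `lazy_ndjson` is a hypothetical helper name and not part of any library, and the payload is built in memory only so the example runs without a file.

```ruby
require 'zlib'
require 'json'
require 'stringio'

# Hypothetical helper: wrap any IO holding gzipped ndjson and return a
# lazy enumerator of parsed records. Nothing is read until it is consumed.
def lazy_ndjson(io)
  Zlib::GzipReader.new(io).each_line.lazy.map { |line| JSON.parse(line) }
end

# Build a small gzipped ndjson payload in memory (stand-in for a real file).
raw = StringIO.new
gz = Zlib::GzipWriter.new(raw)
(1..10).each { |i| gz.puts JSON.generate('id' => i, 'even' => i.even?) }
gz.finish # write the gzip trailer without closing the StringIO
raw.rewind

# A custom filter is just another lazy stage chained onto the enumerator.
evens = lazy_ndjson(raw).select { |rec| rec['even'] }
first_two = evens.first(2)
```

Because every stage is an Enumerator::Lazy, `first(2)` stops pulling lines once two matching records have been found rather than decompressing the whole input.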
When we can get a line at a time, marshalling the whole line isn't an issue. My issue is more that it is tricky to nest Ruby IO objects and return a lazy iterator, especially when nesting custom filters along the way; at least it's trickier than it should be.
Apparently there's a third-party framework that does seem promising: https://iostreams.rocketjob.io/tutorial
Or manual lifting: https://dev.to/bajena/streaming-gzipped-csv-files-from-ftp-i...
Or: https://medium.com/smartly-io/streaming-data-with-ruby-enume... https://github.com/lautis/piperator
I think something more like this should probably be built in and readily available (for gzip, http, files etc). Maybe I'm greedy.
Btw the shell pipeline to convert a file would be something like this, and is fully streaming:
# gzipped JSON to gzipped ndjson, stripping top level array:
gzcat file.json.gz \
| jq -cn --stream 'fromstream(inputs|(.[0] |= .[1:]) | select(. != [[]]) )' \
| gzip -9 \
> file.ndjson.gz
|