Hacker News new | ask | show | jobs
by vidarh 2752 days ago
Not parse them into a tree, to start with.

Use a streaming JSON parser, and compare them token by token unless/until they diverge, at which point you take whatever actual suitable to identify the delta.

Parsing it into a tree may be necessary if you want to do more complex comparisons (such as sorting child objects etc.), but even then depending on your need you may well be better off storing offsets into the file depending on your requirements.

https://github.com/lloyd/yajl is an example of a streaming JSON parser (caveat: I've not benchmarked it at all), but JSON is simple enough to write one specifically to handle two streams.

1 comments

I believe this comparison benchmark could be useful for you and you can expand further with more tests. Although I got downvoted for sharing a link.

https://github.com/kostya/benchmarks/blob/master/README.md

That still parses into a tree.