Hacker News
by rvanlaar 1183 days ago
Recently had a 28 GB JSON file of IoT data with no guarantees on the data structure inside.

Used simdjson [1] together with its Python bindings [2] and got massive speedups for analyzing the data. Before, a run was on the order of minutes; afterwards it was fast enough that I didn't need to leave my desk. Reading from disk became the bottleneck, not CPU or memory.

[1] https://github.com/simdjson/simdjson [2] https://pysimdjson.tkte.ch/
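
A minimal sketch of that kind of analysis, not the exact script used here. It assumes the data is newline-delimited JSON (one record per line), which the post does not state, and it only relies on the `loads` function that pysimdjson offers as a drop-in for the standard library. The helper name `summarize_keys` is made up for illustration. If pysimdjson is not installed, the sketch falls back to the stdlib `json` module so it still runs, just slower.

```python
from collections import Counter

try:
    import simdjson as json_impl  # pysimdjson, if it is installed
except ImportError:
    import json as json_impl  # stdlib fallback so the sketch still runs


def summarize_keys(lines):
    """Count (key, value type) pairs across newline-delimited JSON records.

    Hypothetical helper: with no guarantees on structure, a first pass like
    this shows which fields exist and which types they actually carry.
    """
    seen = Counter()
    bad = 0  # lines that could not be parsed at all
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            record = json_impl.loads(raw)
        except ValueError:
            bad += 1
            continue
        if isinstance(record, dict):
            for key, value in record.items():
                seen[(key, type(value).__name__)] += 1
        else:
            # Top-level arrays or scalars are tallied separately.
            seen[("<non-object>", type(record).__name__)] += 1
    return seen, bad
```

For a real file you would pass an open file handle, e.g. `summarize_keys(open(path, "rb"))`, so the records are streamed line by line instead of loading everything into memory.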

1 comment

If reading from disk is now your bottleneck, next time put the file in a (compressed?) ramdisk if you want to feel particularly clever or enjoy sick speedups.
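
A rough sketch of the uncompressed variant, assuming a Linux box where `/dev/shm` is a tmpfs mount (common, but not guaranteed) and enough free RAM to hold the file. The function name `stage_in_ram` is hypothetical. Note that the copy itself still reads the file from disk once, so this mainly pays off when you make several passes over the same data.

```python
import shutil
from pathlib import Path


def stage_in_ram(src, ram_dir="/dev/shm"):
    """Copy src into a RAM-backed directory and return the new path.

    ram_dir defaults to /dev/shm, which is an assumption about the host;
    pass another tmpfs mount point if yours lives elsewhere.
    """
    src = Path(src)
    ram_dir = Path(ram_dir)
    if not ram_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {ram_dir}")

    # Refuse to copy if the target clearly cannot hold the file.
    size = src.stat().st_size
    free = shutil.disk_usage(ram_dir).free
    if size > free:
        raise OSError(f"{src.name} needs {size} bytes, only {free} free")

    return Path(shutil.copy2(src, ram_dir / src.name))
```

After that, point the analysis at the returned path and delete the copy when you are done, since it occupies memory until removed.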