I will write something up when the dust settles, I’m still testing things out. It’s a project where the data is fairly standardized but there is about a petabyte to deal with, so I think it makes sense to make investments in efficiency at the lower level rather than through tons of resources at it. That has meant a custom parser for the input data written in Rust, lots of analysis of the statistics of the data, etc. It has been a different approach to data engineering and one that I hope we see more of.