| Food for thought, a snippet from a highly specialized project I created two months ago: https://gist.github.com/eugene-yaroslavtsev/c9ce9ba66a7141c5... I spent several hours searching online for existing solutions and couldn't find anything (even when exploring the idea of stitching together multiple different tools, each in a different programming language). This took me ~3-4 hours end-to-end. I haven't seen any other OSS code that can convert unstructured JSON into normalized, structured JSON with a schema, while also using a statistical-sampling sliding-window method to handle all of these:
- speculative SIMD prediction of the end of the current JSON entry
- distinguishing whether two "similar"-looking objects represent the same model or not
- normalizing entities based on how often they're referenced
- ~5-6 GB/s throughput on a MacBook M4 Max 24GB
- arbitrary horizontal scaling (though shared entity/normalization resource contention may eventually become an issue)
I didn't write this code. I didn't even come up with all of the ideas in this implementation. I initially just thought "2NF"/"BNF" probably good, right? Not for multi-TB files. This was spec'd out by chatting with Sonnet for ~1.5 hours. It was the one that suggested statistical normalization. It also suggested using several approaches for determining whether two objects share the same schema (that and normalization were where most of the complexity decided to live). I did this all on my phone. With my voice. I hope more folks realize this is possible. I strongly encourage you and others to reconsider this assumption! |
For example, why is the root object's entityType being passed to the recursive mergeEntities call, instead of extracting the field type from the propSchema?
Several uses of `as` (as well as the repeated `result[key] === null` tests) could be eliminated by assigning `result[key]` to a named variable.
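To illustrate the named-variable point, here is a minimal sketch. It is not the gist's code: the `Json` and `JsonObject` types, the `isPlainObject` guard, and the `mergeValues` function are all hypothetical stand-ins, and the merge policy (keep an existing non-null value, recurse into nested objects) is an assumption made only to have something to narrow. Once `result[key]` is read into a `const`, a single null check and a type guard narrow it for the rest of the branch, so no `as` is needed.

```typescript
// Hypothetical types for illustration; the gist's real definitions may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Type guard: true only for non-null, non-array objects.
function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumed merge policy: fill missing or null slots, recurse into nested
// objects, and otherwise keep the value already present in `result`.
function mergeValues(result: JsonObject, incoming: JsonObject): JsonObject {
  for (const key of Object.keys(incoming)) {
    // Read each side once; the compiler narrows these locals from here on.
    const existing = result[key];
    const next = incoming[key];
    if (next === undefined) continue;

    if (existing === undefined || existing === null) {
      result[key] = next;
    } else if (isPlainObject(existing) && isPlainObject(next)) {
      // Both locals are narrowed to JsonObject here, so no cast is required.
      result[key] = mergeValues(existing, next);
    }
  }
  return result;
}
```

Under these assumptions, `mergeValues({ id: 1, owner: null }, { id: 2, owner: { name: "x" } })` keeps `id: 1` and fills `owner` with the incoming object.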
Yes, it's amazing that LLMs have reached the level where they can produce almost-correct, almost-clean code. The question remains whether making it correct and clean takes longer than writing it by hand.