| HN Mirror

you can extend the test pretty easily. run through several design turns and ask it for the same output again and again. that effectively measures context length.

ask it to modify lines 120-130 and add more context, etc.
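a rough sketch of how that line-range check could be scored, assuming you have the original file and the model's edited version as strings. the function name `edit_stayed_in_range` and the pass/fail rule are hypothetical, not taken from any existing benchmark:

```python
def edit_stayed_in_range(original: str, edited: str, start: int, end: int) -> bool:
    """Return True if everything outside lines start..end (1-indexed, inclusive)
    is byte-for-byte unchanged. The edited range itself may grow or shrink."""
    orig_lines = original.splitlines()
    new_lines = edited.splitlines()

    prefix = orig_lines[:start - 1]   # lines before the allowed range
    suffix = orig_lines[end:]         # lines after the allowed range

    # The edited text must at least contain the untouched prefix and suffix.
    if len(new_lines) < len(prefix) + len(suffix):
        return False

    head_ok = new_lines[:len(prefix)] == prefix
    tail_ok = new_lines[len(new_lines) - len(suffix):] == suffix
    return head_ok and tail_ok
```

repeat this over more turns and longer files and count how often the model touches lines it was not asked to touch.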

we have rudimentary pre-LLM algorithms, like hamming distance and hashing, that can measure this.
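a minimal sketch of those two checks, assuming the task is "reproduce this text unchanged" and you hold the reference copy. `sha256_hex` and `hamming_distance` are illustrative helper names. a hash gives an exact-match yes/no, and hamming distance counts differing positions when both strings have the same length:

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash the text so two outputs can be compared for exact equality."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


reference = "def area(w, h): return w * h"
reproduced = "def area(w, h): return w + h"

exact_match = sha256_hex(reference) == sha256_hex(reproduced)  # False
drift = hamming_distance(reference, reproduced)                # 1
```

hamming distance only works for same-length strings, so outputs that insert or drop characters would need an edit-distance measure instead.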

you could even go all https://en.wikipedia.org/wiki/Jabberwocky to see if its sense of context is easily polluted.

the point though is that there are benchmarks beyond pelican on a bike that can't be tokenmaxxed and that prove real value in capabilities.