by cyanydeez
29 days ago
You can extend the test pretty easily. Run through several design turns and ask it for the same thing again and again, which effectively measures how well it copes as the context gets longer. Ask it to modify lines 120-130 and add more context, etc. We have rudimentary pre-LLM algorithms, like Hamming distance and hashing, that can measure how much the output actually changed. You could even go full https://en.wikipedia.org/wiki/Jabberwocky to see if its sense of context is easily polluted. The point, though, is that there are benchmarks beyond the pelican on a bike that can't be tokenmaxxed and that prove real value in capabilities.
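A rough sketch of what the "modify lines 120-130" check could look like, using nothing but hashing and Hamming distance. Every function name here (`line_hashes`, `changed_lines`, `edit_stayed_in_range`, `hamming`) is made up for illustration, and it assumes the model returns the whole file with the same number of lines; if it inserts or deletes lines you would need a real diff instead.

```python
import hashlib


def line_hashes(text):
    """Hash each line so before/after versions can be compared cheaply."""
    return [hashlib.sha256(line.encode("utf-8")).hexdigest()
            for line in text.splitlines()]


def changed_lines(before, after):
    """Return 1-based line numbers whose hashes differ.

    Assumption: edits are in place. Extra or missing lines at the end
    are reported as changed because they are padded with None.
    """
    a, b = line_hashes(before), line_hashes(after)
    n = max(len(a), len(b))
    a += [None] * (n - len(a))
    b += [None] * (n - len(b))
    return [i + 1 for i in range(n) if a[i] != b[i]]


def edit_stayed_in_range(before, after, start, end):
    """True if every changed line falls inside [start, end] (1-based)."""
    return all(start <= n <= end for n in changed_lines(before, after))


def hamming(a, b):
    """Classic Hamming distance: differing positions in equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    original = "\n".join(f"line {i}" for i in range(1, 201))
    lines = original.splitlines()
    for i in range(120, 131):          # pretend the model edited 120-130
        lines[i - 1] = f"edited {i}"
    edited = "\n".join(lines)
    print(changed_lines(original, edited))
    print(edit_stayed_in_range(original, edited, 120, 130))
```

Run the same check after every turn and you get a cheap signal for when the model starts touching lines it was never asked to touch as the context grows.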