by andhuman
317 days ago
Cool demo!
The first thing that sprang to mind after seeing it was an image of a busy office floor filled with people talking into their headsets, not buying or selling stocks, but actually programming. Whether that's a blessed or a cursed image, I'll let you decide.
Luckily, the other side of this project doesn't require any changes in user behaviour. The idea is to convert chat histories into a tree using the same core algorithm, and then send only the relevant sub-tree to the LLM. That reduces input tokens and context bloat, which in turn improves accuracy, and it would also unlock almost unlimited-length LLM chats. I have been running this context retrieval algorithm against a few benchmarks (GSM-Infinite, NoLiMa, and LongBench v2). The early results are very promising: ~60-90% fewer tokens and higher accuracy than SOTA, though so far only on a subset of the full benchmark datasets.
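
To make the sub-tree idea concrete, here is a rough sketch of what "send only the relevant sub-tree" could look like. It is not the project's actual algorithm, which isn't described here. Everything in it is an assumption for illustration: the `Node` class, the keyword-overlap score in `overlap`, and the rule in `select_context` of keeping the ancestors of the best-matching message plus everything below it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One chat message; children are messages that continue its topic."""
    text: str
    children: list["Node"] = field(default_factory=list)

    def add(self, text: str) -> "Node":
        child = Node(text)
        self.children.append(child)
        return child


def words(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def overlap(query: str, text: str) -> int:
    """Toy relevance score (assumption): number of shared words."""
    return len(words(query) & words(text))


def best_path(root: Node, query: str) -> list[Node]:
    """Root-to-node path ending at the highest-scoring message."""
    best_score, best = -1, [root]
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        score = overlap(query, node.text)
        if score > best_score:
            best_score, best = score, path
        for child in node.children:
            stack.append((child, path + [child]))
    return best


def flatten(node: Node):
    """Yield the text of a node and all of its descendants."""
    yield node.text
    for child in node.children:
        yield from flatten(child)


def select_context(root: Node, query: str) -> list[str]:
    """Ancestors of the best match, then the best match's whole sub-tree."""
    path = best_path(root, query)
    context = [n.text for n in path[:-1]]
    context.extend(flatten(path[-1]))
    return context


# Hypothetical chat history, already arranged as a tree.
root = Node("Planning a trip and a budget")
trip = root.add("Let's talk about the trip to Lisbon")
trip.add("Flights to Lisbon cost about 200 euros")
trip.add("Hotel near the Lisbon harbour")
budget = root.add("Monthly budget spreadsheet")
budget.add("Rent is the largest expense")

context = select_context(root, "How much are flights to Lisbon?")
```

In this toy history, `context` holds three of the six messages, and the budget branch never reaches the model. A real version would presumably use embeddings or an LLM for scoring instead of word overlap, but the shape of the saving is the same.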