| Hi HN, happy to see this here! I highly recommend taking a look at the technical details of the server implementation that enables large-context usage with this plugin. I think it is interesting and has some cool ideas [0]. The same plugin is also available for VS Code [1]. Let me know if you have any questions about the plugin; I'm happy to explain. Btw, performance has improved compared to what is shown in the README videos, thanks to client-side caching. [0] - https://github.com/ggerganov/llama.cpp/pull/9787 [1] - https://github.com/ggml-org/llama.vscode |