by brucethemoose2 858 days ago
> the more technical users will leverage llama.cpp to run whatever models they are interested in.

Llama.cpp is much slower than TRT-LLM, and does not have built-in RAG.

TRT-LLM is a finicky, deployment-grade framework, and TBH having it packaged into a one-click install with LlamaIndex is very cool. The RAG in particular is beyond what most local LLM UIs do out of the box.