

by brucethemoose2 858 days ago

> the more technical users will leverage llama.cpp to run whatever models they are interested in.

llama.cpp is much slower than TRT-LLM, and it does not have built-in RAG.

TRT-LLM is a finicky, deployment-grade framework, and TBH having it packaged into a one-click install with LlamaIndex is very cool. The RAG in particular is beyond what most local LLM UIs do out of the box.