|
|
|
|
|
by jrvarela56
900 days ago
|
|
In your experience, how could these local LLMs become snappier than using streamed API calls? How far are they if not? How soon do you guess they’ll get there? I understand the motivation includes factors other than performance, I’m just curious about performance as it applies to UX. |
|
I don't use it as more than a cool demo though, because the large hosted LLMs (I tend to mostly use GPT-4) are massively more powerful.
But... I'm still intrigued at the idea of a local, slow LLM on my phone enhanced with function calling capabilities, and maybe usable for RAG against private data.
The rate of improvement in these smaller models over the past 6 months has been incredible. We may well find useful applications for them even despite their weaknesses compared to GPT-4 etc.