| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by aazo11 464 days ago
	I spent a couple of weeks trying out local inference solutions for a project. Wrote up my thoughts with some performance benchmarks in a blog. TLDR -- What these frameworks can do on off the shelf laptops is astounding. However, it is very difficult to find and deploy a task specific model and the models themselves (even with quantization) are so large the download would kill UX for most applications.

1 comments

codelion 464 days ago

There are ways to improve the performance of local LLMs with inference time techniques. You can try with optillm - https://github.com/codelion/optillm it is possible to match the performance of larger models on narrow tasks by doing more at inference.

link