| HN Mirror



	by accrual 606 days ago
	Not in production, but I've used a 3B model to test a local LLM application I'm working on. I needed a full end-to-end request/response, and asking a 3B model is a lot faster than asking an 8B model. I could set up a test harness and replay the responses... but this was a lot simpler.
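
For reference, the "test harness that replays responses" alternative might look roughly like the sketch below. This is only an illustration under assumptions: `ReplayClient` and `call_model` are hypothetical names, not part of llama.cpp or any real library, and `call_model` stands in for whatever function actually sends a prompt to the model.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class ReplayClient:
    """Hypothetical wrapper: caches prompt -> response pairs on disk."""

    def __init__(self, call_model: Callable[[str], str], cache_dir: str):
        # call_model is an assumed stand-in for the real model call.
        self.call_model = call_model
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, prompt: str) -> Path:
        # One file per distinct prompt, keyed by its SHA-256 hash.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.json"

    def complete(self, prompt: str) -> str:
        path = self._path_for(prompt)
        if path.exists():
            # Replay: return the recorded response without touching the model.
            return json.loads(path.read_text(encoding="utf-8"))["response"]
        # Record: call the model once and save the response for later runs.
        response = self.call_model(prompt)
        record = {"prompt": prompt, "response": response}
        path.write_text(json.dumps(record), encoding="utf-8")
        return response
```

The first run still hits the model, so it doesn't save any setup time up front. It only pays off on repeated runs with identical prompts, which is why just pointing the tests at a small model can be the simpler option.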

1 comments

If it's for testing, then why not just mock the whole thing for ultimate performance?

Probably faster to use an off-the-shelf model with llama.cpp than to mock it.