Hacker News
by accrual 606 days ago
Not in production, but I've used a 3B model to test a local LLM application I'm working on. I needed a full end-to-end request/response, and asking a 3B model is a lot faster than asking an 8B model. I could have set up a test harness and replayed the responses, but this was a lot simpler.

If it's only for testing, why not just mock the whole thing for maximum performance?
It's probably faster to use an off-the-shelf model with llama.cpp than to mock it.
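
For comparison, the "replay the responses" harness mentioned upthread could be fairly small. This is only a rough sketch, not anyone's actual setup: `ReplayingClient`, `fake_small_model`, and the JSON cache file are all made-up names, and the real model call (llama.cpp or anything else) is assumed to be reducible to a plain prompt-to-text function.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


class ReplayingClient:
    """Hypothetical wrapper: records model responses once, then replays them."""

    def __init__(self, generate: Callable[[str], str], cache_path: Path):
        self._generate = generate      # the real (slow) model call
        self._cache_path = cache_path  # JSON file holding recorded responses
        if cache_path.exists():
            self._cache = json.loads(cache_path.read_text())
        else:
            self._cache = {}

    def complete(self, prompt: str) -> str:
        # Key on a hash of the prompt so long prompts stay manageable as keys.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # First time this prompt is seen: ask the model and record the answer.
            self._cache[key] = self._generate(prompt)
            self._cache_path.write_text(json.dumps(self._cache, indent=2))
        # Every later call with the same prompt is served from the recording.
        return self._cache[key]


def fake_small_model(prompt: str) -> str:
    # Stand-in for the real request; swap in your own client call here.
    return f"echo: {prompt}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        client = ReplayingClient(fake_small_model, Path(tmp) / "responses.json")
        print(client.complete("hello"))  # recorded on the first call
        print(client.complete("hello"))  # replayed from the cache
```

The tradeoff is the one already raised in the thread: replay is fast and deterministic, but it only covers prompts you've already recorded, whereas a small live model exercises the whole request/response path every time.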