Hacker News
by noaflaherty 1205 days ago
Thanks for the question! Would you mind elaborating on what you mean by "optimization options"? We've helped a number of our customers fine-tune models to optimize for higher quality, lower cost, or lower latency (e.g. fine-tuning curie to perform as well as regular davinci, but at lower cost and latency).

We offer UIs and APIs for "feeding back actuals" and indicating the quality of the model's output and what it should have output. This feedback loop is then used to periodically re-train fine-tuned models.
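
To make the feedback loop concrete, here is a minimal sketch of how recorded feedback could be turned into training pairs for the next re-training run. It is an illustration only, not Vellum's actual API: the `Feedback` record, the `build_training_pairs` helper, and the `min_quality` threshold are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """One hypothetical feedback record for a single model call."""
    prompt: str
    model_output: str
    actual: Optional[str] = None     # what the model should have output, if known
    quality: Optional[float] = None  # rating in [0.0, 1.0], if provided


def build_training_pairs(feedback, min_quality=0.8):
    """Turn feedback records into prompt/completion pairs.

    A supplied "actual" always wins. Otherwise the model's own output is
    kept only if it was rated at or above min_quality. Records with
    neither signal are skipped.
    """
    pairs = []
    for fb in feedback:
        if fb.actual is not None:
            pairs.append({"prompt": fb.prompt, "completion": fb.actual})
        elif fb.quality is not None and fb.quality >= min_quality:
            pairs.append({"prompt": fb.prompt, "completion": fb.model_output})
    return pairs
```

The threshold of 0.8 is arbitrary. In practice the cut-off would depend on how the quality ratings are collected.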

Hopefully this answers your question, but happy to respond with follow-ups if not!


I'm thinking about improving model response quality.

Training of preexisting LLMs, as far as I'm familiar with it, has two sides: fine-tuning the model on additional, domain-specific data (like internal company documentation), and RLHF (like comparing model responses to actual customer service responses) to further improve how well it uses that data and the original resources it has access to. That's how https://github.com/CarperAI sets up the process, for example.

What you're describing seems closer to the latter, but I'm not entirely sure if you're following the same structure at all.

Hey, Sidd from Vellum here!

Right now we offer traditional fine-tuning with prompt/completion pairs, but not training a reward model. This works great for a lot of use cases, including classification, extracting structured data, or responding with a very specific tone and style.
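
For readers unfamiliar with the format, prompt/completion pairs are commonly stored as JSON Lines, one object per line with `prompt` and `completion` keys. The sketch below builds such a file body for a small classification task. The example tickets, labels, and the `to_jsonl` helper are made up for illustration. The exact separators and formatting a given provider expects may differ.

```python
import json


def to_jsonl(pairs):
    """Serialize (prompt, completion) tuples as JSON Lines text."""
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": completion})
        for prompt, completion in pairs
    )


# Hypothetical classification examples: support ticket -> category label.
examples = [
    ("Ticket: I was charged twice this month.\nCategory:", " billing"),
    ("Ticket: The app crashes when I open settings.\nCategory:", " bug"),
]

jsonl_text = to_jsonl(examples)
```

Each line of `jsonl_text` is an independent JSON object, so the file can be appended to as new examples come in.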

For making use of domain-specific data, we recommend using semantic search to pull in the correct context at runtime instead of trying to fine-tune a model on the entire corpus of knowledge.
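
As a rough sketch of the "pull in context at runtime" idea: rank documents by similarity to the query, then prepend the best matches to the prompt. To stay self-contained, this example uses a toy bag-of-words vector and cosine similarity as a stand-in for a real embedding model, so it matches on shared words rather than on meaning. All function names and the prompt template are assumptions for illustration, not Vellum's implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Assemble a prompt with the retrieved context placed before the question."""
    context = "\n".join(top_k(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

With a real embedding model, only `embed` would change. The ranking and prompt assembly stay the same, and the underlying model needs no re-training when the documents change.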