Hacker News
by sandkoan 1072 days ago
The prompt is given to our model as a guiding aid (a suggestion), while the CFG (context-free grammar) constrains the model to generate only tokens that conform to the schema (an enforcement). That's how we guarantee valid output at generation time.
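A minimal sketch of grammar-constrained decoding by logit masking. Everything here is an assumption for illustration and not the poster's implementation: the toy vocabulary, a hypothetical `{"ok": <bool>}` schema, and `fake_logits`, a random stand-in for the model that ignores the prompt. This schema has no nesting, so it is encoded as a finite-state machine. A full CFG would need a stack (a pushdown automaton) to track nesting.

```python
import math
import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ['{"ok": ', 'true', 'false', '}', 'hello', ' ', '"', '<eos>']

# Hypothetical schema {"ok": <bool>} as a state machine:
# state -> {allowed token: next state}. None marks the end of generation.
GRAMMAR = {
    0: {'{"ok": ': 1},
    1: {'true': 2, 'false': 2},
    2: {'}': 3},
    3: {'<eos>': None},
}


def fake_logits(prefix, vocab_size, rng):
    """Stand-in for the LLM: arbitrary scores that ignore the prefix."""
    return [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]


def constrained_decode(seed=0):
    rng = random.Random(seed)
    state, out = 0, []
    while state is not None:
        allowed = GRAMMAR[state]
        logits = fake_logits(out, len(VOCAB), rng)
        # Enforcement: tokens the grammar forbids get -inf, so they can never win.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        # Greedy choice among the tokens that survived the mask.
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        tok = VOCAB[best]
        out.append(tok)
        state = allowed[tok]
    return "".join(t for t in out if t != "<eos>")
```

Whatever scores the stand-in model produces, the output is always either `{"ok": true}` or `{"ok": false}`. The model's preferences (shaped by the prompt, in a real system) only choose among the continuations the grammar permits.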

We also prefill some tokens based on the set of allowed tokens at a given state (for instance, when the grammar permits only one), so the model doesn't waste compute predicting them.
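A minimal sketch of the prefill idea. It assumes that "prefill" here means emitting a token directly whenever the grammar allows exactly one continuation, which is my reading rather than something the post states. The schema, the `decode_with_prefill` helper, and the random scores standing in for the model are all hypothetical.

```python
import random

# Hypothetical schema {"ok": <bool>}: state -> {allowed token: next state}.
GRAMMAR = {
    0: {'{"ok": ': 1},
    1: {'true': 2, 'false': 2},
    2: {'}': 3},
    3: {'<eos>': None},
}


def decode_with_prefill(seed=0):
    rng = random.Random(seed)
    state, out, model_calls = 0, [], 0
    while state is not None:
        allowed = GRAMMAR[state]
        if len(allowed) == 1:
            # Forced token: only one legal continuation, so emit it
            # without asking the model at all.
            tok = next(iter(allowed))
        else:
            # A real choice: score only the allowed tokens
            # (random scores stand in for the model here).
            model_calls += 1
            scores = {t: rng.gauss(0.0, 1.0) for t in allowed}
            tok = max(scores, key=scores.get)
        out.append(tok)
        state = allowed[tok]
    return "".join(t for t in out if t != "<eos>"), model_calls
```

For this schema, only the `true`/`false` slot needs a model decision, so four emitted tokens cost a single scoring step. In a real transformer the forced tokens must still be run through the model to extend its KV cache, but they can be processed together in one prefill pass instead of one decoding step each.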


When you say “our model”, are you using a custom LLM for completions rather than OpenAI or another LLM vendor?
Custom LLM, hence the self-hostability.