Hacker News
by nl 265 days ago
Most API providers (Together, Fireworks, etc.) don't build their own models.
2 comments

You don't need a new model. The trick is that you only change how tokens are sampled: at each step, zero out the probability of every token that would be illegal under the grammar or other constraints, then sample from whatever is left.

All you need for that is an inference API that gives you the full output vector (the logits or probabilities over the whole vocabulary), which is trivial to get for any model you run on your own hardware.
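
For illustration, here is a rough sketch of that masking step in plain Python. It assumes you already have the logits for the next token and a set of token ids the constraint allows at this position; working out that set from a grammar is the hard part and is not shown. The names `masked_probs`, `constrained_sample` and `allowed_ids` are made up for this example and don't come from any particular library.

```python
import math
import random


def masked_probs(logits, allowed_ids):
    """Softmax over the logits with every disallowed token forced to probability 0."""
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("the constraint allows no tokens at this step")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(logits[i] for i in allowed)
    weights = [math.exp(x - peak) if i in allowed else 0.0
               for i, x in enumerate(logits)]
    total = sum(weights)
    # Renormalize so the surviving tokens sum to 1.
    return [w / total for w in weights]


def constrained_sample(logits, allowed_ids, rng=random):
    """Sample one token id, drawing only from the allowed set."""
    probs = masked_probs(logits, allowed_ids)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy vocabulary of 5 tokens; pretend the grammar only permits ids 1 and 3 here.
logits = [2.0, 0.5, 3.0, 0.5, -1.0]
probs = masked_probs(logits, allowed_ids=[1, 3])
token = constrained_sample(logits, allowed_ids=[1, 3], rng=random.Random(0))
```

Note that token 2 has the highest raw logit but can never be picked, because its probability is zeroed before sampling. The model itself is untouched; only the sampling step changes.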

That said, Fireworks is one of the few providers that support structured generation.