Y
Hacker News
new
|
ask
|
show
|
jobs
by
svachalek
368 days ago
The technical term is constrained decoding. OpenAI has had this for almost a year now. They say it requires generating some artifacts to do efficiently, which slows down the first response but can be cached.