| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by svachalek 368 days ago
	The technical term is constrained decoding. OpenAI has had this for almost a year now. They say it requires generating some artifacts to do efficiently, which slows down the first response but can be cached.