by vunderba 607 days ago
The OP is talking about constraining the response, not the input. Granted, the input can often indicate that the large language model is more likely to generate output that violates the given constraints, but that is by no means guaranteed.

Unfortunately, as far as I know, there's no way to validate a streamed response until the tokens have already been streamed. You could buffer the stream into larger chunks before displaying them, in the hope of catching a violation earlier, but that isn't a great user experience either.
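A minimal sketch of that buffering idea, assuming the model's output arrives as an iterable of text tokens and that the constraint can be expressed as a function over the text generated so far. `buffered_stream`, `is_valid`, `ConstraintViolation` and `chunk_size` are hypothetical names for illustration, not part of any particular LLM SDK.

```python
from typing import Callable, Iterable, Iterator


class ConstraintViolation(Exception):
    """Hypothetical error raised when buffered output fails validation."""


def buffered_stream(
    tokens: Iterable[str],
    is_valid: Callable[[str], bool],
    chunk_size: int = 20,
) -> Iterator[str]:
    """Yield text in chunks of `chunk_size` tokens, validating before release.

    `is_valid` (a hypothetical caller-supplied check) is applied to the whole
    response so far, not just the new chunk, so a violation that straddles a
    chunk boundary is still caught. Chunks already yielded cannot be recalled.
    """
    released = ""            # text already shown to the user
    pending: list[str] = []  # tokens held back until the next check

    def flush() -> str:
        nonlocal released
        chunk = "".join(pending)
        candidate = released + chunk
        if not is_valid(candidate):
            # Stop before the offending chunk ever reaches the user.
            raise ConstraintViolation(candidate)
        released = candidate
        pending.clear()
        return chunk

    for token in tokens:
        pending.append(token)
        if len(pending) >= chunk_size:
            yield flush()
    if pending:  # release whatever is left at the end of the stream
        yield flush()


if __name__ == "__main__":
    tokens = ["The ", "answer ", "is ", "42", "."]

    def no_digits(text: str) -> bool:
        return not any(c.isdigit() for c in text)

    shown = []
    try:
        for chunk in buffered_stream(tokens, no_digits, chunk_size=2):
            shown.append(chunk)
    except ConstraintViolation:
        pass
    print(shown)  # ['The answer ']
```

With `chunk_size=2`, the token `"is "` is held back together with `"42"`, so it never reaches the screen; with `chunk_size=1` it would already have been displayed. Raising `chunk_size` catches more before display but adds latency before the user sees anything, which is exactly the trade-off described above.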