Ask HN: How do you add guard rails in LLM response without breaking streaming?
48 points
by curious-tech-12
609 days ago
Hi all,
I am trying to build a simple LLM bot and want to add guard rails so that the LLM responses are constrained.
I tried adjusting the system prompt, but the responses do not always honour the instructions in the prompt.
I can manually validate the response, but that breaks streaming and makes the bot visibly slower to respond.
How are people handling this situation?

Use two prompts: the first validates the input, and the second starts the actual content generation.
Combine both streams with SSE and, on the front end, don't render the content stream result until the validation stream returns "OK". In the SSE, tag the chunks of each stream with a stream ID. You can also handle it on the server side by cancelling the content generation if the validation stream ends without returning "OK".
Generally, the experience is good because the validation prompt is shorter and reaches its last (and only) token faster.
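For the server-side variant, a minimal sketch of the gating logic might look like the following. It assumes both prompts run concurrently; `validation_stream` and `content_stream` are hypothetical stand-ins for the real LLM calls, and the stream IDs `"validation"` and `"content"` are made up for illustration.

```python
import asyncio
from typing import AsyncIterator, List


async def validation_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for the short validation LLM call (one token).
    await asyncio.sleep(0.01)
    yield "REJECT" if "forbidden" in prompt else "OK"


async def content_stream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for the streamed content generation.
    for token in ["Hello", " ", "world"]:
        await asyncio.sleep(0.005)
        yield token


async def guarded_stream(prompt: str) -> AsyncIterator[str]:
    """Yield content chunks only once the validation stream has said "OK"."""
    queue: asyncio.Queue = asyncio.Queue()

    async def pump(stream_id: str, stream: AsyncIterator[str]) -> None:
        # Tag every chunk with its stream ID; None marks end of stream.
        async for chunk in stream:
            await queue.put((stream_id, chunk))
        await queue.put((stream_id, None))

    tasks = [
        asyncio.create_task(pump("validation", validation_stream(prompt))),
        asyncio.create_task(pump("content", content_stream(prompt))),
    ]
    verdict = ""
    validated = False
    held: List[str] = []  # content chunks that arrived before the verdict
    open_streams = 2
    try:
        while open_streams:
            stream_id, chunk = await queue.get()
            if chunk is None:
                open_streams -= 1
                if stream_id == "validation":
                    if verdict.strip() != "OK":
                        return  # rejected: stop, nothing is emitted
                    validated = True
                    for early in held:  # flush what was buffered
                        yield early
                    held.clear()
            elif stream_id == "validation":
                verdict += chunk
            elif validated:
                yield chunk
            else:
                held.append(chunk)
    finally:
        # Cancel whatever is still running (e.g. content after a rejection).
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Buffering the early content chunks rather than dropping them means the user sees the head start as soon as validation passes, which is where the perceived speed-up comes from.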
The SSE stream ends up like this:
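As a hypothetical illustration of the framing (not the output from the writeup), one option is to use the standard SSE `event:` field as the stream ID and put each chunk in a JSON `data:` payload. The event names `validation` and `content`, the `{"text": ...}` shape, and the helper names are all assumptions.

```python
import json
from typing import Iterator, Tuple


def encode_sse(stream_id: str, chunk: str) -> str:
    # One SSE frame: "event: <stream id>", "data: <json>", then a blank line.
    # JSON-encoding the chunk keeps embedded newlines from breaking the frame.
    payload = json.dumps({"text": chunk})
    return f"event: {stream_id}\ndata: {payload}\n\n"


def decode_sse(raw: str) -> Iterator[Tuple[str, str]]:
    # Minimal parser for frames produced by encode_sse (not a full SSE parser).
    for frame in raw.split("\n\n"):
        if not frame.strip():
            continue
        event, data_lines = "message", []
        for line in frame.split("\n"):
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        yield event, json.loads("\n".join(data_lines))["text"]


# Interleaved chunks from both streams, as they might appear on the wire.
wire = "".join([
    encode_sse("content", "Hel"),
    encode_sse("validation", "OK"),
    encode_sse("content", "lo"),
])
```

On the front end, the handler for the content events would hold its chunks until the validation event carrying "OK" has arrived.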
I have a writeup (and repo) of the general technique of multi-streaming: https://chrlschn.dev/blog/2024/05/need-for-speed-llms-beyond... (animated gif at the bottom).