by joshhart
605 days ago
Hi, I run the model serving team at Databricks. Usually you run regex filters, Llama Guard, etc. on a chunk of tokens at a time, so you are still streaming, but in batches of tokens rather than one token at a time. Hope that helps! You could of course use us and get that out of the box if you have access to Databricks.
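To make the chunking idea concrete, here is a rough sketch in Python. It is only an illustration, not how Databricks implements it: the `BLOCKED` pattern, the `chunk_is_safe` and `stream_filtered` names, the chunk size, and the `[filtered]` placeholder are all made up for the example. In practice the check could just as well be a call to a safety model such as Llama Guard.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical blocklist, for illustration only. A real deployment would use
# its own rules or call a safety classifier at this point.
BLOCKED = re.compile(r"\b(?:password|ssn)\b", re.IGNORECASE)


def chunk_is_safe(text: str) -> bool:
    """Return True if the chunk contains no blocked pattern."""
    return BLOCKED.search(text) is None


def stream_filtered(
    tokens: Iterable[str],
    chunk_size: int = 8,
    placeholder: str = "[filtered]",
) -> Iterator[str]:
    """Buffer tokens into chunks, check each chunk, then emit it.

    The client still receives a stream, but in batches of `chunk_size`
    tokens instead of one token at a time.
    """
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            text = "".join(buffer)
            yield text if chunk_is_safe(text) else placeholder
            buffer = []
    # Flush whatever is left once the model stops generating.
    if buffer:
        text = "".join(buffer)
        yield text if chunk_is_safe(text) else placeholder


if __name__ == "__main__":
    demo = ["My ", "password ", "is ", "hunter2. ", "Have ", "a ", "nice ", "day."]
    for piece in stream_filtered(demo, chunk_size=4):
        print(piece)
```

One caveat with a sketch this simple: a match that straddles two chunks would slip through, so a real filter would probably keep some overlap between chunks or check the running text. Bigger chunks give the filter more context but add latency before the first text shows up.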