by m3kw9
1128 days ago
I’m not understanding how Guidance Acceleration works. It says “This cuts this prompt's runtime in half vs. a standard generation approach.” and gives an example of asking an LLM to generate JSON. I don’t see how it accelerates anything, because it’s a simple JSON completion call. How can you accelerate that?
Let's say you're halfway through generating a JSON blob with a name field and a job field, and the name field has already been generated.

At this point, guidance takes over generation control from the model and emits the next piece of text itself: a comma, followed by a newline, followed by "job":. If the model had generated that, you'd be waiting 70 ms per token (informal benchmark on my M2 Air). That text is 6 tokens, or 420 ms. But since guidance took over, you save all that time.

Then guidance passes control back to the model to generate the next field value. programmer is 2 tokens and the closing " is 1 token, so this took 210 ms to generate. Guidance then takes over again to finish the blob.

[1] https://github.com/1rgs/jsonformer https://github.com/newhouseb/clownfish

Note: guidance is a much more general tool than these.

Edit: spacing
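
For anyone who wants to see the arithmetic spelled out, here is a rough Python sketch of the timing argument above. It does not use guidance's actual API: the Segment type and the generation_ms helper are made up for illustration, and the token counts and the 70 ms figure are just the informal numbers from this comment. It also ignores whatever it costs to feed the template text back to the model as prompt, so treat the result as a best-case estimate.

```python
from dataclasses import dataclass

# Informal per-token latency quoted above (M2 Air); not a rigorous benchmark.
MS_PER_TOKEN = 70


@dataclass
class Segment:
    """One piece of the output (hypothetical type, for illustration only)."""
    label: str    # human-readable description of the piece
    tokens: int   # token count as estimated in the comment, not from a tokenizer
    fixed: bool   # True: known template text; False: must come from the model


def generation_ms(segments, guided):
    """Simulated time spent generating tokens one at a time with the model."""
    total = 0
    for seg in segments:
        if guided and seg.fixed:
            # Template text is known in advance, so it is appended directly
            # instead of being sampled token by token.
            continue
        total += seg.tokens * MS_PER_TOKEN
    return total


segments = [
    Segment('comma, newline, "job":', 6, fixed=True),
    Segment('programmer plus closing quote', 3, fixed=False),
]

standard = generation_ms(segments, guided=False)
guided = generation_ms(segments, guided=True)
print(f"standard: {standard} ms")        # standard: 630 ms
print(f"guided:   {guided} ms")          # guided:   210 ms
print(f"saved:    {standard - guided} ms")  # saved:    420 ms
```

The more of the output that is fixed structure (keys, punctuation, whitespace), the larger the share of tokens that can be skipped this way, which is why JSON-style templates benefit so much.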