by c7b
34 days ago
Thanks for building what I'd hoped to find the time to build (and much better than what I would have made)! One question: do you think there is room for parallelization here, e.g. in the retry loop? Local models can generally handle a limited number of concurrent requests (roughly double digits) pretty well, even on consumer hardware, which can give a >10x boost in effective tokens/s. I've been thinking for a while about workflows that could take advantage of this, and 'fix this error' could be one (if not ideal) application. Would be curious what you think.
That would certainly work in theory, but I'm not as familiar with parallel calls.
- If you mean the model calls the tool twice, identically, in one batch call: that would work fine, and Forge handles batch calls, but many small models wouldn't think to do that, so you'd have to prompt them explicitly to do so.
- If you mean asking the LLM twice to call the tool and looking at both answers, my only concern would be the latency of making 2 calls instead of 1.
- If you're truly running 2 instances of the model and aren't memory-bandwidth bound, then yes, running parallel workflows would likely help, especially if you could have them compare notes at certain steps.
But I haven't explored this much at all so if you're thinking of something else, let me know!
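A minimal sketch of the second option (fire several attempts at once and keep the first one that passes validation), assuming an async client for a local inference server. `fake_model_call`, `looks_valid`, and `parallel_retry` are hypothetical names for illustration, not part of Forge; the semaphore caps in-flight requests to reflect the limited concurrency a local server can handle.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fake_model_call(attempt: int) -> str:
    # Hypothetical stand-in for one request to a local inference server.
    # Later attempts take longer; only attempt 2 produces a usable answer.
    await asyncio.sleep(0.01 * (attempt + 1))
    return f"attempt {attempt}: " + ("ok" if attempt == 2 else "error")


def looks_valid(answer: str) -> bool:
    # Hypothetical validator, e.g. "does the fixed code now run?"
    return answer.endswith("ok")


async def parallel_retry(
    call: Callable[[int], Awaitable[str]],
    validate: Callable[[str], bool],
    n_attempts: int,
    max_concurrency: int = 8,
) -> Optional[str]:
    """Run up to n_attempts calls concurrently; return the first valid answer."""
    sem = asyncio.Semaphore(max_concurrency)  # cap requests in flight

    async def guarded(i: int) -> str:
        async with sem:
            return await call(i)

    tasks = [asyncio.create_task(guarded(i)) for i in range(n_attempts)]
    try:
        # Inspect answers in the order they finish, not the order they started.
        for next_done in asyncio.as_completed(tasks):
            try:
                answer = await next_done
            except Exception:
                continue  # a failed request counts like an invalid answer
            if validate(answer):
                return answer
        return None  # every attempt failed validation
    finally:
        # Cancel attempts still running once we have an answer (or gave up).
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(parallel_retry(fake_model_call, looks_valid, n_attempts=4)))
```

Here attempts 0 and 1 finish first but fail validation, so the call returns `"attempt 2: ok"` and cancels attempt 3. With a real model, the parallel attempts would need some sampling diversity (nonzero temperature or different seeds); otherwise identical requests may just return identical answers.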