by alooPotato
1185 days ago
It's just so slow for the autocompletion use case to do it like that. Ideally, you're never chaining serial requests to the LLM. Even if you stuff all the data into a single prompt, the execution time seems to grow superlinearly with the number of tokens, so that gets very slow too.
I'm happy to wait even 30-60 seconds for this, since I can easily evaluate and criticize the result (and the model will correct it), then just patch and move on. I think the results will be much better with the 32k model, but that remains to be seen.