by cedws
653 days ago
I had a similar idea[0]; interesting to see that it actually works. The more LLM workloads can be accelerated, the more ‘thinking’ the LLM can do before it emits a final answer.

[0]: https://news.ycombinator.com/item?id=41377042
[0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...