|
|
|
|
|
by general_reveal
107 days ago
|
|
It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second. So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel. Those are the compute needs. We just need compute everywhere as fast as possible. |
|