| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by general_reveal 107 days ago

It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second.

So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel.

Those are the compute needs.

We just need compute everywhere as fast as possible.