Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs

3 points by anuarsh 295 days ago

4 comments

Haeuserschlucht 294 days ago

20 minutes is a huge turnoff, unless you have it run overnight... just to get the hint that you should exercise self-care in the morning when presenting a legal paper you had the AI check for flaws.

anuarsh 294 days ago

We are talking about a 100k context here. A 20k context would be much faster, but you wouldn't need KV cache offloading for it.
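For a rough sense of why 100k tokens is where offloading starts to matter, here is a back-of-the-envelope sketch of KV cache size. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) is an assumption picked to resemble a typical 8B-class model with grouped-query attention, not something taken from oLLM itself, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    All defaults are assumed values for an 8B-class model in fp16;
    substitute the numbers from your own model's config.
    """
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len


for tokens in (20_000, 100_000):
    size_gb = kv_cache_bytes(tokens) / 1e9
    print(f"{tokens:>7} tokens: {size_gb:.2f} GB of KV cache")
```

Under these assumptions, the cache for 20k tokens is a few GB and may fit next to the weights on a consumer GPU, while the cache for 100k tokens is roughly five times larger and likely does not.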

Haeuserschlucht 294 days ago

It's better to have software erase all private details from the text, have it checked by a cloud AI, and then have all the placeholders replaced back on your own hard drive.
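A minimal sketch of that redact-then-restore round trip, assuming simple regex detection of emails and phone numbers. The patterns and the `redact` / `restore` helpers are hypothetical, and real documents would need far more thorough detection (names, addresses, case numbers) than this.

```python
import re

# Illustrative patterns only; they will miss many kinds of private data.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text):
    """Replace matches with numbered placeholders; return text and mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            placeholder = f"[{label}_{len(mapping)}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back in place of the placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


original = "Contact jane@example.com or +1 555-123-4567."
redacted, mapping = redact(original)
# Only `redacted` would be sent to the cloud model; `mapping` stays local.
assert restore(redacted, mapping) == original
```

The mapping never leaves the local machine, so the cloud model only ever sees the placeholders.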

attogram 295 days ago

"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!

anuarsh 295 days ago

Absolutely, there are tons of cases where an interactive experience is not required, but the ability to process a large context to get insights is.

attogram 294 days ago

It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.

anuarsh 295 days ago

Hi everyone, any comments or questions are appreciated
