Hacker News
by attogram 295 days ago
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
1 comment

Absolutely, there are tons of cases where an interactive experience is not required, but the ability to process a large context to get insights is.
It would be interesting to see some benchmarks of this vs, for example, Ollama running locally with no timeout.