by justaboutanyone
146 days ago
You can run large-ish MoE models like gpt-oss-120b at good speeds; it's snappy enough even with a big context. Large and dense at the same time is a bit slow, though. Either way, running an LLM locally costs a load of money for something much slower than the API providers.
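A rough back-of-envelope sketch of why MoE feels snappy and dense doesn't, assuming token generation is limited by memory bandwidth (each new token has to read every active weight once). The helper name `decode_tokens_per_sec`, the 250 GB/s bandwidth, and the ~4-bit weight size are illustrative assumptions, not measurements; the ~5B active parameter count for gpt-oss-120b is approximate, so check the model card.

```python
def decode_tokens_per_sec(active_params_billions, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes every active weight is read from memory once per generated token
    and ignores KV-cache reads, compute limits, and overhead.
    """
    gb_read_per_token = active_params_billions * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical machine: 250 GB/s of memory bandwidth (assumed, not measured).
BANDWIDTH_GB_S = 250.0
# Assumed ~4-bit quantized weights, i.e. about 0.5 bytes per parameter.
BYTES_PER_PARAM = 0.5

# MoE: roughly 5.1B parameters active per token (approximate figure).
moe_tps = decode_tokens_per_sec(5.1, BYTES_PER_PARAM, BANDWIDTH_GB_S)
# Hypothetical dense model of similar total size: all ~120B parameters active.
dense_tps = decode_tokens_per_sec(120.0, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"MoE   upper bound: {moe_tps:.1f} tok/s")
print(f"Dense upper bound: {dense_tps:.1f} tok/s")
print(f"Ratio: {moe_tps / dense_tps:.1f}x")
```

Under these assumptions the gap is just the ratio of active parameters, whatever the bandwidth is. Real numbers will be lower, especially with a long context, since the KV cache also has to be read for every token.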