by ilaksh 481 days ago
There is a limit because model responses need to be nearly instant, and the smaller models that are generally capable of that speed come with trade-offs. That is, unless you have unique hardware: only Cerebras can run medium to large models at truly near-instant speed.