by boutell
199 days ago
He lost me a bit at the end when he talked about running chatbots on CPUs. I know it's possible, but LLM inference is inherently parallel computing, isn't it? Would that ever really make sense? I expected to hear something more like low-end consumer GPUs. Recent-generation LLMs do seem to have made some significant efficiency gains, and there are routers that decide whether you really need all of their power for a given question. And Google is building its custom TPUs. So I'm not sure I buy the idea that everyone ignores efficiency.