| HN Mirror

by mikewarot 314 days ago

RAM is the reason LLMs are so power inefficient. Shuttling weights and results from RAM to compute and back for everything is where most of the power goes.

It doesn't have to be that way. For a sufficiently large load, it makes sense to use reconfigurable hardware and bake in the constants and the dataflow at runtime.

Think of it like using an array of FPGAs large enough to hold the whole model unwound, yet that could be configured in seconds at runtime. You'd get tokens at 100 MHz or more.

You would think saving 95% or more on power and infrastructure for a given token rate would be worth it, especially when contemplating trillion-dollar outlays.
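A back-of-envelope sketch of the weight-movement argument above, in Python. Everything here is an assumption for illustration: the function name `energy_per_token_joules`, the model size, and the per-byte and per-MAC energy figures are placeholders, not measurements. It also assumes each weight is fetched from off-chip RAM once per token and used in exactly one multiply-accumulate (batch size 1, no on-chip reuse).

```python
def energy_per_token_joules(n_params, bytes_per_param, pj_per_byte_moved, pj_per_mac):
    """Return (memory_joules, compute_joules) for generating one token.

    Simplifying assumptions (hypothetical model, not a measured system):
    every weight is read from off-chip RAM once per token and takes part
    in exactly one multiply-accumulate.
    """
    memory_j = n_params * bytes_per_param * pj_per_byte_moved * 1e-12
    compute_j = n_params * pj_per_mac * 1e-12
    return memory_j, compute_j


# Placeholder inputs chosen only to show the shape of the calculation.
mem, comp = energy_per_token_joules(
    n_params=7e9,             # assumed model size
    bytes_per_param=2,        # assumed 16-bit weights
    pj_per_byte_moved=100.0,  # assumed off-chip transfer cost, pJ per byte
    pj_per_mac=1.0,           # assumed arithmetic cost, pJ per MAC
)
share = mem / (mem + comp)    # fraction of energy spent moving weights
print(f"memory: {mem:.3f} J, compute: {comp:.3f} J, memory share: {share:.1%}")
```

With these placeholder inputs the memory term dominates, which is the shape of the argument. The real ratio depends on the hardware, the batch size, and how much weight reuse the design achieves.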

1 comments

vouaobrasil 314 days ago

Many things don't have to be the way they are. But as long as powerful big tech companies can offload their costs onto the commons, in the form of unregulated environmental damage, they will only pay lip service to making things more efficient. Money is a much more powerful motivator to the unscrupulous than protecting the long-term health of the commons.

satvikpendem 314 days ago

It costs money to run a datacenter, so even if you consider money to be their motivator, it benefits them to make their datacenters more efficient.

vouaobrasil 314 days ago

Not if spending a little extra money and keeping the inefficiency helps them make money in other ways, such as getting the product out faster or letting their workers focus on the tech stack. Saving a little on electricity might cost them in development, so they're likely to use the cheap electricity and offload the cost onto the environment.

satvikpendem 314 days ago

The people who work on the product are not also working on the data centers.