I'm sorry to jump into this conversation, but running a model is orders of magnitude less efficient (meaning it requires far more specialised computing power) than the entire network stack of every node that carries the internet traffic for a given request.

Meaning: these 5000 tokens consume a tiny amount of energy being moved from the data center to your PC, but an enormous amount of energy to generate in the first place. An equivalent webpage with the same amount of text would feel instant on any network configuration. It's just a few kilobytes of text, much smaller than most background graphics. The two costs aren't even on the same scale.
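To put a rough number on "a few kilobytes", here is a back-of-the-envelope sketch. The 4 bytes per token figure is my assumption (a common rule of thumb for English text), not a measured value, and `payload_kilobytes` is just a name made up for illustration.

```python
# Rough estimate of how much text 5000 tokens is on the wire.
# ASSUMPTION: about 4 bytes of UTF-8 English text per token (rule of thumb).
BYTES_PER_TOKEN = 4


def payload_kilobytes(tokens: int, bytes_per_token: int = BYTES_PER_TOKEN) -> float:
    """Approximate uncompressed size of `tokens` tokens, in kilobytes."""
    return tokens * bytes_per_token / 1000


if __name__ == "__main__":
    print(f"{payload_kilobytes(5000):.0f} kB")
```

Under that assumption the whole response is on the order of 20 kB before compression.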

However, just last week there were huge improvements in the hardware requirements for running some particular models, thanks to some very clever quantisation. It lowers the memory required on our home hardware by 6x, which is great.
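For a sense of what a 6x reduction means at home, here is a minimal sketch of the weight-memory arithmetic. The 7-billion-parameter model and the 16-bit baseline are hypothetical numbers I picked for illustration, and the estimate only covers the weights, not activations or the KV cache.

```python
def weights_gigabytes(params: float, bits_per_weight: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# ASSUMPTIONS: a hypothetical 7B-parameter model stored at 16 bits per weight.
PARAMS = 7e9
BASELINE_BITS = 16
REDUCTION = 6  # the "6x" figure from the comment above

baseline_gb = weights_gigabytes(PARAMS, BASELINE_BITS)
reduced_gb = baseline_gb / REDUCTION
# Average bits per weight that a 6x reduction from this baseline would imply.
equivalent_bits = BASELINE_BITS / REDUCTION

if __name__ == "__main__":
    print(f"baseline: {baseline_gb:.1f} GB, after 6x: {reduced_gb:.1f} GB "
          f"(about {equivalent_bits:.1f} bits per weight)")
```

With those made-up inputs the weights drop from about 14 GB to a bit over 2 GB, which is the difference between needing a high-end GPU and fitting on an ordinary consumer card.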

In the end, we have spent more energy playing videogames over the last two decades than on this whole AI craze, and it was never a problem. Surely we can run models locally and heat our homes in winter.