Hacker News new | ask | show | jobs
by enqk 370 days ago
If you only have two points you would always assume linear. But what if it’s quadratic, like this article claims?

https://www.timdavis.com/blog/scale-or-surrender-when-watts-...

1 comments

Good point!

The good news would be that GPT-4o average energy usage per query would be lower than 20 Wh.

The bad news is that there's a quadratic increase in energy usage with the increase in a model's maximum context window. GPT 3.5 -> GPT 4 was an increase from thousands of tokens to hundreds of thousands of tokens.