Huh? Why would that happen? Indications are that costs will likely go up, especially if vendors are currently selling tokens at a loss.
Even if you generously depreciate the GPU and other hardware, it’s hard to believe inference at scale in April 2026 isn’t highly profitable.
I think you meant dollars of electricity.
https://www.theregister.com/2024/03/18/nvidia_turns_up_the_a...
A Blackwell 8X node consumes about 15 kW; let’s up that to 50 kW to generously account for cooling and everything else.
A US kWh is something like $0.20, so running that node for an hour costs ~$10.
Nvidia got 30,000 parallel TPS out of DeepSeek-R1 on that node:
https://developer.nvidia.com/blog/nvidia-blackwell-delivers-...
So that $10 buys you over 100M tokens (30,000 TPS × 3,600 s ≈ 108M), which works out to pennies per million (about 9 cents).
I’m sure these numbers are off, but not by an aggregate two orders of magnitude.
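For anyone who wants to poke at the inputs, here is the same back-of-envelope math as a small Python sketch. It only covers electricity, using the figures quoted above (50 kW padded node power, $0.20/kWh, 30,000 aggregate TPS); the function and variable names are my own, and it ignores hardware depreciation, utilization, and everything else.

```python
# Back-of-envelope electricity cost per million tokens.
# All inputs are the rough figures from the comment above, not measurements.
NODE_POWER_KW = 50.0          # ~15 kW node, padded to 50 kW for cooling etc.
PRICE_PER_KWH_USD = 0.20      # rough US electricity price
TOKENS_PER_SECOND = 30_000    # aggregate throughput quoted for DeepSeek-R1


def cost_per_million_tokens(power_kw, price_per_kwh, tokens_per_second):
    """Return (USD per hour, tokens per hour, USD per million tokens)."""
    hourly_cost = power_kw * price_per_kwh        # kW * $/kWh = $/hour
    tokens_per_hour = tokens_per_second * 3600    # seconds in an hour
    usd_per_million = hourly_cost / (tokens_per_hour / 1_000_000)
    return hourly_cost, tokens_per_hour, usd_per_million


hourly_cost, tokens_per_hour, usd_per_million = cost_per_million_tokens(
    NODE_POWER_KW, PRICE_PER_KWH_USD, TOKENS_PER_SECOND
)
print(
    f"${hourly_cost:.2f}/hour, {tokens_per_hour / 1e6:.0f}M tokens/hour, "
    f"${usd_per_million:.3f} per million tokens"
)
```

With those inputs the electricity comes to roughly $0.09 per million tokens, so even if the estimate is off by a full 100x it would still be only about $9 per million.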