Same reason that self-driving cars have plateaued. The things that are really important for that last mile are also really hard, and we're not really sure how to get there.
The output of all the major AI image synthesis tools, while impressive, is quite clearly very high-dimensional interpolation, and it shares the same limitations in that space as it does in 2D space. We've already thrown insane amounts of data and compute resources at the problem, and it's made the things that were done well in the past even better, but it doesn't move us towards solving the hard parts.
It's making slow progress, much slower than expected 10 years ago. In 2012 I was personally "promised" by a person with a startup in the field that I would be able to buy an L5 car in 5 years. That person is the equivalent of those promising me I will be able to generate a whole movie from a prompt by the end of this decade.
Coinbase stock keeps going up despite it being widely acknowledged that crypto is fundamentally a scam, so anything is possible.
Though it's more likely that in the coming months/years people's ability to ignore reality will just become stronger and we'll all believe that a person waving their 4 wiggly fingers and staring at you with strange fish eyes is hyperrealism at its finest.
We've entered the "emperor's new clothes" reality at this point, so nothing would surprise me.