What I mean is that either resources will be limited, or slightly worse models will be released that are much more cost-effective.
This is often the case with these types of technologies.
But what is being optimized? Hardware sure isn't getting faster in a hurry, and I don't see anything on the horizon that will aid in optimizing software.
The various open source LLMs are doing things like reducing bits-per-parameter to reduce hardware requirements. If they're using COTS hardware, it almost certainly isn't optimised for their specific models. Moore's Law is pretty heavily reinterpreted: we normally care about "operations per second at a fixed number of monies", but what matters here is "joules per operation", which can improve by a huge margin even before reaching human level, which itself appears to be a long way from the limits of the laws of physics. And even if we were near the end of Moore's Law and only a 10% total improvement was available, that's 10% of a big number.
Moore's law was an effect that stemmed from the locally exponential efficiency increase from designing computers using computers, each iteration growing more powerful and capable of designing still more powerful hardware.
10% here and there is very small compared to the literal orders of magnitude of improvement during the reign of Moore's Law.
> 10% here and there is very small compared to the literal orders of magnitude of improvement during the reign of Moore's Law.
Missing the point, despite being internally correct: 10% of $700k/day is still $25M/y.
If you'd instead looked at my point about energy cost per operation, there's room for something like a 46,000x improvement just to reach human level, and 5.3e9x to reach the Landauer limit.
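For anyone who wants to check the arithmetic, here is a rough sketch. The $700k/day, 10%, 46,000x and 5.3e9x figures are the ones quoted in this thread. The 300 K temperature, and treating one "operation" as one bit erasure, are my assumptions, so the implied joules-per-operation numbers are back-calculated from the thread's ratios rather than measured.

```python
import math

# Figures quoted in the thread.
daily_cost_usd = 700_000
saving_fraction = 0.10
annual_saving_usd = daily_cost_usd * saving_fraction * 365
print(f"annual saving: ${annual_saving_usd / 1e6:.2f}M")

# Landauer limit: minimum energy to erase one bit, k_B * T * ln(2).
BOLTZMANN_J_PER_K = 1.380649e-23  # exact by SI definition
temperature_k = 300.0             # assumption: roughly room temperature
landauer_j = BOLTZMANN_J_PER_K * temperature_k * math.log(2)
print(f"Landauer limit at {temperature_k:.0f} K: {landauer_j:.3e} J/bit")

# Ratios quoted in the thread.
factor_to_human = 46_000
factor_to_landauer = 5.3e9

# Back-calculated, assuming one operation == one bit erasure (an assumption).
implied_current_j = landauer_j * factor_to_landauer
implied_human_j = implied_current_j / factor_to_human
print(f"implied current energy/op: {implied_current_j:.3e} J")
print(f"implied human-level energy/op: {implied_human_j:.3e} J")
```

The first line confirms the $25M/y figure to within rounding. The rest only shows what baseline the two ratios would imply under those assumptions.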
There are a few avenues: further specialization of hardware around LLMs, better quantization (3 bits per parameter seems promising), improved attention mechanisms, use of distilled models for common prompts, etc.
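To put a number on the quantization point, here is a minimal sketch of how weight storage scales with bits per parameter. The 7-billion-parameter model size is a hypothetical chosen for illustration, and the function name is mine. It counts only the weights and ignores activations, KV cache, and any per-group scale factors that real quantization schemes add.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30

hypothetical_params = 7_000_000_000  # assumption: illustrative model size

for bits in (16, 8, 4, 3):
    gib = weight_memory_gib(hypothetical_params, bits)
    print(f"{bits:>2} bits/param: {gib:6.2f} GiB")
```

Going from 16 to 3 bits per parameter shrinks weight storage by a factor of 16/3, a bit over 5x, which is the kind of one-off gain being discussed here rather than a compounding one.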
These would be optimizations, which is not really the same thing as Moore's Law-like growth. That growth was absolutely mind-boggling: it's hard to even wrap your head around how fast tech was moving in that period, since humans don't really grok exponentials too well. We just think they look like second-degree polynomials.
Probabilistic computing offers the potential of a return to that pace of progress. We spend a lot of silicon on squashing things to 0/1 with error correction, but using analog voltages to carry information and relying on parameter redundancy for error correction could lead to much greater efficiency both in terms of OPS/mm^2 and OPS/watt.