by zarzavat
749 days ago
While vision-vision models are certainly cool, I don’t think they are as economically valuable as vision-speech or text-text models. Humans don’t have vision output. Computation may be increasing, but that is a statement about the short term, not the long term. If we want to predict the future, then the question we care about is: how many capabilities can you fit on a phone-sized computer? And I believe the answer is: a lot.