

by zarzavat 749 days ago

While vision-vision models are certainly cool, I don’t think they are as economically valuable as vision-speech or text-text models. Humans don’t have vision output.

Computation may be increasing, but that is a statement about the short term, not the long term.

If we want to predict the future, then the question we care about is: how many capabilities can you fit on a phone-sized computer? And I believe that the answer is: a lot.