| HN Mirror



	by foooorsyth 940 days ago
	Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation (3D VR headset with Spatial Audio) is a poor representation. We don’t even bother to simulate touch, temperature, balance (equilibrioception), etc. And the more detailed you get, the less data you have. These senses can be described via text, but I’m highly skeptical that the learning outcomes will be the same.


valine 940 days ago

>> Text and 2D images are a tiny subset of physical reality as perceived by an able-bodied human. Even our best approximation is a poor representation.

This is wrong. There’s nothing magical about human perception. You see the world because a 2D image is projected onto your retina.

GPT-4 was trained on text and generalized the ability to output 2D images. There’s absolutely nothing to suggest text can’t generalize further to new modalities. GPT-4 is forced to serialize images as SVGs to output them (a crazy emergent ability btw), but that demonstrates an inherent spatial reasoning capability baked into the model.

GPT-4V was created with a transfer learning step where image embeddings are passed as input in place of text. That’s further evidence of the model’s ability to generalize to new modalities.

Everything you need to do multimodal input and output is already trained in; I’m sure GPT-4V is just the start.


foooorsyth 940 days ago

>GPT-4 was trained on text

And it shows. It has a poor grasp of reality. It does a poor job with complex tasks. It cannot be trusted with specialized tasks typically done by expert humans. It is certainly an amazing technical achievement that does a decent job with simple tasks requiring cursory knowledge, but that’s all it is at this time.

>There’s absolutely nothing to suggest text can’t generalize further to new modalities

Inversion of burden of proof.


valine 940 days ago

>> Inversion of burden of proof

Nope. OpenAI has already demonstrated the ability to generalize GPT4 to a new modality. Your claim that text models can only generalize to images and not other modalities is utterly unconvincing. Explain to me why vision is so different from, say, audio.

>> And it shows. It has a poor grasp of reality. It does a poor job with complex tasks.

GPT4 is a proof of concept more than anything. I’m excited to see how much reliability improves over time. Its grasp of reality isn’t perfect, but at least it understands how burden of proof works.


foooorsyth 940 days ago

>GPT4 is a proof of concept more than anything

Hilarious walk-back. “Text can generalize anything” -> “It’s just a demo, bro” in the same post.

Lmao


valine 940 days ago

I walked back nothing. OpenAI was surprised by the mass adoption of ChatGPT; they saw it as an early technical preview.

I don’t understand why some people have such a hard time envisioning the potential of new technologies without a polished end product in their hands. Imagine if AI researchers had the same attitude.

Technology can be both real and unpolished at the same time. Those two things are not contradictory.
