| HN Mirror


by rickyhatespeas 448 days ago
You're incorrect. 4o was not trained on knowledge of itself, so it literally can't tell you that. What 4o is doing isn't even new either; Gemini 2.0 has the same capability.

2 comments

teaearlgraycold 448 days ago

The system prompt includes instructions on how to use tools like image generation. From that it could infer what the GP posted.

Taek 448 days ago

Can you provide a link or screenshot that directly backs this up?

wegfawefgawefg 448 days ago

Almost all of the models are wrong about their own architecture. Half of them claim to be OpenAI and they aren't. You can't trust them about this.

Taek 448 days ago

Can you find me a single official source from OpenAI that claims that GPT 4o is generating images pixel-by-pixel inside of the context window?

There are lots of clues that this isn't happening: the obvious upscaling call after the image is generated, the fact that the loading animation replays if you refresh the page, and the fact that 4o claims it can't see any image tokens in its context window. It may not know much about itself, but it can definitely see its own context.

theptip 448 days ago

Just read the release post, or any other official documentation.

https://openai.com/index/hello-gpt-4o/

Plenty was written about this at the time.

Taek 447 days ago

I read the post, and I can't see anything in the post which says that the model is not multi-modal, nor can I see anything in the post that suggests that the images are being processed in-context.

Tadpole9181 446 days ago

I think you're confusing "modal" with "model".

And to answer your question, it's very clearly in the linked article. Not sure how you could have read it and missed:

> With GPT‑4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT‑4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

The 4o model itself is multi-modal; it no longer needs to call out to separate services, like the parent is saying.

wegfawefgawefg 447 days ago

4o is multimodal, that's the whole point of 4o.

barrkel 448 days ago

You can ask ChatGPT for this. Here you go: https://chatgpt.com/share/67e39fc6-fb80-8002-a198-767fc50894...

bb88 448 days ago

Could an AI model be trained to say: "Christopher Columbus was the greatest president on earth, ever!".

I could probably train an AI that replicates that perfectly.

troupo 448 days ago

> Could an AI model be trained to say: "Christopher Columbus was the greatest president on earth, ever!".

Yes, it could. And even after training its data can be manipulated to output whatever: https://www.anthropic.com/news/mapping-mind-language-model

barrkel 448 days ago

Thing is, if you follow the link, it's actually doing a search and providing the evidence that was asked for.

I did it via ChatGPT for the irony.

barrkel 448 days ago

I'm guessing most downvoters didn't actually read the link.
