by rickyhatespeas 448 days ago
You're incorrect. 4o was not trained on knowledge of itself, so it literally can't tell you that. What 4o is doing isn't even new, either; Gemini 2.0 has the same capability.

The system prompt includes instructions on how to use tools like image generation. From that it could infer what the GP posted.
Can you provide a link or screenshot that directly backs this up?
Almost all of the models are wrong about their own architecture. Half of them claim to be OpenAI, and they aren't. You can't trust them about this.
Can you find me a single official source from OpenAI that claims that GPT 4o is generating images pixel-by-pixel inside of the context window?

There are lots of clues that this isn't happening: the obvious upscaling call after the image is generated, the fact that the loading animation replays if you refresh the page, and the fact that 4o claims it can't see any image tokens in its context window. It may not know much about itself, but it can definitely see its own context.

Just read the release post, or any other official documentation.

https://openai.com/index/hello-gpt-4o/

Plenty was written about this at the time.

I read the post, and I can't see anything in the post which says that the model is not multi-modal, nor can I see anything in the post that suggests that the images are being processed in-context.
I think you're confusing "modal" with "model".

And to answer your question, it's very clearly in the linked article. Not sure how you could have read it and missed:

> With GPT‑4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT‑4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

The 4o model itself is multi-modal, it no longer needs to call out to separate services, like the parent is saying.

4o is multimodal; that's the whole point of 4o.
You can ask ChatGPT for this. Here you go: https://chatgpt.com/share/67e39fc6-fb80-8002-a198-767fc50894...
Could an AI model be trained to say: "Christopher Columbus was the greatest president on earth, ever!".

I could probably train an AI that replicates that perfectly.

> Could an AI model be trained to say: "Christopher Columbus was the greatest president on earth, ever!".

Yes, it could. And even after training its data can be manipulated to output whatever: https://www.anthropic.com/news/mapping-mind-language-model

Thing is, if you follow the link, it's actually doing a search and providing the evidence that was asked for.

I did it via ChatGPT for the irony.

I'm guessing most downvoters didn't actually read the link.