by jug 308 days ago
I was just thinking that. It has many, many errors.

1. Not seen browsing ”ai.dev”.

2. The text ”Imagen 4 is now generally available!” is spoken, not a comic caption.

3. Invalid second panel.

4. Hallucinates ”Meet Imagen 4 fast!”

5. Hallucinates ”It offers low..” etc. (this is the second part of a single sentence said by the cat)

6. Hallucinates ”You can export images in 2K!” (this sentence is not asked for)

7. Doesn’t have the cat and the dog in the fourth panel.

Here’s the gpt-image-1 counterpart with the issues I could find:

https://chatgpt.com/share/689f7e4b-01e4-8011-8997-0f37edf8c2...

1. The text ”Imagen 4 is now generally available!” is still spoken, not a caption.

2. ”low latency” -> ”low-laten”

(3. Has that ugly gpt-image-1 trademark yellow filter requiring work in post to avoid.)

I didn’t bring up the ”retro comic look” thing. I certainly think it’s an issue with Imagen 4’s version. It doesn’t look very old school at all. But I can’t judge the OpenAI one on that either; I’m no comic book expert, so I just skipped that one.


I got this result with the basic copilot app

https://i.imgur.com/kSuqCYg.jpeg

The pervasive yellow tinge indicates that that is almost assuredly `gpt-image-1` - OpenAI's flagship model and (aesthetics aside) the highest scoring model in terms of strict prompt adherence that I've seen.

https://genai-showdown.specr.net

honestly, that's pretty good
Ran your same prompt, copypasta, got this. https://i.imgur.com/wOocci9.png Cat on panel 3 seems a bit off. I like the first panel.
The cat also has more fingers on one hand than the other. It's a small, inconsequential thing but it always draws my eye in generated images.
What do you have to do to remove the watermark? Is Google's SynthID watermark on top of the image as well or is it embedded in EXIF data?
Google's SynthID is embedded into the content itself. Google open sourced their SynthID for text.

Repo: https://github.com/google-deepmind/synthid-text

Paper: https://www.nature.com/articles/s41586-024-08025-4

With images and video, it's less clear exactly what they're doing, but it's watermarking at the pixel level. From one of their blog posts:

  Videos are composed of individual frames or still images. So we developed a watermarking technique inspired by our SynthID for image tool. This technique embeds a watermark directly into the pixels of every video frame, making it imperceptible to the human eye, but detectable for identification.
https://deepmind.google/discover/blog/watermarking-ai-genera...

ElevenLabs' audio watermarking is trivial to shake off with compression, but Google claims that SynthID is resilient to such manipulation.

Has anyone identified the SynthID watermark in an image, or is there a tool that determines whether an image is AI-generated by checking for it?
SynthID used to be a waitlist-only tool, but you can now check whether images were made by Imagen in Google’s Cloud console. You need a Vertex billing account to use it.

https://console.cloud.google.com/vertex-ai/studio/media/gene...

> I didn’t bring up the ”retro comic look” thing. (…) I’m no comic book expert, so I just skipped that one.

I’m no Scott McCloud, but the OpenAI version definitely does a better job with the retro style. The yellow filter you criticised actually helps to sell the illusion. The Imagen version utterly fails in the retro area; its style is very much modern.

But there are other important flaws in the OpenAI version. The fourth panel has a different cat (the head shape and stripes are wrong) and it bleeds into the previous panel. Technically that could be a stylistic choice, except that the floor/table is inconsistent, making it clear it was a mistake.