Hacker News
by bla3 301 days ago
Why do Hunyuan, OpenAI 4o and Qwen get a pass on the octopus test? They don't cover "each tentacle", just some. And midjourney covers 9 of 8 arms with sock puppets.

Good point. I probably need to adjust the success pass ratios to be a bit stricter, especially as the models get better.

> midjourney covers 9 of 8 arms with sock puppets.

Midjourney is shown as a fail so I'm not sure what your point is. And those don't even look remotely close to sock puppets, they resemble stockings at best.