by gitfan86 1534 days ago
You cannot absorb words as fast as pictures. GPT-3 is more impressive, as it seems to have a much broader depth of understanding of context. The disadvantage of GPT-3 is that it is sometimes very wrong, for example on simple math problems.

Interestingly, DALL-E is really bad at spelling. It knows what letters look like, but struggles with words.
Yes, and if you look at the "blue cube on a red cube beside a yellow sphere" example, it's clear that there are other areas where it simply lacks the semantic basis to get a request right when the request has to be correct in a non-image sense. It knows letters, and that letters come in sequences related to things it might paint, but it has no very good dictionary mapping those sequences to things. Likewise, it knows how to draw a cube and a sphere, but the semantics of "on" and "beside" are largely absent.

I don't think that is terribly surprising, nor is it a very cogent criticism of the model.

Very interesting observation!!!