
I tend to write it off as reflecting less any kind of deep truth about humans (well, maybe the "bachelors can be married" one, given that 9/10 students agreed with GPT-3 that bachelors can be married) than just the current weaknesses of how we train NNs like GPT-3 (small, unidirectional, unimodal, not trained to convergence, missing most of the science that is in PDFs, etc.).

In particular, I bet the "how many eyes does a horse have" error would be much less likely with a multimodal model which has actually seen photographs or videos of what the word "horse" describes and can see that, like most mammals, horses have only 2 eyes. Think of it as like layers of Swiss cheese: each modality's datasets have their own weird idiosyncrasies and holes where the data is silent & the model learns little, but another modality will have different ones, and the final model trained on them all simultaneously will avoid the flaws of each one in favor of a more correct universal understanding.

I'm very keen to see how much multimodal models can improve on current unimodal models over the next few years.