by cmehdy
2181 days ago
Something about the logic being so off is what I intuitively find logical: we're making these AIs "in our image" in a sense (we think of "neural networks", train them with mostly human-generated datasets), and there's a lot of evidence that pure logic evades us unless we bring some heavy artillery to bear on it (we have cognitive biases, illusions, optimizations for goals that do not necessarily align with "objectively observable" reality). So in a way, I wonder if we'll have to "teach AI logic" at some point too. In this quest of running logic software on logic hardware with... steps... in between, I can't help but think about us humans on our parallel quest when it comes to our brains.
In particular, I bet the "how many eyes does a horse have" mistake would be much less likely with a multimodal model which has actually seen photographs or videos of what the word "horse" describes and can see that, like most mammals, horses have only 2 eyes. Think of it like layers of Swiss cheese: each modality's datasets have their own weird idiosyncrasies and holes where the data is silent and the model learns little, but another modality will have different ones, and the final model trained on them all simultaneously will avoid the flaws of each one in favor of a more correct universal understanding.
I'm very keen to see how much multimodal models can improve on current unimodal models over the next few years.