Hacker News
by lossolo 842 days ago
It's because they learn small patterns from datasets; it doesn't matter whether the subjects are Sally, George, sisters, or apples. If a particular logic pattern was not in the training dataset, the model did not learn it and will fail on most variations of this riddle. These transformer models are essentially large collections of local optima over logic patterns in sentences: if a pattern was not present in the dataset, there is no local optimum for it, and the model will likely fail in those cases.