Hacker News
by Grimblewald 336 days ago
I don't think it's wrong, but I do think the models available right now lack the inductive bias required to solve the task appropriately, and have architectural misalignments with the task at hand. That means that for properly reliable output you'll need impossibly large models and impossibly large and varied datasets. The same goes for transformers in language modelling: an extremely adaptable model, but ultimately not aligned with the task of understanding and learning language, so we need enormous piles of data and huge models to get decent output.