Hacker News
by hackinthebochs 1083 days ago
"Picking the most likely token based on probabilities" doesn't accurately describe their architecture. They are not intrinsically statistical, they are fully deterministic. The next word is scored (and then normalized to give something interpretable as a probability), But the calculation performed to determine the score for the next token considers the full context window and features therein, while leveraging the meaning of the terms by way of semantic embeddings and its trained knowledge base. It is not obvious that the network does not engage with the meaning of the terms in the context window when scoring the next word, and it certainly can't be dismissed by characterizing it as just engaging with probabilities. There is reason to believe that the network does understand to some degree in some cases. I go into some detail here: https://www.reddit.com/r/naturalism/comments/1236vzf/on_larg...