Hacker News
by Tarq0n 81 days ago
If it works for predicting the next token in a very long stream of tokens, why not? The question is what architecture and training regimen it needs to generalize.