Y
Hacker News
new
|
ask
|
show
|
jobs
by
Tarq0n
81 days ago
If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.