Hacker News new | ask | show | jobs
by bob1029 97 days ago
Next token prediction is about predicting the future by minimizing the number of bits required to encode the past. It is fundamentally causal and has a discrete time domain. You can't predict token N+2 without having first predicted token N+1. The human brain has the same operational principles.