by mhartz 979 days ago
Can someone help me understand Figure 2? Why does the newest token appear at the beginning of the sequence rather than next to its neighboring token?

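It's a rolling buffer, so each new token just gets written to slot index % 4 in this case, overwriting whatever was there. That's why the newest token can land at the beginning of the buffer.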
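Rough sketch of what I mean, assuming a window of 4 as in the figure. The names (WINDOW, rolling_insert) are made up for illustration and not taken from the paper or any library:

```python
WINDOW = 4  # buffer size, matching the "% 4" above (assumed from the figure)

def rolling_insert(buffer, index, token):
    """Write the token into slot index % WINDOW, overwriting the oldest entry."""
    buffer[index % WINDOW] = token
    return buffer

buffer = [None] * WINDOW
for i, tok in enumerate(["a", "b", "c", "d", "e"]):
    rolling_insert(buffer, i, tok)

print(buffer)  # ['e', 'b', 'c', 'd']
```

The fifth token "e" has index 4, and 4 % 4 == 0, so it overwrites slot 0 and ends up in front of "b" even though it is the newest.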
Thanks, so does that mean position within the buffer is irrelevant?
It does feel that way. The slot position seems to lose its meaning as more and more data goes through training; in the end it feels like the buffer is just a context of the past 4 tokens.