by thomasahle 725 days ago
You can just do that by shaping the attention mask, no? That also gives you an actual guarantee that no information is leaked between conversations.

In practice, and at scale, that's exactly what having <bos> and <eos> tokens allows you to do easily and programmatically.
You can't pack multiple examples into a single row of a matrix without knowing where one begins and one ends.
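
For illustration, here is a rough sketch of how the two ideas fit together: the <bos> positions in a packed row are used to derive a block-diagonal causal mask, so tokens only attend within their own example. The token ids (`BOS_ID`, `EOS_ID`) and the function name `packed_attention_mask` are made up for this example and not taken from any particular library.

```python
import numpy as np

# Hypothetical token ids, chosen only for this sketch.
BOS_ID = 1
EOS_ID = 2

def packed_attention_mask(tokens, bos_id=BOS_ID):
    """Return a boolean (n, n) mask; mask[i, j] is True if position i may attend to j."""
    tokens = np.asarray(tokens)
    # Every <bos> starts a new example, so a running count of <bos> tokens
    # assigns each position the index of the example it belongs to.
    segment_ids = np.cumsum(tokens == bos_id)
    # Positions may only see positions from the same example...
    same_segment = segment_ids[:, None] == segment_ids[None, :]
    # ...and only at the same or an earlier position (causal).
    n = len(tokens)
    causal = np.tril(np.ones((n, n), dtype=bool))
    return same_segment & causal

# Two examples packed into one row: [<bos> 10 11 <eos>] [<bos> 20 <eos>]
row = [BOS_ID, 10, 11, EOS_ID, BOS_ID, 20, EOS_ID]
mask = packed_attention_mask(row)
```

With this mask, the second example's tokens (positions 4 to 6) cannot attend to anything in the first example (positions 0 to 3), which is the no-leak guarantee mentioned above. This assumes every packed example starts with <bos>; without that marker there would be nothing to compute the boundaries from.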