Hacker News
by
thomasahle
725 days ago
You can just do that by shaping the attention mask, no? That also gives you an actual guarantee that no information is leaked between conversations.
2 comments
suryabhupa
725 days ago
In practice, and at scale, that's exactly what having <bos> and <eos> tokens allows you to do easily and programmatically.
danielmarkbruce
724 days ago
You can't pack multiple examples into a single row of a matrix without knowing where one begins and one ends.
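To make the point above concrete, here is a minimal sketch of how boundary tokens could be turned into an attention mask for a packed row. It assumes NumPy, a hypothetical end-of-sequence id of 2, and a causal decoder; the function name `packed_attention_mask` and the token values are made up for illustration, not taken from any particular library.

```python
import numpy as np

EOS_ID = 2  # hypothetical end-of-sequence token id (assumption)

def packed_attention_mask(token_ids, eos_id=EOS_ID):
    """Boolean mask where mask[i, j] is True if position i may attend to j."""
    tokens = np.asarray(token_ids)
    is_eos = (tokens == eos_id).astype(int)
    # Segment id = number of EOS tokens strictly before each position,
    # so each EOS stays in the example it terminates.
    segment_ids = np.cumsum(is_eos) - is_eos
    # Attention is allowed only within the same packed example...
    same_segment = segment_ids[:, None] == segment_ids[None, :]
    # ...and only to the current or earlier positions (causal).
    causal = np.tril(np.ones((len(tokens), len(tokens)), dtype=bool))
    return same_segment & causal

# Two examples packed into one row: [5, 6, EOS] and [7, 8, 9, EOS].
row = [5, 6, 2, 7, 8, 9, 2]
mask = packed_attention_mask(row)
```

Without the EOS positions there would be nothing to derive `segment_ids` from, which is why the boundaries have to be known before the mask can be shaped.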