Hacker News new | ask | show | jobs
by dahcryn 103 days ago
I would like to counteract your statement that each token adds a distraction.

In our experiments, we see a surprising benefit to rewriting blocks to use more tokens, especially long lists etc..

E.g. compare these two options

"The following conditions are excluded from your contract - condition A - condition B ... - condition Z"

The next one works better for us:

"The following conditions are excluded from your contract - condition A is excluded - condition B is excluded ... - condition Z is excluded"

And we now have scripts to rewrite long documents like this, explicitly adding more tokens. Would you have any opinion on this?

1 comments

This observation makes sense, because all models currently probably use some kind of a sparse attention architecture.

So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved.