|
|
|
|
|
by mnicky
109 days ago
|
|
This observation makes sense, because all models currently probably use some kind of a sparse attention architecture. So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved. |
|