by Translationaut 853 days ago
This seems to work only because large GPTs have redundant, undercomplex attention. See this BertViz issue about attention in Llama:
https://github.com/jessevig/bertviz/issues/128