by visarga 694 days ago
GPTs also get gradients from all tokens, while BERT gets them only from the 15% of tokens that are masked. That makes GPTs more effective.
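
To make the counting argument concrete, here is a rough sketch that just counts how many positions in one sequence would contribute to the loss under each objective. The function names, the sequence length of 512, and the independent per-token masking are my own assumptions for illustration, not anything from a real training codebase; it only counts positions and does not train a model.

```python
import random


def causal_lm_loss_positions(seq_len: int) -> int:
    """Positions with a loss term under next-token prediction.

    Every token except the last has a "next token" target, so
    seq_len - 1 positions send a gradient signal.
    """
    return max(seq_len - 1, 0)


def mlm_loss_positions(seq_len: int, mask_prob: float = 0.15, seed: int = 0) -> int:
    """Positions with a loss term under masked language modeling.

    Assumption: each token is selected for masking independently with
    probability mask_prob; only selected positions contribute to the loss.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(seq_len) if rng.random() < mask_prob)


if __name__ == "__main__":
    seq_len = 512  # hypothetical sequence length
    clm = causal_lm_loss_positions(seq_len)
    mlm = mlm_loss_positions(seq_len)
    print(f"causal LM: {clm} of {seq_len} positions carry a loss term")
    print(f"masked LM: {mlm} of {seq_len} positions carry a loss term")
```

This only captures the "how many tokens get a training signal per sequence" part of the point. It says nothing about how useful each signal is, since a masked position sees context on both sides and a causal position sees only the left side.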