by highfrequency 1955 days ago
Is this the first time that the gradient clip threshold has been chosen relative to the size of the weight matrix?
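To make the question concrete, here is a minimal sketch of a clip threshold that scales with the weight matrix. It assumes "size" means the Frobenius norm and clips the whole matrix at once. The helper names `frobenius_norm` and `clip_relative_to_weight` and the `clip_ratio` and `eps` values are hypothetical and chosen only for illustration. Matrices are plain lists of rows so the sketch needs no libraries.

```python
import math

def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def clip_relative_to_weight(grad, weight, clip_ratio=0.01, eps=1e-3):
    """Rescale grad so that ||grad|| <= clip_ratio * max(||weight||, eps).

    Hypothetical helper: the threshold grows with the weight norm
    instead of being a fixed constant. The eps floor keeps a
    zero-initialized weight from producing a zero threshold.
    """
    max_norm = clip_ratio * max(frobenius_norm(weight), eps)
    grad_norm = frobenius_norm(grad)
    if grad_norm <= max_norm:
        return grad  # already within the threshold, leave untouched
    scale = max_norm / grad_norm  # shrink uniformly, preserving direction
    return [[g * scale for g in row] for row in grad]

# Usage: the weight has norm 5, so with clip_ratio=0.1 the allowed
# gradient norm is 0.5; a gradient of norm 50 gets scaled down to 0.5.
weight = [[3.0, 4.0]]
grad = [[30.0, 40.0]]
clipped = clip_relative_to_weight(grad, weight, clip_ratio=0.1)
```

With a fixed threshold, the same constant would apply to a layer whose weights have norm 0.1 and to one whose weights have norm 100. Tying the threshold to the weight norm instead bounds the relative size of each update.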