Hacker News
by
laingc
1653 days ago
It comes from the dimensionality of the hidden state of popular NLP models. The number 768 in particular comes from (if I recall correctly) the original BERT-base model; the larger BERT-large uses 1024.
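To make the arithmetic concrete, here is a rough sketch of how that hidden size falls out of the published BERT configurations: it is the number of attention heads times the per-head dimension (64 in both original models). The names `BERT_CONFIGS` and `hidden_size` are made up for illustration and are not from any library.

```python
# Published shapes of the two original BERT models.
# The dict and function names are hypothetical, chosen for this sketch.
BERT_CONFIGS = {
    "bert-base": {"layers": 12, "heads": 12, "head_dim": 64},
    "bert-large": {"layers": 24, "heads": 16, "head_dim": 64},
}


def hidden_size(name: str) -> int:
    """Hidden-state width = attention heads * per-head dimension."""
    cfg = BERT_CONFIGS[name]
    return cfg["heads"] * cfg["head_dim"]


for name in BERT_CONFIGS:
    print(name, hidden_size(name))
```

So 768 is a product of design choices (12 heads of width 64) rather than a number with special meaning of its own.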
unbanned
1653 days ago
So it's arbitrary then