by laingc 1653 days ago
It comes from the hidden-state dimensionality of popular NLP models. The number 768 in particular is the hidden size of BERT-base, the smaller of the two original BERT models (BERT-large uses 1024).
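A rough sketch of how those sizes break down, assuming the configurations published in the original BERT paper, where the hidden state is split evenly across the attention heads. The names `BERT_CONFIGS` and `head_dim` are made up for illustration and are not from any library.

```python
# Configurations from the original BERT paper: layers, attention heads, hidden size.
# The dict and function names here are illustrative, not a library API.
BERT_CONFIGS = {
    "bert-base": {"layers": 12, "heads": 12, "hidden": 768},
    "bert-large": {"layers": 24, "heads": 16, "hidden": 1024},
}


def head_dim(config):
    """Per-head width: the hidden size divided evenly across attention heads."""
    assert config["hidden"] % config["heads"] == 0
    return config["hidden"] // config["heads"]


for name, cfg in BERT_CONFIGS.items():
    # e.g. "bert-base 768 = 12 x 64"
    print(name, cfg["hidden"], "=", cfg["heads"], "x", head_dim(cfg))
```

Both sizes work out to a per-head width of 64, so 768 is just 12 heads times 64.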

So it's arbitrary then