Hacker News
by unbanned 1653 days ago
Why 768? What are these dimensions?

It comes from the dimensionality of the hidden state of popular NLP models. The number 768 in particular is the hidden size of BERT-base, the smaller of the two original BERT models (BERT-large uses 1024).
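
For reference, here's a rough sketch of where the number comes from, using the two configurations published with the original BERT release (base: 12 heads, hidden size 768; large: 16 heads, hidden size 1024). The dict layout and the `head_dim` helper are just my own illustration, not any library's API.

```python
# Published sizes of the two original BERT models.
# The dict layout and key names are illustrative, not a library API.
BERT_CONFIGS = {
    "bert-base": {"layers": 12, "heads": 12, "hidden_size": 768},
    "bert-large": {"layers": 24, "heads": 16, "hidden_size": 1024},
}


def head_dim(config):
    """Width of one attention head: the hidden size split evenly across heads."""
    hidden, heads = config["hidden_size"], config["heads"]
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden // heads


for name, cfg in BERT_CONFIGS.items():
    # Print each hidden size as heads x per-head width.
    print(name, cfg["hidden_size"], "=", cfg["heads"], "x", head_dim(cfg))
```

In both models each head is 64 wide, so 768 is just 12 heads times 64. It's a design choice rather than anything fundamental, but it isn't a random number either.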
So it's arbitrary, then?