| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by riku_iki 2318 days ago
	> BERT returns a tensor of dimension (n_tokens, hidden_size) [1]. Where hidden size has no relationship with the vocabulary "BERT returns" is ambiguous here. During pretraining last layer is loggits for one hot vocab vector, the same as in GPT: https://github.com/google-research/bert/blob/master/run_pret...