by elexhobby 2641 days ago
Great post, thanks!

Is there a reason why training starts with two separate matrices, the embedding matrix and the context matrix? If the context matrix is discarded at the end anyway, why not work with only the embedding matrix from the start?