by DoctorOetker 1229 days ago
They also optimize only for bigram statistics (two matrices), so they don't exploit the associativity of matrix multiplication, A(BC) = (AB)C, which corresponds to string concatenation...
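
To make the associativity point concrete, here is a minimal sketch, assuming each token is represented by a small square matrix and a string by the product of its tokens' matrices. The names TOKEN_MATRICES, matmul and embed, and the matrix values themselves, are made up for illustration and are not taken from the approach being discussed.

```python
# Hypothetical illustration: each token maps to a small integer matrix,
# and a string maps to the left-to-right product of its tokens' matrices.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Made-up 2x2 matrices for three tokens (not from any trained model).
TOKEN_MATRICES = {
    "a": [[1, 1], [0, 1]],
    "b": [[1, 0], [1, 1]],
    "c": [[2, 1], [1, 1]],
}

def embed(string):
    """Map a string to the product of its token matrices."""
    result = [[1, 0], [0, 1]]  # identity matrix stands for the empty string
    for token in string:
        result = matmul(result, TOKEN_MATRICES[token])
    return result

# Grouping does not matter: (AB)C == A(BC), just as ("a"+"b")+"c" == "a"+("b"+"c").
left = matmul(matmul(embed("a"), embed("b")), embed("c"))
right = matmul(embed("a"), matmul(embed("b"), embed("c")))
assert left == right == embed("abc")

# Order still matters, since matrix products do not commute in general.
assert embed("ab") != embed("ba")
```

Because the product can be grouped any way, the representation of a long string can be built from the representations of its substrings, which is the property a purely bigram objective would leave unused.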