by space_fountain 1519 days ago
I'm by no means an expert, but a lot of the design choices in machine learning algorithms are driven more by training parallelization than by anything else. In many ways it feels like a recurrent neural network, or some even stranger architecture, should be better suited to language, but in practice it's harder to train an architecture that requires each new output to depend on the one before it. Introducing dependencies on prior output typically kills parallelization. Obviously this is less of a problem for, say, a brain, which has years of training time, but it's more of a problem if you want to train a model in much less time using compute that can't do sequential work very quickly.
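A toy sketch makes the dependency difference concrete. It assumes a one-unit tanh recurrence and a simplified attention-style mixing step. Both are illustrative stand-ins, not any library's API, and the names `recurrent_layer`, `mix_position`, `parallel_layer`, and `sequential_depth` are hypothetical:

```python
import math
from concurrent.futures import ThreadPoolExecutor


def recurrent_layer(xs, w=0.5, u=1.0):
    """Hypothetical toy recurrence: h_t = tanh(w * h_{t-1} + u * x_t).
    Step t cannot start until step t-1 has produced h_{t-1}."""
    h = 0.0
    hs = []
    for x in xs:
        h = math.tanh(w * h + u * x)  # depends on the previous output
        hs.append(h)
    return hs


def mix_position(t, xs):
    """Hypothetical attention-like mixing: output t is a softmax-weighted
    average of all inputs, computed from the inputs alone, so it never
    waits on any other output."""
    scores = [xs[t] * x for x in xs]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(wt * x for wt, x in zip(weights, xs)) / total


def parallel_layer(xs):
    # Every position is independent, so they can be evaluated in any order.
    # The thread pool only illustrates that independence (Python's GIL limits
    # real speedup here); on a GPU this would be one batched matrix operation.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: mix_position(t, xs), range(len(xs))))


def sequential_depth(n, recurrent):
    """Number of dependent steps on the critical path for n positions."""
    return n if recurrent else 1


xs = [0.1, -0.4, 0.7, 0.2]
print(recurrent_layer(xs))
print(parallel_layer(xs))
print(sequential_depth(len(xs), True), sequential_depth(len(xs), False))  # 4 1
```

In the recurrent version the loop is the bottleneck: with n tokens you need n dependent steps no matter how much hardware you throw at it. The mixing version does more total arithmetic (every position looks at every input), but its dependency depth is one, which is exactly the trade that parallel hardware rewards.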