by korbip 772 days ago
That was phrased a bit unclearly. Training cannot be parallelized along the sequence dimension the way it can for Transformers. Along the batch dimension you can always parallelize.
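To make the distinction concrete, here is a rough NumPy sketch. It assumes a plain tanh RNN as a stand-in for the recurrent model (the actual architecture under discussion may differ), and the names `rnn_forward`, `self_attention`, `W_x` and `W_h` are made up for illustration. The recurrent loop has to walk through time steps in order, but each step handles the whole batch in one matmul; the attention function handles every position at once.

```python
import numpy as np

def rnn_forward(x, W_x, W_h):
    """Toy tanh RNN (illustrative stand-in). x has shape (batch, seq_len, d_in)."""
    batch, seq_len, _ = x.shape
    d_hidden = W_h.shape[0]
    h = np.zeros((batch, d_hidden))
    outputs = []
    # Sequence dimension: step t needs h from step t-1, so this loop is sequential.
    for t in range(seq_len):
        # Batch dimension: one matmul updates every sequence in the batch at once.
        h = np.tanh(x[:, t] @ W_x + h @ W_h)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, seq_len, d_hidden)

def self_attention(x):
    """Unmasked single-head attention without projections, for contrast.

    No loop over time: all positions are computed in one batched matmul.
    """
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # (batch, seq_len, d_in)
```

Running `rnn_forward` on a full batch gives the same result per sequence as running it on each sequence alone, which is why the batch dimension is always free to parallelize.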