Hacker News
by korbip 772 days ago
This was formulated a bit unclearly. During training, it is not possible to parallelize over the sequence dimension the way it is for Transformers. Parallelizing over the batch dimension is always possible.
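Here is a minimal Python sketch of the distinction. It assumes a generic scalar recurrent cell, h_t = tanh(w * h_{t-1} + u * x_t), not any specific architecture from this thread. `run_rnn`, `run_rnn_batch`, `causal_attention_position`, and `causal_attention` are hypothetical names used only for illustration. The thread pool marks which work items are independent. Python threads will not give a real speedup here. Actual frameworks vectorize this work on accelerators instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor


# Hypothetical scalar recurrent cell: h_t = tanh(w * h_{t-1} + u * x_t).
# h_t needs h_{t-1}, passed through a nonlinearity, so the time steps of a
# single sequence have to be computed one after another.
def run_rnn(seq, w=0.5, u=1.0):
    h = 0.0
    states = []
    for x in seq:  # inherently sequential over the time dimension
        h = math.tanh(w * h + u * x)
        states.append(h)
    return states


# Sequences in a batch never interact, so each one can be processed
# independently (here: submitted as separate tasks to a pool).
def run_rnn_batch(batch):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_rnn, batch))


# Toy causal attention with scalar query = key = value = x.
# Output t depends only on the inputs x_0..x_t, never on earlier outputs,
# so every position can be computed independently of the others.
def causal_attention_position(seq, t):
    scores = [seq[t] * seq[j] for j in range(t + 1)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(wt * seq[j] for j, wt in enumerate(weights)) / z


# All positions of the sequence are independent tasks: parallel over time.
def causal_attention(seq):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: causal_attention_position(seq, t),
                             range(len(seq))))


if __name__ == "__main__":
    batch = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
    print(run_rnn_batch(batch))      # parallel over the batch, sequential over time
    print(causal_attention(batch[0]))  # parallel over the sequence positions
```

In the recurrent case, the loop inside `run_rnn` cannot be split across time steps. Only the outer loop over sequences can be split. In the attention case, each call to `causal_attention_position` needs nothing but the raw inputs. That is why Transformers can process all positions of a training sequence at once.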