by zozbot234
272 days ago
AIUI, the thinking when developing transformers might have been that "reading text A vs. text B" (i.e. parallelizing only across separate sequences) just isn't parallel enough for truly large-scale training. The problem was to also parallelize the learning of very long-range dependencies within a single sequence, and transformers managed to do that.
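
To make the contrast concrete, here is a rough sketch in plain Python. It is a toy under stated assumptions, not how any real model is implemented: the names `recurrent_states`, `attend`, and `causal_attention`, the scalar "embeddings", and the weight `w` are all made up for illustration. The point is only that a recurrent state at position t needs the state at t-1, while a causal attention output at position t depends on the inputs alone, so all positions of one sequence can be computed independently.

```python
import math

def recurrent_states(xs, w=0.5):
    """RNN-style scan: state t needs state t-1, so the loop is inherently serial."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w * h + x)  # depends on the previous h
        out.append(h)
    return out

def attend(xs, t):
    """Causal attention output for position t, computed from the inputs alone."""
    scores = [xs[t] * xs[j] for j in range(t + 1)]  # toy dot-product scores
    m = max(scores)                                 # subtract max for a stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(wt * xs[j] for j, wt in enumerate(weights)) / z

def causal_attention(xs, order=None):
    """Positions can be visited in any order (or in parallel) with the same result."""
    order = list(range(len(xs))) if order is None else order
    out = [None] * len(xs)
    for t in order:
        out[t] = attend(xs, t)  # no dependence on other outputs
    return out

xs = [0.1, -0.4, 0.9, 0.3]  # hypothetical toy sequence
forward = causal_attention(xs)
backward = causal_attention(xs, order=[3, 2, 1, 0])
print(forward == backward)  # visiting order does not matter
```

In an actual transformer the per-position loop is replaced by batched matrix multiplications over the whole sequence, which is what lets training exploit the hardware; the loop here is written out only to show that no step waits on another.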