by kherud
1035 days ago
Why is this the expected result? The original transformer algorithm has n^2 computational complexity, where n is the number of tokens. As far as I know, there are some improvements that bring it down to something like n*log(n). Linear complexity seems surprising, however. Is the reason that the attention calculation can be fully parallelized on decent hardware, so the response time stays linear?
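To make the n^2 part concrete, here is a rough NumPy sketch of where the quadratic term comes from in plain single-head attention. The function name `attention_weights`, the head size `d=8`, and the random inputs are all just assumptions for illustration, not taken from any particular model or library.

```python
import numpy as np

def attention_weights(n, d=8, seed=0):
    """Return the n x n softmax attention weights for n random tokens.

    Hypothetical helper for illustration only: q and k are random,
    and d (the per-head dimension) is an arbitrary choice.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n, d))  # one query vector per token
    k = rng.standard_normal((n, d))  # one key vector per token

    # Every token is scored against every other token: an n x n matrix.
    scores = q @ k.T / np.sqrt(d)

    # Row-wise softmax (subtract the row max for numerical stability).
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights

for n in (4, 8, 16):
    w = attention_weights(n)
    # Doubling n quadruples the number of entries in the score matrix.
    print(n, w.shape, w.size)
```

The total work grows with the n x n score matrix regardless of hardware. Each row can be computed independently of the others, which is the part that could be parallelized, but the sketch says nothing about how the measured response time actually scales.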
Disclaimer: I'm not an expert; this is just what I've picked up reading about the technology.