by energy123
133 days ago
> this is where the Taylor expression would fail to represent the values well.

"In practice, we find that four Taylor terms (P = 4) suffice for recovering conventional attention with elementwise errors of approximately the same magnitude as Float16 resolution"
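
For a rough feel of what a four-term truncation does, here is a minimal sketch on plain scalar `exp(x)`. This is not the paper's method: it assumes P = 4 means the terms of order 0 through 3, and the helper `taylor_exp` and the constant `FLOAT16_EPS` are names made up for illustration. The paper's claim is about full attention under its own setup, so this only shows how the truncation error of the series grows with the size of the input.

```python
import math

def taylor_exp(x, num_terms=4):
    """Sum of the first `num_terms` terms of the Taylor series of exp(x) around 0."""
    # Assumption: "four terms" means orders 0, 1, 2, 3.
    return sum(x**p / math.factorial(p) for p in range(num_terms))

# Spacing between adjacent float16 values just above 1.0 (10 fraction bits).
FLOAT16_EPS = 2.0 ** -10

for x in (0.1, 0.25, 0.5, 1.0, 2.0):
    err = abs(math.exp(x) - taylor_exp(x))
    print(f"x={x:4.2f}  abs error={err:.2e}  below float16 eps: {err < FLOAT16_EPS}")
```

For the scalar case the error stays under that threshold only for small inputs and grows quickly after that, so whether the parent's concern applies depends on how large the exponent arguments get in practice.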