by aik
920 days ago
Do you have an example where these methods still produce good summaries? E.g., if you change how self-attention is re-computed between token generations during autoregressive decoding, so that significantly less computation is needed?