by rewq4321 1132 days ago
> Also the attention mechanism is baked in during pretraining

IIUC, this is no longer necessarily true with positional encodings like ALiBi: https://github.com/ofirpress/attention_with_linear_biases
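
For anyone unfamiliar, here is a rough sketch of the idea as I understand it from the ALiBi paper: instead of adding position embeddings to the inputs, each attention head adds a fixed linear penalty to its attention scores, proportional to the query-key distance. Nothing about position is learned, so the bias can be recomputed for whatever sequence length you run at inference. The function names below are my own (not from the linked repo), the slope formula is the one for power-of-two head counts, and the -inf entries just stand in for the usual causal mask.

```python
def alibi_slopes(num_heads):
    # Geometric sequence for power-of-two head counts:
    # head h (1-based) gets slope 2 ** (-8 * h / num_heads).
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(num_heads, seq_len):
    # bias[h][i][j] = -slope_h * (i - j) for keys j at or before query i.
    # Future keys (j > i) get -inf as a stand-in for the causal mask.
    # The result is meant to be added to the raw attention scores
    # (q . k) before the softmax.
    neg_inf = float("-inf")
    return [
        [
            [-m * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    # Nothing here depends on a trained length: the same function
    # produces the bias for any seq_len.
    short = alibi_bias(num_heads=8, seq_len=4)
    longer = alibi_bias(num_heads=8, seq_len=16)
    print(short[0][3])      # penalties for the last query, first head
    print(len(longer[0]))   # 16
```

Whether a model trained this way actually holds up at longer lengths is an empirical question; the sketch only shows that the positional term itself has no parameters tied to the training length.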