by two_in_one 873 days ago

> now supports FlashAttention-2, yielding around 2x speedups

> torch.compile improvements

So far 2.1 hasn't worked well with MoE GPT, at least in my implementation, because of the dynamism in the data flow. I'll check how 2.2 does.
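For anyone wondering what "dynamism in data flow" can look like in an MoE layer, here is a rough sketch in plain Python. It is not the implementation mentioned above, and the function names (`route_top1`, `tokens_per_expert`, `random_gate_scores`) are made up for illustration. It assumes simple top-1 routing and only shows that the number of tokens each expert receives depends on the input values, which is the sort of data-dependent shape a tracing compiler tends to struggle with.

```python
import random


def route_top1(gate_scores):
    """Pick the highest-scoring expert for each token (top-1 routing)."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in gate_scores]


def tokens_per_expert(assignments, num_experts):
    """Count how many tokens were routed to each expert."""
    counts = [0] * num_experts
    for expert in assignments:
        counts[expert] += 1
    return counts


def random_gate_scores(num_tokens, num_experts, seed):
    """Hypothetical stand-in for a learned gating network: random scores."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_experts)] for _ in range(num_tokens)]


if __name__ == "__main__":
    # Same batch size every time, but the per-expert batch sizes change
    # with the data, so each expert sees a differently shaped input.
    for seed in range(3):
        scores = random_gate_scores(num_tokens=8, num_experts=4, seed=seed)
        print(seed, tokens_per_expert(route_top1(scores), num_experts=4))
```

In a tensor framework those per-expert counts would become tensor shapes that are only known at run time, so a compiler either has to recompile for each new shape or fall back to a slower path.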