On Hopper GPUs, FlexAttention currently reaches about 80% of FlashAttention3's performance (about 500 TFLOPS peak).
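
For readers who want to see how throughput figures of this kind are usually derived, here is a minimal sketch. It assumes the common convention of counting only the two matrix multiplications in the attention forward pass (`Q @ K^T` and `P @ V`) and halving the count for causal masking. The function names and the shapes in the usage note are hypothetical, and the source does not say which convention or shapes produced the numbers above.

```python
def attention_forward_flops(batch, heads, seq_len, head_dim, causal=False):
    """Approximate FLOPs for one attention forward pass (assumed convention).

    Counts only the two matmuls, Q @ K^T and P @ V. Each costs
    2 * seq_len * seq_len * head_dim FLOPs per head, so the total is
    4 * batch * heads * seq_len^2 * head_dim. Softmax and masking are ignored.
    """
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    # Causal masking skips roughly half of the score matrix.
    return flops // 2 if causal else flops


def achieved_tflops(flops, seconds):
    """Convert a FLOP count and a measured wall-clock time into TFLOPS."""
    return flops / seconds / 1e12


def relative_performance(candidate_seconds, baseline_seconds):
    """Fraction of the baseline's throughput reached by the candidate kernel."""
    return baseline_seconds / candidate_seconds
```

To use it, time each kernel on the same shapes (for example `batch=4, heads=16, seq_len=8192, head_dim=128`), pass the measured seconds to `achieved_tflops`, and compare the two kernels with `relative_performance`. A result near 0.8 would correspond to the "about 80%" figure quoted above.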