> torch.compile improvements
So far 2.1 hasn't worked well with MoE GPT, at least in my implementation, because of the dynamism in the data flow. Will check how 2.2 does.
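
To make "dynamism in data flow" concrete, here is a rough pure-Python sketch of top-1 expert routing. It is not my actual implementation. The names `route_top1` and `tokens_per_expert` are made up for illustration, and I'm assuming the trouble comes from the per-expert token count changing with the input, which is the kind of data-dependent shape a tracing compiler tends to struggle with.

```python
import random

def route_top1(router_scores):
    """Group token indices by their highest-scoring expert (top-1 routing)."""
    num_experts = len(router_scores[0])
    buckets = [[] for _ in range(num_experts)]
    for token_idx, scores in enumerate(router_scores):
        # The chosen expert depends on the values, not just on the shapes.
        expert = max(range(num_experts), key=lambda e: scores[e])
        buckets[expert].append(token_idx)
    return buckets

def tokens_per_expert(seed, num_tokens=8, num_experts=4):
    """Hypothetical helper: random router scores -> token count per expert."""
    rng = random.Random(seed)
    scores = [[rng.random() for _ in range(num_experts)]
              for _ in range(num_tokens)]
    return [len(bucket) for bucket in route_top1(scores)]

# Same batch size every time, but each expert's slice of it has a
# different length from one batch to the next.
for seed in range(3):
    print(seed, tokens_per_expert(seed))
```

The total number of tokens stays fixed, but each expert sees a batch whose size is only known after the router runs.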