Anyway, the SwiTransformer paper looks interesting, and so does the idea of post-training to optimize for it.