Can you summarise how the model in your paper differs from this implementation of xLSTM?
https://github.com/huggingface/transformers/issues/27011