That sounds promising. Maybe scale (of model and training samples) is all you need.
And RNNs are much cheaper at inference, since each new token only updates a fixed-size state.
I'm going to take a closer look :-)