To answer your question: some ops are still missing, but the main limitation at the moment is models with decoders, which are not very fast (yet) because of inefficient buffer reuse and many redundant copies between CPU and GPU. We're working closely with the ORT (ONNX Runtime) team to fix these issues though!
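
To give a rough feel for why redundant copies hurt decoders in particular, here is a toy sketch that only counts tensor transfers in an autoregressive loop. It does not call ORT at all. The function name `count_transfers`, its parameters, and the assumption that the extra copies come from round-tripping a per-layer key/value cache through the CPU on every step are all illustrative assumptions on my part, not a description of what ORT actually does internally.

```python
def count_transfers(steps: int, cache_tensors: int, keep_cache_on_gpu: bool) -> int:
    """Count CPU<->GPU tensor copies in a toy autoregressive decoding loop.

    Hypothetical model: every step uploads one input token and downloads one
    logits tensor. If the cache is NOT kept on the GPU, every step also
    downloads `cache_tensors` cache tensors and re-uploads them on the next step.
    """
    copies = 0
    has_cache = False
    for _ in range(steps):
        copies += 1  # upload the new input token (CPU -> GPU)
        if has_cache and not keep_cache_on_gpu:
            copies += cache_tensors  # re-upload the cache from the previous step
        # ... the decoder forward pass would run here ...
        copies += 1  # download the logits (GPU -> CPU) to pick the next token
        if not keep_cache_on_gpu:
            copies += cache_tensors  # download the updated cache (GPU -> CPU)
        has_cache = True
    return copies


if __name__ == "__main__":
    # Assumed example sizes: 4 decoding steps, 24 cache tensors
    # (e.g. 12 layers with one key and one value tensor each).
    print("cache round-trips via CPU:", count_transfers(4, 24, keep_cache_on_gpu=False))
    print("cache stays on GPU:      ", count_transfers(4, 24, keep_cache_on_gpu=True))
```

In this toy model the number of copies grows with both the number of generated tokens and the number of cache tensors when the cache bounces through the CPU, while it grows only with the number of tokens when the cache stays on the device. Encoder-only models run a single forward pass, so they never hit this loop.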