In my experience, inputs to MTCNN tend to be full frames, so the uniform-dimension requirement is usually met.
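For the cases where that does not hold (for example, frames from different sources mixed together), a quick check before batching can help. Here is a minimal sketch in plain Python. `group_by_shape` and `is_uniform` are hypothetical helpers, not part of any MTCNN library, and the sketch assumes each frame's shape is available as a `(height, width, channels)` tuple, such as a NumPy array's `.shape`.

```python
from collections import defaultdict


def group_by_shape(shapes):
    """Group frame indices by their (height, width, channels) shape.

    Hypothetical helper: `shapes` is a list of tuples, one per frame.
    Returns a dict mapping each distinct shape to the indices that have it.
    """
    groups = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[tuple(shape)].append(index)
    return dict(groups)


def is_uniform(shapes):
    """True when every frame has the same shape (an empty list counts as uniform)."""
    return len(group_by_shape(shapes)) <= 1


if __name__ == "__main__":
    # Made-up shapes for illustration: two 720p frames and one smaller frame.
    shapes = [(720, 1280, 3), (720, 1280, 3), (480, 640, 3)]
    print(is_uniform(shapes))
    print(group_by_shape(shapes))
```

If the check fails, one option is to send each group as its own batch; another is to resize everything to a common size first.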