It looks like their synchronous transcription is much slower than Whisper, but if you need it fast, you need their realtime ASR (or Amazon's or Google's).
[0] Conformer-2 is trained on 1.1M hours of English: https://www.assemblyai.com/blog/conformer-2/
[1] https://www.assemblyai.com/docs/Concepts/supported_languages