Irrespective of the subject, DeepSpeech is a very old architecture with suboptimal results. You'd be better off trying one of the recent Conformer implementations (Flashlight, NeMo, WeNet, etc.) or wav2vec.
I ended up with DeepSpeech since it was very easy to get started with, and it supports fairly low-latency inference, which is very important for my project.
I will take a look at the ones you suggested though, thanks for the heads-up!