Coqui (https://github.com/coqui-ai) is a great open-source STT (speech-to-text) resource you could start with. They have a lot of docs explaining how everything works, and the barrier to entry is low.
It is a Python wrapper for a voice activity detection library. It works well as a starting point when working on speech recognition problems. It helped me discover and understand a lot of concepts related to audio signals and data when I was in your shoes.