| I went through the code a little while ago when I first saw this on HN. My thoughts: It really doesn't scale. It is designed to run on a single physical device (an RPi). In its current guise it is really difficult to separate out the components, particularly the TTS part, which is tied to outputting to a speaker. There is also no way to process different parts on different machines, wrap up the answer and post it back to a client device such as a phone. Command detection is really just regular expressions, which is simple enough for a lot of people to work with but obviously inflexible. The priority setting is useful as long as you have everything in the right order (imagine ordering 20 or more modules).
What is nice is that the processing component could very easily be replaced with one that does some NLP, and fortunately Python (which it is written in) has some excellent libraries for that (NLTK). This means you could more naturally detect 'weather <city>' or 'forecast <country>' with some entity matching. The other nice bit is that they have done a considerable amount of work abstracting the STT and TTS libraries. By itself this would be pretty useful. Inline documentation is excellent. |
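To make the regex-plus-priority point concrete, here is a rough sketch of that style of command detection. It is not Jasper's actual code: `Module`, `dispatch`, `MODULES`, and the patterns are all hypothetical names made up for illustration. The named group `place` stands in for the 'weather <city>' idea, using a plain regex capture where a real NLP library would do proper entity matching.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Module:
    """Hypothetical command module: a pattern, a priority, and a handler."""
    name: str
    priority: int
    pattern: "re.Pattern[str]"
    handle: Callable[["re.Match[str]"], str]


def dispatch(text: str, modules: List[Module]) -> Optional[str]:
    """Try modules from highest to lowest priority; the first match wins."""
    for module in sorted(modules, key=lambda m: m.priority, reverse=True):
        match = module.pattern.search(text)
        if match:
            return module.handle(match)
    return None  # no module matched at all


# Assumed example modules; the order in this list does not matter,
# only the priority values do.
MODULES = [
    Module(
        name="fallback",
        priority=-1,
        pattern=re.compile(r".*", re.DOTALL),
        handle=lambda m: "unrecognized command",
    ),
    Module(
        name="time",
        priority=5,
        pattern=re.compile(r"\btime\b", re.IGNORECASE),
        handle=lambda m: "time lookup",
    ),
    Module(
        name="weather",
        priority=10,
        # Captures whatever follows 'weather' or 'forecast' as a place name.
        pattern=re.compile(
            r"\b(?:weather|forecast)\s+(?:in\s+|for\s+)?(?P<place>[a-z][a-z ]*)",
            re.IGNORECASE,
        ),
        handle=lambda m: "weather lookup for " + m.group("place").strip().title(),
    ),
]

if __name__ == "__main__":
    for phrase in ("weather london", "what time is it", "hello"):
        print(phrase, "->", dispatch(phrase, MODULES))
```

The inflexibility shows up quickly: the `place` capture accepts any trailing words, so it cannot tell a city from a country or from noise, and every new module means rechecking its priority against all the others.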
When we first designed Jasper, it was just for us to hack around with, so regex matching, the priority system, the single-instance configuration, etc. all made a lot of sense for our use case (and the use cases that we foresaw w/r/t casual hackers). Our goal was just to make things simple and accessible (hence our focus on documentation). Since our initial release, Jan Holthuis has taken over much of the development, and he's put a big emphasis on abstracting out the STT and TTS libraries (as you mentioned) and improving the design more generally. My hope is that Jasper will continue to grow and mature, and that the suggestions and possibilities you mention become realities.