Here is a cool demo of voice-to-instrument or instrument-to-instrument conversion
(The downside is that each new kind of output sound requires training a model for about 1 hour to get good quality, but once it is trained you can apply it to different inputs quickly):