Hacker News
by imbusy111 3320 days ago
The idea is cool, but the execution is lacking. My expectation is that if you turn the slider fully to either side, you get a perfectly clear representation of that instrument. In reality, you get this dirty synthetic sound no matter what, so the result of blending them is always "dirty" and they all sound similar in the end.

I understand that it is hard to encode the sound into these parameters and get near-perfect decoding, but maybe that's the next step?

2 comments

I think they just compressed too much due to computational constraints. They say so themselves. However, these autoencoder methods always raise the question of rate control, and also of the error function. The original paper doesn't seem to use a very good perceptual error function.
I had the same feeling. For instance, the sitar sound is not comparable to a sampled one. I think this idea makes sense for real synth sounds and not for imitations of already existing instruments.