Hacker News new | ask | show | jobs
by muaytimbo 672 days ago
it says the track embedding vectors are inputs, the music representations are probably learned in an earlier model, w2v or a two tower model.