Hacker News
by minimaxir 847 days ago
It is possible to create audio/speech embeddings using a model like CLAP:
https://huggingface.co/laion/larger_clap_music_and_speech
However, the results aren't good for nearest-neighbor vector lookup.
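For anyone unfamiliar with what "nearest-neighbor vector lookup" means here, a minimal sketch in plain Python follows. It assumes the audio embeddings have already been computed (for example with the CLAP model linked above) and ranks stored clips by cosine similarity to a query. The file names, the 3-dimensional vectors, and the helper names `cosine_similarity` and `nearest_neighbors` are all hypothetical placeholders for illustration, not real model output and not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=1):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical placeholder embeddings; real ones would be much longer
# and would come from an audio embedding model.
index = {
    "clip_a.wav": [1.0, 0.0, 0.0],
    "clip_b.wav": [0.0, 1.0, 0.0],
    "clip_c.wav": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

for name, score in nearest_neighbors(query, index, k=2):
    print(f"{name}: {score:.3f}")
```

This brute-force scan is fine for small collections; whether the ranking is useful depends entirely on the quality of the embeddings, which is the caveat raised above.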