by minimaxir 847 days ago
It is possible to create audio/speech embeddings using a model like CLAP: https://huggingface.co/laion/larger_clap_music_and_speech

However, the resulting embeddings don't perform well for nearest-neighbor vector lookup.
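
For anyone who wants to check this on their own audio, here is a minimal sketch of the lookup step only. It assumes the clips have already been embedded (for example with the CLAP model linked above) and stored as plain lists of floats. The names `cosine_similarity`, `nearest_neighbors`, and the toy three-dimensional vectors are made up for illustration; real CLAP embeddings are much larger, and nothing here reproduces the quality issue itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=3):
    """Brute-force top-k lookup: return (name, score) pairs, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical stand-ins for precomputed audio embeddings.
index = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.9, 0.1, 0.0],
    "clip_c": [0.0, 1.0, 0.0],
}

results = nearest_neighbors([1.0, 0.0, 0.0], index, k=2)
for name, score in results:
    print(name, round(score, 4))
```

Brute force is fine for a few thousand clips and makes it easy to eyeball whether the top hits actually sound related before reaching for an approximate index.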