	by sdenton4 815 days ago
	In the area I work in (bioacoustics), embeddings from supervised learning still consistently beat self-supervised transformer embeddings. The transformers win on held-out data from the training distribution (in-domain) but greatly underperform on novel data (generalization). I suspect this is because our supervised training task is much more complex than average (10k classes, multilabel), which yields much better supervised embeddings, and because our generalization needs are more demanding (new species, new microphones, new geographic areas) than 'yet more humans on the internet.'
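The comment doesn't say which loss the supervised model uses. A common choice for multilabel classification (several species can call in the same clip) is one independent sigmoid per class with binary cross-entropy, rather than a softmax that forces a single winner. Here is a minimal sketch assuming that loss. `predict_labels`, the 0.5 threshold, and the 5-class toy example are hypothetical stand-ins for a real ~10k-class head.

```python
import math

# Assumption: a multilabel head scored with one independent sigmoid per class
# and binary cross-entropy. The original comment does not name its loss.

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_bce(logits: list[float], targets: list[int]) -> float:
    """Mean binary cross-entropy over classes, computed from raw logits.

    Each class is an independent present/absent decision, so several species
    can be 'on' in the same clip (unlike a softmax, which forces one winner).
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def predict_labels(logits: list[float], threshold: float = 0.5) -> list[int]:
    """Hypothetical helper: threshold each class probability independently."""
    return [int(sigmoid(x) >= threshold) for x in logits]


# Toy example with 5 classes standing in for ~10k: two species in one clip.
logits = [3.0, -2.0, 1.5, -4.0, -1.0]
targets = [1, 0, 1, 0, 0]
print(predict_labels(logits))  # [1, 0, 1, 0, 0]
print(round(multilabel_bce(logits, targets), 4))
```

Because each class gets its own yes/no decision, adding more classes doesn't make them compete for probability mass, which is one reason this setup suits large, overlapping label sets.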


PaulHoule 815 days ago

In text analysis, people usually get better results in many-shot scenarios (supervised training on labeled data) than with zero-shot (prompting only) or the various one-shot and few-shot approaches.


tkulim 815 days ago

Hey, that's a field I'm interested in (mostly inspired by a recent museum exhibition). Can you recommend recent papers on this topic, or labs/researchers to follow?


sdenton4 815 days ago

It's a really fun area to work in, but beware: it's very easy to underestimate the complexity. It's also very easy to do things that look helpful but actually aren't (e.g., improving classification on Xeno-canto but degrading performance on real soundscapes).
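One way to guard against that trap is to score each evaluation domain separately and treat a drop in any domain as a regression, even if another domain improves. A minimal sketch follows, assuming per-clip scores for a single species, 0/1 labels, and two hypothetical domains named "focal" (Xeno-canto-style recordings) and "soundscape". The domain names, tolerance, and numbers are illustrative, not results from the thread.

```python
# Assumption: scores are per-clip confidences for one species, labels are 0/1,
# and each evaluation domain is scored separately. Domain names and the
# tolerance are illustrative, not from the thread.

def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(P*N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def domain_deltas(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Per-domain metric change from the baseline model to the candidate model."""
    return {domain: candidate[domain] - baseline[domain] for domain in baseline}


def is_regression(deltas: dict[str, float], tolerance: float = 0.0) -> bool:
    """Flag a change if ANY domain gets worse, even if another domain improves."""
    return any(delta < -tolerance for delta in deltas.values())


# Illustrative numbers only: better on focal recordings, worse on soundscapes.
baseline = {"focal": 0.80, "soundscape": 0.70}
candidate = {"focal": 0.90, "soundscape": 0.65}
print(is_regression(domain_deltas(baseline, candidate)))  # True
```

A single pooled metric would hide exactly the failure described above, so the check reports per-domain deltas instead of one number.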

Here's some recent-ish work: https://www.nature.com/articles/s41598-023-49989-z

We also run a yearly Kaggle competition on birdsong recognition called BirdCLEF. We should be launching this year's edition this week, in fact!

Here's this year's competition (the link will be dead until it launches): https://www.kaggle.com/competitions/birdclef-2024

And last year's: https://www.kaggle.com/competitions/birdclef-2023
