Finishing up my PhD thesis on low-resource audio classification for ecoacoustics. Our partners deployed 98 recorders in remote Arctic/sub-Arctic regions, collecting a massive dataset (~19.5 years of audio) to monitor wildlife and human noise. Labeled data is the bottleneck, so my work focuses on getting good results with less data.

Key parts:

- Created EDANSA [1], the first public dataset of its kind from these areas, using an improved active learning method (ensemble disagreement) to efficiently find rare sounds.

- Explored other low-resource ML: transfer learning, data valuation (using Shapley values), cross-modal learning (using satellite weather data to train audio models), and testing the reasoning abilities of MLLMs on audio (spoiler: they struggle!).

Happy to discuss any part!

[1] https://scholar.google.com/citations?user=AH-sLEkAAAAJ&hl=en
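
For readers unfamiliar with ensemble disagreement, here is a minimal sketch of the general idea, assuming a simple vote-entropy score (classic query-by-committee). It is not necessarily the method used for EDANSA; the function names, labels, and votes below are all hypothetical and only illustrate how clips the ensemble disagrees on could be prioritized for labeling.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in nats) of the labels voted by ensemble members for one clip."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_for_labeling(ensemble_votes, budget):
    """Return indices of the `budget` clips with the highest disagreement."""
    scores = [vote_entropy(v) for v in ensemble_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical votes from a 4-model ensemble on 4 unlabeled clips.
ensemble_votes = [
    ["bird", "bird", "bird", "bird"],          # full agreement
    ["bird", "aircraft", "bird", "wind"],      # three-way split
    ["wind", "wind", "wind", "bird"],          # mild disagreement
    ["bird", "aircraft", "bird", "aircraft"],  # even two-way split
]

# Clips with the most disagreement are sent to annotators first.
picked = select_for_labeling(ensemble_votes, budget=2)
print(picked)
```

The intuition is that clips where the models agree are probably easy or common, while high-disagreement clips are more likely to contain rare or ambiguous sounds, so labeling effort goes there first.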
Just wondering whether the raw data you mentioned is publicly available, so we can test our techniques on it, or whether it's only available through research collaborations. Either way, we're very much interested in the potential use of our techniques for polar research in the Arctic and/or Antarctica.