by godelski
845 days ago
I think you need to understand your audience better. Remember that most people have a very shallow understanding of DL systems. There are blog posts that talk about attention from a mathematical perspective but don't even hint at softmax tempering; they just mention that there's a dot product. And there are ML researchers who don't know why doubling the batch size doesn't cut training time in half, or why using fp16 doesn't cut memory in half. So I think toning down the language, being clearer, and adding links will help you communicate more successfully.
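For readers unfamiliar with the term, here is a minimal sketch of the "softmax tempering" those blog posts skip. It assumes the term refers to the 1/sqrt(d_k) scaling in scaled dot-product attention, which acts as a softmax temperature. The helper names `softmax` and `attention_weights` are hypothetical, and it uses plain Python rather than any particular DL framework.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax with a temperature: larger temperatures flatten the distribution."""
    scaled = [s / temperature for s in scores]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Attention weights for one query over a list of keys.

    The raw dot products grow with the vector dimension d_k, so they are
    divided by sqrt(d_k) (i.e. used as the softmax temperature) to keep the
    softmax from saturating onto a single key.
    """
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores, temperature=math.sqrt(d_k))


if __name__ == "__main__":
    query = [1.0] * 16
    keys = [[1.0] * 16, [0.5] * 16, [0.0] * 16]
    raw_scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    print("unscaled:", softmax(raw_scores))
    print("scaled:  ", attention_weights(query, keys))
```

With 16-dimensional vectors the raw dot products here are 16, 8, and 0, so the unscaled softmax puts nearly all the weight on the first key, while dividing by sqrt(16) = 4 leaves noticeably more weight on the others. "Just a dot product" leaves out exactly this step.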
Please let me know what's not clear. Still no takers on my "ML is CV" comment below.
Research on on-device ASR and computer vision is primarily driven by the same organizations that stand to benefit the most from it. It's nearly impossible to talk about machine learning today without talking about computer vision. Just look at the daily papers from Hugging Face or any other outlet. Machine learning is basically synonymous with computer vision, and natural language processing is basically synonymous with information retrieval. Research is corporate research, with very few exceptions.
Two popular lines of current ML research, multimodal learning and this latest neuro-symbolic rehash, are not about furthering our understanding of what we currently can and cannot do. They are not about doing science. They are about maintaining the status quo. They are about short-form video content and Google Knowledge Panels.
Multimodal ML research is a vision-first enterprise. This doesn't make sense for a number of reasons. Here are two: first, the latent space of images and the representations therein cannot adequately capture the nuances and expressivity of language; second, language is a more fundamental cognitive process than vision.
And what do you get for language in current ML research? How about a "neuro-symbolic semantic parser" that is neither a semantic parser nor symbolic. lol https://arxiv.org/abs/2402.00854v1 What's it good for? Computation graphs, domain adaptation, Google Knowledge Panels.
This is a directed attack on machine learning research that takes concepts that could be pursued with scientific merit, such as multimodal perception and neuro-symbolic parsing, and instead turns them into marketing hype, leveraged by the powers that be for the things that keep them in power. My audience is anyone participating in this research.
Y Combinator... that's the mob of Onewheels and URB-E scooters dodging human waste in the Tenderloin on their commute from Nob Hill to the Mission, right? Maybe you are referring to another audience.