| Thanks for your comment. Please let me know what's not clear. Still no takers on my ML is CV comment below. Research on on-device ASR and computer vision is primarily driven by the same organizations that stand to benefit the most from it. It's nearly impossible to talk about machine learning today without talking about computer vision. Just look at the daily papers from Hugging Face or any other outlet. Machine learning is basically synonymous with computer vision, and natural language processing is basically synonymous with information retrieval. Research is corporate research, with very few exceptions. Two popular lines of current ML research, multimodal and this latest neuro-symbolic re-hash, are not about furthering our understanding of what we currently can and cannot do. They are not about doing science. They are about maintaining the status quo. They are about short-form video content and Google Knowledge Panels. Multimodal ML research is a vision-first enterprise. This doesn't make sense for a number of reasons. Here are two: the latent space of images and the representations therein cannot adequately capture the nuances and expressivity of language, and language is a more fundamental cognitive process than vision. And what do you get for language in current ML research? How about a "neuro-symbolic semantic parser" that is neither a semantic parser nor symbolic. lol https://arxiv.org/abs/2402.00854v1 What's it good for? Computation graphs, domain adaptation, Google Knowledge Panels. This is a directed attack on machine learning research that takes concepts that could be pursued with scientific merit, such as multimodal perception and neuro-symbolic parsing, and instead turns them into marketing hype leveraged by the powers that be for the things that keep them in power. My audience is anyone participating in this research. Y Combinator... that's the mob of onewheels and URB-E scooters dodging human waste in the Tenderloin on their commute from Nob Hill to the Mission, right? Maybe you are referring to another audience. |
I think it would help if you linked the "NeSy computation engine". I'm actually not familiar with it (I'm not in the symbolic world, but I'm interested; I just never had the time, so if you've got links here, I'd personally appreciate them). I can find the workshop but not the engine. Maybe my google-fu is bad.
>> Domain adaptation across verticals
This is also a bit vague and so I'm not sure what you _specifically_ mean.
> ... onto the latent space of images gets you crude semantic attributes sometimes... These LVMs aren't going to cut it.
I'm with you here, but it is controversial, and most people don't understand what a latent vector machine is or what the alternatives are. Remember, people think you can easily measure distances with L1, L2, or cosine, and that these metrics are well defined in R^{3x256x256} (or just fucking R^10). So they use t-SNE and UMAP to look at latent spaces for smooth semantic properties. I think the problem is that math is taught by a game of telephone. Really, all research is. Choices were made for historical reasons, but when an assumption isn't revisited after enough time, it stops being a well-known assumption. We could mention manifolds too, or even probability distributions or i.i.d. This reminds me that I should update my lecture slides to make these things clearer lol.
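One standard way to see why distances in very high-dimensional spaces deserve suspicion is distance concentration: for i.i.d. random points, the gap between the nearest and farthest neighbor shrinks relative to the distances themselves as the dimension grows, and cosine similarities between random vectors bunch up near zero. Below is a minimal NumPy sketch, assuming uniform and Gaussian random data. Real latent vectors are not i.i.d., so this only illustrates the effect and says nothing about any particular model; the function names are made up for illustration.

```python
import numpy as np


def relative_contrast(dim, n_points=200, p=2, seed=0):
    """(max - min) / min of Lp distances from one random query to n_points random points.

    Assumption for illustration: query and points are i.i.d. uniform on [0, 1]^dim.
    p=2 gives L2 (Euclidean) distance, p=1 gives L1 (Manhattan) distance.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, ord=p, axis=1)
    return (dists.max() - dists.min()) / dists.min()


def cosine_spread(dim, n_points=200, seed=0):
    """Standard deviation of cosine similarity between a random query and random points.

    Assumption for illustration: all vectors are i.i.d. standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, dim))
    query = rng.standard_normal(dim)
    cos = points @ query / (np.linalg.norm(points, axis=1) * np.linalg.norm(query))
    return cos.std()


if __name__ == "__main__":
    # Watch all three numbers shrink as the dimension grows.
    for dim in (2, 10, 100, 1_000, 10_000):
        print(
            f"dim={dim:>6}  "
            f"L1 contrast={relative_contrast(dim, p=1):.3f}  "
            f"L2 contrast={relative_contrast(dim, p=2):.3f}  "
            f"cosine std={cosine_spread(dim):.3f}"
        )
```

For random Gaussian vectors the cosine spread falls off roughly like 1/sqrt(dim), so in something the size of R^{3x256x256} "nearest" and "farthest" become hard to tell apart unless the data has strong low-dimensional structure, which is exactly the assumption people tend to forget they are making.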
But that said, I still think vectors can do a lot, especially since vectors and functions are interchangeable representations. Though I think we need to do a lot more to ensure that networks are capable of learning things like equivariance and, importantly, abstract concepts. I don't see how current systems could calculate something like an ideal of a ring. But maybe someone has some formulation.
I'm also with you on the complexity aspect. I find it silly when a Sr Director is trying to convince people Sora is learning physics while showing videos where a glass empties its contents, then spills, and neither shatters nor plastically deforms but liquefies (https://twitter.com/DrJimFan/status/1758549500585808071). I'm not sure these people understand what a physics model or a world model is, since there is no coherence. I mean, look, we're dealing with people who think the stacking example proves a world model but don't understand how failing on a simple counterexample disproves such notions. You're right that there isn't enough subtlety and care in understanding how information leakage happens and how a lot of prompting techniques or follow-ups give away the answer rather than tease one out.
> ML research isn't meant to further your understanding of anything. You can't separate it from corporate interest and land grabbing. It's the same thing re-hashed every year by the same people. NLP is pretty much a subfield of IR at this point.
I think that's a bit exaggerated, but hey, I've been known to say that ML research is captured by industry and that we're railroading everything. And that it is silly we publish papers on GPT when we don't have the models in hand, since it just becomes free work for OpenAI and we can't verify the work because OAI will change things. But I also don't know what you mean by "IR". I'm more on the CV side, though, and like above, ehh, there's a big difference.
> Still no takers on my ML is CV comment below.
Honestly, I don't know what you mean by this. But if you are saying that the divides we create, like NLP vs CV, are dumb, then I'm all with you. I also think it's silly what we call generative models. Aren't all models generative? Yann talking about JEPAs does not give me anything to go on. But then again, no one has a definition of a generative model, and it doesn't seem like anyone cares to make one. Well, at least not one that would be consistent and include GANs, VAEs, NFs, diffusion models, and EBMs.
> My audience is anyone participating in this research.
That includes me, and even I have a hard time parsing what you're saying, and side snipes like the URB-E scooters don't help. I have no idea what that even is. I definitely get the feeling of gaslighting and railroading. But I've just come to accept the fact that people drank the Kool-Aid. I think people like Jim are true believers and really do believe that they are right. So it doesn't help to talk like this. You gotta meet them at their level. The scaling people will lose out, and we're just gonna have to be patient. My take is: if I'm wrong, so what, give Sam his $7T, we get AGI, and we win. He's going to get his opportunity to scale no matter how much funding we can get into alternative views. But if I'm right and you need more than scale, then we'd better keep working, because I'd rather not have another AI winter. I also think it is quite odd for these companies not to be hedging their bets a little and funding other avenues more strongly, especially those that are not already the biggest of the biggest, because where are you gonna get 500 racks of H100s to compete?
At this point, all I'm trying to get people around me in ML to understand is that nuance matters. That alone is a difficult battle. I'm just told to throw compute and data at a problem, with no concern for the quality of that data. It does not matter how much proof I generate to show that a model is overfit; as long as the validation loss doesn't diverge, they don't believe me. ¯\_(ツ)_/¯