Thanks for your reply.

>>> I think it would help if you link "NeSy computation engine". I'm actually not familiar with this (not in the symbolic world, but interested. Just never had time, so if you got links here I'd personally appreciate it). I can find the workshop but not the engine.

I linked to it in my previous comment. I'm referring to the ``NeSy computation engine'' described here. I didn't know there was a ``NeSy'' workshop and this paper was my first encounter with the term.

I think it's interesting that you mention the symbolic world as if it were separate from some other world. There's the AI that was and the AI that is today. There's the AI over there and the AI over here. Whenever you hear someone mention "symbolic" in the context of AI, go ahead and grab a chair, because immediately afterward they are going to talk about Cyc and John McCarthy for at least 20 minutes. If you're lucky they might throw some Prolog in there.

I don't think this is productive and I don't think there is another symbolic world. There is just the world. There are certain things in the world for which a numeric, directional representation makes sense. There are other things for which it makes no sense at all. It's my view that primitives in language are one of these things. Additionally, there are certain places where it makes sense to consider these representational approaches and other places where it only makes political sense. Lastly, there are symbols - atomic primitives - and there are ``symbols,'' objects with vectors in them and who knows what else.

What's striking to me about this paper is the coverage of formal grammars and semantic parsing entirely within the context of domain adaptation. Definitely the best part is the coverage of compositionality (https://ncatlab.org/nlab/show/compositionality) in the context of composing computational graphs. This is striking to me because all of these things (except domain adaptation) are essential to any reasonable theory of meaning but they are covered as if they've been repurposed for the practical application of populating Google Knowledge Panels, which I believe is exactly what happened. Check out the definitions of semantic parser and symbol.

>> Domain adaptation across verticals

>>> This is also a bit vague and so I'm not sure what you _specifically_ mean.

Crude semantic attributes pulled from character sequences and mapped onto the latent space of images have utility in business contexts if the mapping for some term sufficiently distinguishes it from the mapping of another term that has the same surface form. It ends there. GloVe was a half-baked representation of meaning in language when it was adapted from word2vec in 2014. GPT-2 grabbed the torch in 2019. It still doesn't work. Well, it sometimes works for adapting a general model to a specific domain such as a business vertical, but only in a crude and superficial way. Note that almost no ML research today discusses this representational issue at all, and that almost all ML research takes this representation as a starting point. If you decide to publish hyperparameters in your paper, such as in an appendix, hyperparameters related to vocab size and the dimensionality of your embedding space often aren't even worth mentioning. That's fine, I guess, because they don't mean anything anyway, but not talking about this, in my view, is not fine.

Check out the Mamba paper for example. Like most of ML research today the focus is on optimization. The representation problem has been solved so there's no need to talk about it: we map everything onto the latent space of images because short-form video content rules the day and that's how dude is gonna hit his 7T: advertising ([link redacted]).

>>> I think the problem is that math is taught by a game of telephone.

I think that, for language, the ML research community is, by and large, not even using the right maths.

>>> But that said, I still think vectors can do a lot. Especially since vectors and functions are interchangeable representations. Though I think we need to do a lot more to ensure that networks are capable of learning things like equivariance and importantly abstract concepts.

Thank you so much for highlighting the importance of equivariance. I think this is a crucial concept for work at the cross-modal interfaces, especially in the context of the Curry-Howard correspondence, or, more recently, the Curry-Howard-Lambek correspondence. Right now the ML (CV) research community is labeling nouns with bounding boxes... lol. If that doesn't illustrate the fact that multimodal work is a vision-first enterprise I don't know what will.

>>> I think a bit too exaggerated but hey, I've been known to say that ML research is captured by industry and we're railroading everything. And that it is silly we publish papers on GPT when we don't have the models in hand as it just becomes free work for OpenAI and we can't verify the works because OAI will change things.

Check out the evaluation criteria in that ``NeSy'' paper, especially the metric that's supposed to tell you something about what the system was designed to do. I'm sure OpenAI is happy to have this info about their system.

>>> But I also don't know what you mean by "IR".

Ten years ago I considered NLP adjacent to information retrieval. Today I consider it part of information retrieval. There's very little work published today that suggests otherwise.

>>> Honestly I don't know what you mean by this. But if you are saying that the divide we create like NLP vs CV is dumb, then I'm all with you.

It is not my intention at all to create or highlight any divide. If there is indeed a known divide between CV and NLP I don't know anything about it, I don't want to know anything about it and it's not surprising.

>>> I also think it's silly how we call generative models. Aren't all models generative?

Generative refers to a situation where you begin with a finite set of things and productively form any number of well-formed expressions from these things.
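
To make that concrete, here is a rough sketch of what I mean, not anything taken from the paper. The toy grammar, the `RULES` table and the `expand`/`sentences` helpers are all made up for illustration. The point is only that a finite rule set with recursion (an NP can contain a VP, which can contain an NP) yields more well-formed expressions every time you allow a deeper derivation.

```python
import itertools

# Hypothetical toy grammar (an assumption for illustration only):
# a finite set of rules and words, with recursion through NP and VP.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbol, depth):
    """Yield each word list derivable from `symbol`.

    Every rewrite of a nonterminal uses up one unit of `depth`;
    words that are not keys of RULES are terminals and cost nothing.
    """
    if symbol not in RULES:
        yield [symbol]
        return
    if depth == 0:
        # Out of budget: this nonterminal cannot be rewritten further.
        return
    for rhs in RULES[symbol]:
        # Expand every symbol on the right-hand side, then combine
        # one choice per position.
        parts = [list(expand(s, depth - 1)) for s in rhs]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]


def sentences(depth):
    """Return all derivations of S within the depth budget, as strings."""
    return [" ".join(words) for words in expand("S", depth)]


for d in range(3, 6):
    print(d, len(sentences(d)))
```

With a budget of 3 you get the four flat sentences ("the dog sees", "the cat sleeps", and so on). At 4 there are 36 derivations and at 5 there are 196, and nothing in the rule set ever stops that growth. That open-endedness from a closed inventory is the sense of generative I have in mind.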

>>> That includes me, and even I have a hard time parsing what you're saying and it doesn't help with the side snipes like URB-E scooters.

I'll take potshots at the Paul Grahams and Steve Jobs of the world every day and not lose any sleep over it. If they take their AirPods out of their ears maybe they'll hear me coming.

>>> But if I'm right and you need more than scale, then we better keep working because I'd rather not have another AI winter.

All I have to say about scaling is that, for language, I hope it's clear by now that more data and more params are not going to improve the situation. I can see how this is almost never the case for vision.

Damn it, somebody said AI winter again. You aren't going to start talking about Cyc and McCarthy for 20 minutes now, are you?

>>> I also think it is quite odd for these companies to not be hedging their bets a little and more strongly funding other avenues.

The formula works.

>>> At this point, all I'm trying to get people around me in ML to understand is how nuance matters. That alone is a difficult battle. I'm just told to throw compute at a problem and data with no concern to the quality of that data. It does not matter how much proof I generate to show that a model is overfit, as long as the validation loss doesn't diverge, they don't believe me.

I'm interested in learning more about what you mean by nuance.

Probably just needs more compute and data. Just throw some synthetic data in there and call it.