by godelski 821 days ago

> too many ideas that work well, even optimally, at small scale fail horribly at large scale.

Not that I disagree, but I don't think that's a reason not to publish. There's another way to phrase what you've said:

  many ideas that work well at small scales do not trivially work at large scales

But this is true for many works, even transformers. You don't just scale by turning up model parameters and data. You can, but generally more things are going on. So why hold these works back because of that? There may be nuggets in there that may be of value and people may learn how to scale them. Just because they don't scale (now or ever) doesn't mean they aren't of value (and let's be honest, if they don't scale, this is a real killer for the "scale is all you need" people)

> Other ideas that work at super specialized settings don’t transfer or don’t generalize.

It is also hard to tell if these are hyper-parameter settings. Not that I disagree with you, but it is hard to tell.

> Correlations in huge multimodal datasets are way more complicated than most humans can grasp and we will not get to AGI before we can have a large enough group of people dealing with such data routinely.

I'm not sure I understand your argument here. The people I know who work at scale often have the worst understanding of large data: not understanding how density differs between a normal distribution and a uniform one, thinking that LERPing in a normal yields representative data, or misunderstanding cosine similarity and orthogonality. IME people that work at scale benefit from being able to throw compute at problems.
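A minimal sketch of the LERP point, assuming standard-normal samples in 1000 dimensions. The function names are hypothetical and only the Python standard library is used. Two independent Gaussian samples each have norm close to sqrt(d), but their midpoint has norm close to sqrt(d/2), so it sits in a region where the distribution puts almost no mass. The same two samples are also nearly orthogonal.

```python
import math
import random


def gaussian_vector(dim, rng):
    """Draw one sample from a standard normal in `dim` dimensions."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def lerp(a, b, t):
    """Linear interpolation between vectors a and b at fraction t."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def lerp_demo(dim=1000, seed=0):
    """Compare the norm of a LERP midpoint with the norm of real samples."""
    rng = random.Random(seed)
    a = gaussian_vector(dim, rng)
    b = gaussian_vector(dim, rng)
    mid = lerp(a, b, 0.5)
    return {
        "endpoint_norm": norm(a),          # close to sqrt(dim)
        "midpoint_norm": norm(mid),        # close to sqrt(dim / 2)
        "typical_norm": math.sqrt(dim),
        "cosine": cosine_similarity(a, b), # close to 0 in high dim
    }


if __name__ == "__main__":
    for key, value in lerp_demo().items():
        print(f"{key}: {value:.3f}")
```

This is only an illustration under the Gaussian assumption; spherical interpolation is one commonly used alternative, but whether it gives "representative" data in a real latent space is a separate question.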

> we don’t do anybody a favor by increasing the entropy of the publications in the huge ML conferences

You and I have very different ideas as to what constitutes information gain. I would say a majority of people studying two models (LLMs and diffusion) results in lower gain, not more.

And as I've said above, I don't care about novelty. It's a meaningless term. (and I wish to god people would read the fucking conference reviewer guidelines as they constantly violate them when discussing novelty)

pama 821 days ago

I think information gain will be easy to measure in principle with an AI in the near future: if the work is correct, how unexpected is it? Anything trivially predictable based on published literature, including exact reproduction disguised as novel, is not worthy of too much attention. Anything that has a chance of changing the model of the world is important. It can seem minor, even trivial, to some nasty reviewer, but if the effect is real and not demonstrated before then it deserves attention. Until then, we deal with imperfect humans.

Regarding large multimodal data, I don’t know what people you refer to, so I can’t comment further. The current math is useful but very limited when it comes to understanding the densities in such data; random vectors are almost always nearly orthogonal at high dim and densities are always sampled very poorly. The type of understanding of data that would help progress in drug and material design, say, is very different from the type that can help a chatbot code. Obviously the future AI should understand it all, but it may take interdisciplinary collaborations that best start at an early age and unfortunately don’t fit the current academic system very well.

godelski 820 days ago

> will be easy to measure in principle with an AI in the near future

I'd like to push back on this quite a bit. We don't have AI that shows decent reasoning capabilities. You can hope that this will be resolved, but I'd wager that this will just become more convoluted. A thing that acts like a human, even at an indistinguishable level, need not also be human nor have the same capabilities as a human[0]. This question WILL get harder to answer in the future, I'm certain of that, but we do need to be careful.

Getting to the main point, metrics are fucking hard. The curse of dimensionality isn't just that there are lots of numbers, it is that your nearest neighbor becomes ambiguous. It is that the difference between the furthest point (neighbor) and the closest point (nearest neighbor) decreases. It is that orthogonality becomes a more vague concept. It is that means may not be representative of a distribution. This is stuff that is incredibly complex and convolutes the nature of these measurements. For AI to be better than us, it would have to actually reason, because right now we __decide__ not to reason and instead __decide__ to take the easy way out and act as if metrics are the same as they are in 2D (ignoring all advice from the mathematicians...).
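A rough sketch of the nearest-versus-furthest point, assuming points drawn uniformly from the unit hypercube and Euclidean distance. The function name `relative_contrast` and the sample sizes are illustrative choices, not anything from a specific library. It computes (furthest - nearest) / nearest for one query point; as the dimension grows this ratio shrinks toward zero, which is what makes "nearest neighbor" ambiguous.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Points and the query are sampled uniformly from [0, 1]^dim.
    A small value means the nearest and furthest points are
    almost equally far away from the query.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

The exact numbers depend on the seed, the sampling distribution, and the metric, so treat the trend across dimensions as the takeaway rather than any single value.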

It is not necessarily about the type of data when the issue we're facing is at an abstraction of any type of data. Categorically they share a lot of features. The current mindset in ML is "you don't need math" when the current wall we face is highly dependent on understanding these complex mathematics.

I think it is incredibly naive to just rely on AI solving our problems. How do we make AI solve problems when we __won't__ even address the basic nature of the problems themselves?

[0] As an example, think about an animatronic duck. It could be very lifelike and probably even fool a duck. In fact, we've seen pretty low quality ones fool animals, including just ones that are static and don't make sounds. Now imagine one that can fly and quack. But is it a duck? Can we do this without the robot being sentient? Certainly! Will it also fool humans? Almost surely! (No, I'm not suggesting birds aren't real. Just to clarify)

pama 820 days ago

An AI that can help referee papers to advance human knowledge doesn’t need to have lots of human qualities. I think it suffices if a) it has the ability to judge correctness precisely, and b) it expresses a degree of surprise (low log likelihood?) if the correct data does not fit its current worldview.
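As a toy illustration of "surprise as low log likelihood", here is a sketch that assumes the worldview is a single Gaussian over one scalar result. That assumption, and the function name `gaussian_surprisal`, are purely hypothetical; a real model of the literature would be nothing like this simple.

```python
import math


def gaussian_surprisal(x, mean, std):
    """Negative log-likelihood (in nats) of x under N(mean, std^2).

    Higher values mean the observation is more surprising
    relative to the assumed Gaussian worldview.
    """
    z = (x - mean) / std
    return 0.5 * z * z + math.log(std * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # A result at the expected value versus one three standard deviations away.
    print(gaussian_surprisal(0.0, mean=0.0, std=1.0))
    print(gaussian_surprisal(3.0, mean=0.0, std=1.0))
```

Note that this only scores surprise relative to the assumed model; it says nothing about whether the reported result is correct.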

godelski 820 days ago

> it has the ability to judge correctness precisely,

That's not possible from a paper.

> it expresses a degree of surprise (low log likelihood?)

I think you're interpreting statistical terms too literally.

The truth of the matter is that we rely on a lot of trust from both reviewers and authors. This isn't a mechanical process. You can't just take metrics at face value[0]. The difficulty of peer review is the thing that AI systems are __the worst__ at, and that we have absolutely no idea how to resolve. It is about nuance. Anything short of nuance and we get metric hacking. And boy, if you wanna see academic work degrade, then make the referee an automated system. No matter how complex that system is, I guarantee you human ingenuity will win and you'll just have metric hacking. We already see this in human-led systems (like "peer review", and anyone that's ever had a job has experienced this).

I for one don't want to see science led by metric hacking.

Processes will always be noisy, and I'm not suggesting we can get a perfect system. But if we're unwilling to recognize the limitations of our systems and the governing dynamics of the tools that we build, then we're doomed to metric hack. It's a tale as old as time (literally). Now, if we create a sentient intelligence, well, that's a completely different ball game, but that's not what you were arguing either.

  You need to stop focusing on "making things work" and start making sure they actually work. No measurement is perfectly aligned with one's goals. Anyone in ML that isn't intimately familiar with Goodhart's Law is simply an architect of Goodhart's Hell.

Especially if we are to discuss AGI, because there is no perfect way to measure and there never will be. It is a limitation in physics and mathematics. The story of the Jinni is about precisely this, but we've formalized it.

[0] This is the whole problem with SOTA. Some metrics no longer actually mean anything useful. As an example, look at FID, the main metric for goodness of image generation. Its assumptions are poor (the norms aren't very normal and it's based on ImageNet-1k training, which is extremely biased; and no, these aren't solved by just switching to CLIP-FID). There have been many papers written on this, and similarly for any given metric.
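For reference, the usual definition of FID makes the normality assumption explicit. It fits one Gaussian to the Inception features of real images and one to those of generated images, then takes the Fréchet distance between the two. The subscripts r (real) and g (generated) are notation chosen here:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Only the feature means and covariances enter the score, so anything about the feature distributions that a Gaussian fit misses is invisible to the metric.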
