Hacker News new | ask | show | jobs
by rsweeney21 871 days ago
It's still strange to me to work in a field of computer science where we say things like "we're not exactly sure how these numbers (hyper parameters) affect the result, so just try a bunch of different values and see which one works best."
14 comments

> "we're not exactly sure how these numbers (hyper parameters) affect the result, so just try a bunch of different values and see which one works best."

Isn't it the same for anything that uses a Monte Carlo simulation to find a value? At times you'll end up on a local maxima (instead of the best/correct) answer, but it works.

We cannot solve something used a closed formula so we just do a billion (or whatever) random samplings and find what we're after.

I'm not saying it's the same for LLMs but "trying a bunch of different values and see which one works best" is something we do a lot.

I feel like it's the difference between something that has been engineered and something that has been discovered.

I feel like most of our industry up until now has been engineered.

LLMs were discovered.

LLMs were very much engineered... the exact results they yield are hard to determine since they're large statistical models, but I don't think that categorizes the LLMs themselves as a 'discovery' (like say Penicilin)
There’s an argument that all maths are discovered instead of invented or engineered. LLM hardware certainly is hard engineering but the numbers you put in it aren’t, once you have them; if you stumbled upon them by chance or they were revealed to you in your sleep it’d work just as well. (‘ollama run mixtral’ is good enough for a dream to me!)
If the Black Swan model of science is true, then most of the consequential innovations and advances are discovered rather than engineered.
I understand your distinction, I think, but I would say it is more engineering than ever. It's like the early days of the steam engine or firearms development. It's not a hard science, not formal analysis, it's engineering: tinkering, testing, experimenting, iterating.
> tinkering, testing, experimenting, iterating

But that describes science. http://imgur.com/1h3K2TT/

AI requires a lot of engineering. However, the engineering is not what makes working in AI interesting. It's the plumbing, basically.
I believe, from what I saw in Mathematics, this is a matter of taste. Discovered or invented are 2 perspectives. Some people prefer to think that light is reaching in previously dark corners of knowledge waiting to be discovered(discover). Others prefer to think that by force of genius they brought the thing into the world.

To me, personally, these are 2 sides of the coin, without one having more proof than the other.

and finally, this justifies the "science" in Computer Science.
That bottom-up tinkering is kinda how CS started in the US, as observed by Dijkstra himself: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...

Ideally we want theoretical foundations, but sometimes random explorations are necessary to tease out enough data to construct or validate theory.

This can be laid at the feet of Minsky and others who dismissed perceptrons because they couldn't model nonlinear functions. LLMs were never going to happen until modern CPUs and GPUs came along, but that doesn't mean we couldn't have a better theoretical foundation in place. We are years behind where we should be.

When I worked in the games industry in the 1990s, it was "common knowledge" that neural nets were a dead end at best and a con job at worst. Really a shame to lose so much time because a few senior authority figures warned everyone off. We need to make sure that doesn't happen this time.

What is the point you're trying to make?
What is the point you're trying to make?

Answering the GP's point regarding why deep learning textbooks, articles, and blog posts are full of sentences that begin with "We think..." and "We're not sure, but..." and "It appears that..."

What's yours?

This is what researching different Stable Diffusion settings is like. You quickly learn that there's a lot of guessing going on.
we have no theories of intelligence. We're like people in the 1500s trying to figure out why and how people get sick, with no concept of bacteria, germs, transmission, etc
I haven't seen this key/buzzword mentioned yet, so I think part of it is the fact that we're now working on complex systems. This was already true (a social network is a complex system), but now we have the impenetrability of a complex system within the scope of a single process. It's hard to figure out generalizable principles about this kind of thing!
Divine benevolence
I mean, it’s kind of in the name isn’t it? Computer science. Science is empirical, often poorly understood and even the best theories don’t fully explain all observations, especially when a field gets new tools to observe phenomena. It takes a while for a good theory to come along and make sense of everything in science and that seems like more or less exactly where we are today.
Not strange at all. This is largely how biology operates. These things are simpler than bio and more complex than programs
AI is more like gardening than engineering. You try things without knowing the outcome. And you wait a very long time to see the outcome.
Welcome to engineering. We don't sketch our controlled systems and forget all about systems theory. Instead we just fiddle with out controllers until the result is acceptable.
It's how God programs
it's a new paradigm