by charcircuit 1036 days ago
It's not beyond human understanding. Unless you mean that one must know everything from every research paper released. At its core you are just finding a well performing model using gradient descent. Gradient descent is not beyond human understanding.

Gradient descent in isolation is obviously not what they are alluding to. What the models are doing inside the box and what any of those millions or billions of weights mean or do is beyond human understanding.
I don't think it is, as somebody who's spent maybe 100 combined hours reading AI papers mostly focused around NLP and image classification.

You have a dataset, symbolically represented in 1s and 0s. You have an objective function (e.g. classify the object as belonging to one of N categories).

The purpose of the collective neurons in the network is to "encode" the input space in a way that satisfies the objective function. In the same way that we "encode" higher-level concepts into shorthand representations.

Gradient descent is the optimization algorithm we use to develop this encoding.
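To make the dataset / objective / gradient descent recipe concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: a made-up one-dimensional dataset, a two-parameter logistic model, and a hand-picked learning rate. It is nowhere near a real network, just the same loop at toy scale.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, data):
    # Objective function: mean binary cross-entropy over the dataset.
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def gradient_step(w, b, data, lr):
    # Gradient of the mean cross-entropy with respect to w and b,
    # followed by one step against the gradient.
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    n = len(data)
    return w - lr * gw / n, b - lr * gb / n

# Hypothetical toy dataset: negative inputs are class 0, positive are class 1.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
history = [loss(w, b, data)]
for _ in range(200):
    w, b = gradient_step(w, b, data, lr=0.5)
    history.append(loss(w, b, data))

print(f"loss went from {history[0]:.3f} to {history[-1]:.3f}, w={w:.2f}, b={b:.2f}")
```

Each step is easy to follow here because there are only two parameters. Whether that stays true with billions of them is exactly what the rest of the thread argues about.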

Beyond this, there are all kinds of tricks people have developed (interesting activation functions for neurons, grouping + segregating neurons, introducing a dimension of recurrence/time, dataset pre-processing, using bigger datasets, having another model generate data that's deliberately challenging for the first model) to try to converge to a more robust/accurate encoding, or to try to converge to a decent encoding at a faster rate.

There is no magic here at the lowest level – you can interrogate the math at each step and it'll make sense.

The "magic" is that we have zero epistemology to explain why tricks work, other than "look, ma test results". We know certain techniques work, and we have post-hoc intuitive explanations, but we're mostly fumbling our way "forwards" via trial and error.

This is "science" in the 17th century definition of the term, where we're mixing chemicals together and seeing what happens. Maybe we'll have a good theoretical explanation for our experimental results 100 years from now, if we're still around.

Nobody said anything about Magic.

>There is no magic here at the lowest level – you can interrogate the math at each step and it'll make sense.

See that's the thing. You can't unless "making sense" has lost all meaning.

That you can see a bunch of signals firing or matrices being multiplied does not mean they "make sense" or are meaningful to you. Low-level gibberish is still gibberish.

Our ability to divine the purpose of activations of anything but the extremely small scale is atrocious.

>Our ability to divine the purpose of activations of anything but the extremely small scale is atrocious.

The value of each parameter is chosen to minimize the loss. This applies to every single weight of the model. Not all weights affect loss the same amount, which is why concepts like pruning exist.

>The value of each parameter is chosen to minimize the loss

Vague and fairly useless. What is it doing to minimize loss?

>Not all weights affect loss the same amount, which is why concepts like pruning exist.

Only weights with values close to or at zero get pruned. It's not because we know what each weight does and can tell what would work otherwise.

>Vague and fairly useless.

When creating a model your goal is to find one with minimal loss. Being able to figure out how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

>What is it doing to minimize loss?

The value helps us get to a location in the parameter space with lower loss.

>Only weights with values close to or at zero get pruned.

Weights near 0 don't change the results of the calculations they are used in very much, which is why they don't affect loss very much.
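A rough sketch of that point, assuming a tiny linear model with made-up weights and inputs (none of this comes from a real network, and `prune` is a hypothetical helper, not a library call). Zeroing the near-zero weights barely moves the loss, while zeroing a large one moves it a lot:

```python
def predict(weights, x):
    # Linear model: dot product of weights and inputs.
    return sum(w * xi for w, xi in zip(weights, x))

def mse(weights, data):
    # Mean squared error over (input, target) pairs.
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def prune(weights, threshold):
    # Magnitude pruning: zero every weight whose absolute value is below threshold.
    return [0.0 if abs(w) < threshold else w for w in weights]

# Hypothetical weights: two large, two close to zero.
weights = [2.0, -1.5, 0.01, -0.02]
inputs = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.0], [-1.0, 0.0, 1.0, -2.0]]

# Targets are produced by the unpruned weights, so the baseline loss is zero.
data = [(x, predict(weights, x)) for x in inputs]

baseline = mse(weights, data)
pruned_small = mse(prune(weights, 0.1), data)   # drops the two near-zero weights
pruned_large = mse([0.0] + weights[1:], data)   # drops the largest weight instead

print(baseline, pruned_small, pruned_large)
```

Note this only shows that small-magnitude weights contribute little to the output. It says nothing about what any individual weight "means", which is the other side of the disagreement above.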

Anyone satisfied with "it's gradient descent" as an explanation isn't displaying much curiosity.