by yrnameer 1291 days ago
This is the thing. These AI models aren't that impressive in what they do if you understand it. What's impressive is the massive amount of data. One day the law will catch up too because what they are all producing is literally just a combination of a lot of little pieces of compressed versions of human-produced things. In effect it's some type of distributed plagiarism.
3 comments

Like pretty much all human work...
Thankfully most human work is generally not controlled and monetized by three madmen
True! But that’s a critique of capitalism, not AI.
Actually did not mean for that statement to be understood in reverse. Is that the opposite of Poe's law? :thinking_emoji:
It has long been shown experimentally that neural networks do in fact generalise and do not just memorise the training samples. What we do not see here is the convergence of the empirical distribution to the ideal distribution: the data is too sparse and the dimensionality too high. The amount of data is undoubtedly enormous, but it is not that simple. Only years and years of research have led to models that are capable of learning from such enormous amounts of data, and we can also see steady improvements on fixed datasets, which means we are in fact making real progress on quite a lot of fronts. More data efficiency would be great, but at least we do have those datasets for language-related tasks. It has also been shown that fine-tuning works quite well, which might be a way to escape the dreaded data inefficiency of our learning models.

In the end, we are not really in the business of copying the brain but of creating models that learn from data. If we arrive at a model that can solve the problem we are interested in through different means than a human would, e.g. first pre-train on half of the internet and then fine-tune on your task, we would be quite happy and it would not be seen as a dealbreaker. Of course, we would really like to have models that learn faster or have more skills, but it's amazing what's possible right now. What I find inspiring is how simple the fundamental building blocks of our models are, from gradient descent to matrix multiplication to ReLUs (just a max(x,0)). It's not magic, just research.
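
To make the "simple building blocks" point concrete, here is a minimal sketch in plain Python of the three pieces named above. The function names (`relu`, `matvec`, `layer`, `fit_slope`) and the toy one-parameter model are my own illustration, not taken from any particular library or model.

```python
def relu(x):
    """ReLU activation: literally max(x, 0)."""
    return max(x, 0.0)


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def layer(W, v):
    """One dense layer without bias: matrix multiplication, then ReLU."""
    return [relu(z) for z in matvec(W, v)]


def fit_slope(xs, ys, lr=0.05, steps=200):
    """Gradient descent on mean squared error for the toy model y = w * x."""
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w
```

For example, `layer([[1, -1], [-1, 1]], [2.0, 3.0])` zeroes out the negative pre-activation, and `fit_slope([1, 2, 3], [2, 4, 6])` converges towards a slope of 2. Real models stack many such layers and compute gradients automatically, but the ingredients are the same.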

> matrix multiplication to Relus (just a max(x,0))

Transformers famously employ softmax inside the attention computation, where it is applied to the attention scores. It's very rare to see softmax anywhere other than the final layer.
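
For reference, a rough sketch of where that softmax sits, assuming the usual scaled dot-product form of attention. The names `softmax` and `attention_weights` are illustrative only, and this handles a single query without batching, masking, or multiple heads.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)
```

The output is one weight per key, and those weights are then used to take a weighted average of the value vectors.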

This is an unfalsifiable claim.