by Eisenstein 826 days ago
Can you please explain what that means? I'm not sure I get it.

GP mentioned that the current slate of transformer-based AIs is not transformative in the same way the Internet was. Rather, it's more of a triumph of data engineering practices.

OP disagrees with GP. OP's main thesis is that AI enables a lot of new applications. OP claims that GP is simply looking at it as if it were training data.

I stated that current AI techniques ARE indeed just reflections of the data used in training. I agree with GP that the current "AI"s are simply not transformative in the same way the Internet was.

If you change the training data for the current generation of AI, you get different behaviours. The training data forms a manifold - which you can think of as a landscape with features forming valleys and hills. What the current generation of AI does is try to find a shape that fits the landscape - think of it like taking a very large sheet of cloth to cover a landscape. The stiffer the cloth, the less well it fits the landscape. The "stiffness" of the cloth corresponds to how few parameters a neural network has: fewer parameters, stiffer cloth. Modern deep nets are highly overparameterized - imagine a very soft, pliable cloth - so of course it fits a landscape well.

So if you have different training data, the neural network will fit to that different landscape as well. Hence the response will be different.
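To make the cloth analogy concrete, here is a toy sketch in Python. It is not a neural network: it assumes a piecewise-constant model on [0, 1) where the number of bins plays the role of the cloth's flexibility, and the names fit_bins, predict and mse, as well as the two "landscapes", are made up purely for illustration.

```python
import math

def fit_bins(xs, ys, n_bins):
    """Fit a piecewise-constant model on [0, 1): one parameter (a mean) per bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int(x * n_bins), n_bins - 1)
        sums[i] += y
        counts[i] += 1
    # Empty bins fall back to 0.0, which is good enough for this toy example.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict(params, x):
    """Look up the fitted value for the bin that x falls into."""
    n_bins = len(params)
    return params[min(int(x * n_bins), n_bins - 1)]

def mse(params, xs, ys):
    """Mean squared error of the fitted model on the given points."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 200 for i in range(200)]
hills = [math.sin(2 * math.pi * x) for x in xs]  # hypothetical landscape A
ramp = [abs(x - 0.5) for x in xs]                # hypothetical landscape B

stiff_a = fit_bins(xs, hills, 2)   # "stiff cloth": 2 parameters
soft_a = fit_bins(xs, hills, 50)   # "soft cloth": 50 parameters
soft_b = fit_bins(xs, ramp, 50)    # same model, different training data

print("stiff vs soft error on A:", mse(stiff_a, xs, hills), mse(soft_a, xs, hills))
print("same input, different data:", predict(soft_a, 0.25), predict(soft_b, 0.25))
```

The model with more bins hugs landscape A more closely than the one with two bins, and the same 50-bin model gives a different answer at the same input once it is fitted to landscape B instead.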

It's unfortunate that the training data is the entire internet for a few reasons:

1. Only the rich can train a vaguely competent AI. You're at the whims of those well-resourced enough.
2. There's no "alternate" training dataset anymore. (Though a clever thing people at OpenAI are doing is Mixture of Experts models, where you train multiple NNs using different subsets of the full training set, so you get multiple competencies.)

But you are specifically talking about one type of AI, which is a generative language model. There are tons of other AIs with different applications that do not need to be trained on the entire internet. You have computer vision, which separates into object recognition, classification, OCR, etc.; you have audio, which has text-to-speech (and the reverse), music generation, and all sorts of other things; machine translation; sentiment analysis (I won't list all the categories on Hugging Face, but you get my point). These are not differentiated merely by 'training data' to my understanding, so that's why your comment didn't make sense to me.

Calling all AI LLMs is like calling all of the internet the web. Of course if I am mistaken, corrections are welcome.

I agree. There are other types of AIs with different applications that do not need to be trained on the internet. The examples you have given, however, are examples where the deep nets are extremely data hungry.

Take computer vision for example - a "hello world" version of object recognition would use ImageNet, which is 14 million hand-annotated images. Or CIFAR-10, which is 60,000 images drawn from the 80 Million Tiny Images dataset. That of course just sets the stage for training data differentiation. Google's image recognition algorithm is far superior to other search engines'. Why? Because of Google's data set.

Any Tom, Dick, and Harry can go create their own image recognition AI and train it on all the public datasets (COCO, CIFAR, ImageNet), but that's considered pretty baseline nowadays. The differentiator is what _other_ datasets you have.

Different datasets yield different results. It doesn't matter which network you use. More data is better (usually).

> But you are specifically talking about one type of AI, which is a generative language model.

...Because that's easily and widely understood to be what people mean in recent times when they're talking about "AI", referring to the stuff that's in the news, without further qualifiers.

If you want to talk about something more specific, you are going to need to be explicit about it, rather than expecting everyone else to understand what you've got in your head without actually saying it.

This is like saying "but 'crypto' means so much more than just cryptocurrency! There's a whole cryptography field out there that does lots of good stuff!" It's true, but it's not helpful, because it's ignoring the obvious (at least to the other participants in the discussion) context. In this particular case, the context should be even more obvious because it's so clear that's what the article is talking about.

I thought we were on a site where people were knowledgeable and precise about the technical subjects being discussed.

It doesn't matter how knowledgeable and precise the people you're talking with are; you still need to communicate clearly about what you're actually talking about.

I disagree with your take here. While LLMs also enable significant functionality, we, of all categories of service providers, should be clearer about whether we are referring specifically to the LLM fad or to the adoption of AI to enable services generally, which is the vision that drives the excitement behind the LLM fad.

When people read our comments in 5 years, they will read "AI" and have a much broader topical take than the present excitement about LLMs.