It kind of sounds like you're saying that it should be possible to build a better GPT-3 by incorporating more linguistics domain knowledge into the model somehow, beyond stacking word embeddings and transformers together. If so... then okay, show me.

The biggest innovation of deep learning is that you can take a relatively general architecture, train it on a ton of data, and end up with surprisingly sophisticated models that absolutely wipe the floor with the older style of models.

AlphaGo/Zero is maybe an exception here: it's not "just" a model, it's a whole system based around a model, which is specifically designed to perform well on certain tasks, and succeeds where a more naive system would probably fail. You could argue that ChatGPT itself is an early kind of advancement along those lines, going beyond just cranking predictions from a model to a more holistic task-specific system. But it's still mostly "just" a huge model trained on a huge pile of data, and I'm not going to argue strongly that it isn't. What I am arguing is that people aren't as naive as you think, and it's not for lack of trying that LLMs don't incorporate detailed knowledge about linguistics, psychology, etc.

You suggest that transformers shouldn't be considered "good" for text, because they don't include enough inductive bias. Don't they? They're really clever things in my opinion, and I think they actually represent quite a lot of inductive bias compared to what came before, and have a lot of intuitive appeal in their intended task domains. I hardly think it's fair or correct to consider the transformer nothing more than a linear algebra trick.

I suppose you're arguing that they aren't "good" merely because they produce state-of-the-art results and beat literally everything else we have. I think your argument is that they only work as well as they do because we throw a tremendous amount of computing power and data at them. That might be true, but that's literally why people are trying to make them faster and less costly to train!

Don't forget that the other big innovation of deep learning is transfer learning and fine-tuning. A big LLM like GPT-3 only needs to be trained fully every once in a while, and it can then be reused and adapted to a huge variety of tasks relatively efficiently and cheaply. The whole point of the article at the top of this thread is making big models cheaper and faster to train. If you're bothered by the cost- and compute-efficiency of these things, what could be better than making the training process more efficient?

Moreover, it's kind of weird to insinuate that the field doesn't care about making these things faster or cheaper to run, when immediately after any new model is announced, a flood of projects follows trying to shrink the model, improve inference speed, etc.

Finally, who said anything about intelligence? I didn't, and nobody else did either. That's a completely unrelated topic as far as I'm concerned, and I leave that one to the philosophers to debate.