Hacker News
by albertzeyer 1232 days ago
(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: Obviously, almost everything is a Transformer nowadays (the "Attention is all you need" paper). However, to get into the field and get a good overview, I think you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must-learn, even though Transformers might be better at many tasks. And then all those memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know different architectures, such as pure language models (GPT) and attention-based encoder-decoder models (e.g. the original Transformer), but then also CTC, hybrid HMM-NN models, and transducers (RNN-T).
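As one small illustration of how CTC differs from an encoder-decoder: a CTC model emits one label (or a special blank) per input frame, and the output sequence is obtained by merging repeated labels and then dropping blanks. A minimal sketch of greedy (best-path) decoding, assuming the per-frame label scores are already computed and that index 0 is the blank (a common convention, assumed here):

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label for each input frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    output: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # Repeated labels are merged unless a blank separates them.
        if label != prev and label != BLANK:
            output.append(label)
        prev = label
    return output


# Per-frame scores over {blank, "a", "b"}; the best path is: a a _ a b b _
scores = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1],
          [0.1, 0.2, 0.7], [0.2, 0.1, 0.7], [0.6, 0.2, 0.2]]
print(ctc_greedy_decode(scores))  # → [1, 1, 2], i.e. "a a b"
```

The blank is what lets CTC output the same label twice in a row ("a a" above), which plain repeat-merging could not express.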

Some self-promotion: I think my PhD thesis does a good job of giving an overview of this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...

Diffusion models are another, more recent kind of model.

Then, a separate topic is training. Most papers do supervised training, using a cross-entropy loss against the ground-truth target. However, there are many other approaches:

There is CLIP, which combines the text and image modalities.
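CLIP combines the two modalities by training an image encoder and a text encoder so that, within a batch, each matching image/text pair is more similar than all the mismatched pairs. A minimal sketch of that symmetric contrastive loss, assuming the embeddings are already computed (the encoders are omitted) and using a fixed temperature of 0.07 as an assumption (CLIP itself learns this value):

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits: Sequence[float], target: int) -> float:
    # -log softmax(logits)[target], computed stably via log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def clip_loss(image_emb: Sequence[Vector], text_emb: Sequence[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: (image i, text i) is the positive pair."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: pick the correct text among all texts in the batch (rows).
    loss_i2t = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: pick the correct image among all images (columns).
    loss_t2i = sum(_cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


image_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[0.9, 0.1], [0.1, 0.9]]
print(clip_loss(image_emb, text_emb))  # small, since each image is closest to its own caption
```

Note that the "labels" are just the pairing within the batch, which is why this kind of training needs no manual class annotations.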

There is the whole field of unsupervised or self-supervised training methods. Language model training (next-label prediction) is one example, but there are others.
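Language model training shows why this counts as self-supervised: the loss is the same cross-entropy used in supervised training, but the targets are simply the input sequence shifted by one position, so no separate labels are needed. A minimal sketch, assuming a hypothetical model interface (`NextTokenModel`) that maps a token prefix to a probability distribution over the next token; the uniform toy model only exists to make the example runnable:

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface (an assumption for this sketch): maps a prefix of
# token ids to a probability distribution over the next token id.
NextTokenModel = Callable[[Sequence[int]], Dict[int, float]]


def next_token_pairs(tokens: Sequence[int]) -> List[Tuple[Sequence[int], int]]:
    """Build self-supervised (prefix, target) pairs: the target is the next token."""
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]


def lm_cross_entropy(model: NextTokenModel, tokens: Sequence[int]) -> float:
    """Average cross-entropy, -log p(target | prefix), over all positions."""
    pairs = next_token_pairs(tokens)
    return sum(-math.log(model(prefix)[target]) for prefix, target in pairs) / len(pairs)


def uniform_model(prefix: Sequence[int]) -> Dict[int, float]:
    """Toy model that ignores the prefix: uniform over a 4-token vocabulary."""
    return {i: 0.25 for i in range(4)}


print(next_token_pairs([5, 6, 7]))  # → [([5], 6), ([5, 6], 7)]
print(lm_cross_entropy(uniform_model, [0, 1, 2, 3]))  # ≈ log(4) ≈ 1.386
```

A real implementation would compute all positions in one forward pass with a causal mask rather than re-running the model per prefix; the loop here just makes the targets explicit.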

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.


I do wonder whether the people behind the "Attention is all you need" paper will receive the Turing Award. It is being cited often.

>Will receive Turing Award

This is the weird thing: hopefully not! Hopefully even better NN models keep coming out every 5-10 years, and we look back on transformers as 'just a phase', much as we now look back on RNNs (which were no less of an amazing achievement; look at the proliferation of voice assistants) as potentially obsolete technology.

For example, attention is great and does a really good job of simulating context in language, but what if we come up with a clever way to simulate symbology? Then we would actually be back on the path to AGI, and transformers would look like child's play.

> symbology

Off-topic, but now I have Willem Dafoe going "What's the 'symbology' here? The symbolism ..." in my head (from The Boondock Saints).

Even though I watched that movie 20 years ago, I will never forget that scene.
The Adam optimizer is another possibility. It's unbelievably good and everyone uses it.
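For reference, the Adam update itself is short: it keeps exponential moving averages of the gradient and of the squared gradient, bias-corrects both, and scales each parameter's step by their ratio. A minimal sketch for a list of scalar parameters, using the default hyperparameters suggested in the Adam paper (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8); the class name and interface are illustrative, not any library's API:

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Adam:
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    m: List[float] = field(default_factory=list)  # first-moment (mean) estimates
    v: List[float] = field(default_factory=list)  # second-moment (uncentered variance) estimates
    t: int = 0  # step counter, used for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return updated parameters after one Adam step."""
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias correction: m and v start at zero, so early estimates are scaled up.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
opt = Adam(lr=0.1)
x = [0.0]
for _ in range(500):
    x = opt.step(x, [2 * (x[0] - 3)])
print(x)  # should end up close to [3.0]
```

Because of the division by the root of the second moment, the very first step has magnitude of roughly `lr` regardless of the gradient's scale, which is part of why Adam needs little tuning in practice.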
The authors did not really expect it to be such a huge influence. You could also argue it was a somewhat natural next step. This paper invented neither attention nor self-attention. Attention was already very popular, specifically for machine translation, and a few other papers had already used self-attention at that point in time. It was just the first paper that used only attention and self-attention and nothing else (no recurrence, no convolution).
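To make the distinction concrete: in self-attention, queries, keys and values are all computed from the same sequence, and each output position is a softmax-weighted average of the values. A minimal single-head sketch of the scaled dot-product form softmax(QK^T / sqrt(d_k)) V used in the Transformer paper; the projection matrices are passed in as plain lists here, and masking and multi-head splitting are left out:

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Sequence[Sequence[float]]) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention; x has one row per position."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # all derived from the same x
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # scores[i][j]: how much position i attends to j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, model dimension 2
identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # → 3 2
```

In the older encoder-decoder attention, the queries would instead come from the decoder state while keys and values come from the encoder; the computation is the same, only the inputs differ.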
The guy who said “I don’t understand all of this, can we just throw more machines?” should get the award.
I remember an interview with one of the founders of OpenAI, saying that if it hadn't been the transformer architecture, it would have been something else. What really matters is the scale of the model. The transformer is only one of the possible configurations that work well with text. It seems they stuck with it because it works so well, so why break things?
Came here expecting a haiku.
The authors who wrote

"Attention is all you need" -

Turing candidates?

The people behind

"Attention is all you need"

Are often cited

Attention.

Attention.

Attention.

- Ikkyū

Neural nets advance,

Attention is all you need,

Computing ascends.

#by chatgpt

Attention existed before that paper and, up to that point in time, had been incorporated into LSTM-based models.
Thanks for sharing. Cool to see someone from the Aachen NLP group. I'll be visiting the Aachen/Düsseldorf/Heidelberg area in spring. Do you know of any local ML meetups open to the general (ML engineer/programmer) public?
Unfortunately, not really. We used to have some RWTH-internal meetups, but those were somewhat interrupted by Corona and never really recovered afterwards.

Aachen has quite a few companies active in NLP or speech recognition, mostly due to my professor Hermann Ney. E.g. there are Apple, Amazon, Nuance, eBay, and the lesser-known AppTek. And in Cologne, you have DeepL. In all those companies, you will find many people from our group. And then, at RWTH Aachen University, you have our NLP/speech group, and also the computer vision group.

Sounds like an "NLP valley" with Prof. Ney as Aachen's own Fred Terman :)