
by albertzeyer 1232 days ago

(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: Obviously, almost everything is a Transformer nowadays (the "Attention Is All You Need" paper). However, I think that to get into the field and get a good overview, you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must-learn, even though Transformers might be better at many tasks. And then all those memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know different architectures, such as pure language models (GPT) and attention-based encoder-decoders (e.g. the original Transformer), but then also CTC, hybrid HMM-NN, and transducers (RNN-T).

Some self-promotion: I think my PhD thesis does a good job of giving an overview of this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...

Diffusion models are another recent, different kind of model.

Then, a separate topic is the training aspect. Most papers do supervised training, using a cross-entropy loss against the ground-truth target. However, there are many other approaches:

There is CLIP to combine text and image modalities.

There is the whole field of unsupervised or self-supervised training methods. Language model training (next-label prediction) is one example, but there are others.

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
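To make the cross-entropy loss and next-label prediction mentioned above a bit more concrete, here is a minimal sketch in plain Python. The function names (`softmax`, `cross_entropy`, `next_label_loss`) and the toy logits are assumptions for illustration only; a real setup would get the logits from a model and use a deep learning framework.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the ground-truth label index."""
    return -math.log(softmax(logits)[target])

def next_label_loss(logits_per_step, labels):
    """Average cross entropy for next-label prediction.

    Assumption: logits_per_step[t] holds the (hypothetical) model scores
    for labels[t + 1], given labels[0..t]. The targets are therefore just
    the input sequence shifted by one position, so no extra annotation
    is needed (self-supervised).
    """
    targets = labels[1:]
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_step, targets)]
    return sum(losses) / len(losses)

# Toy example: vocabulary of 4 labels, a "model" that is completely unsure.
labels = [0, 1, 2, 3]
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(len(labels) - 1)]
loss = next_label_loss(uniform_logits, labels)  # equals log(4) for uniform scores
```

Supervised training uses the same `cross_entropy`, except that the target comes from a human-provided annotation instead of from the sequence itself.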


hardware2win 1232 days ago

I do wonder whether people behind Attention is all you need paper

Will receive Turing Award

It is being cited often

RC_ITR 1232 days ago

>Will receive Turing Award

This is the weird thing: hopefully not! Hopefully there are even better NN models coming out every 5-10 years, and we look back on transformers as 'just a phase', much like how we now look back on RNNs (which were no less of an amazing achievement; look at the proliferation of voice assistants) as potentially obsolete technology.

For example, attention is great and does a really good job of simulating context in language, but what if we come up with a clever way to simulate symbology? Then we actually are back on the path to AGI and transformers will look like child's play.

Beldin 1232 days ago

> symbology

Off-topic, but now I have Willem Dafoe going "What's the 'symbology' here? The symbolism ..." in my head (from The Boondock Saints).

Gee101 1232 days ago

Even though I watched that movie 20 years ago, I will never forget that scene.

modeless 1232 days ago

The Adam optimizer is another possibility. It's unbelievably good and everyone uses it.

albertzeyer 1232 days ago

The authors did not really expect it to have such a huge influence. You could also argue that it was a somewhat natural next step. This paper invented neither attention nor self-attention. Attention was already very popular, specifically for machine translation, and a few other papers had already used self-attention at that point in time. It was just the first paper which used solely attention and self-attention and nothing else.
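For readers who have not seen it, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is deliberately simplified: the name `self_attention` is made up for this example, and the inputs are used directly as queries, keys and values, whereas the actual Transformer applies learned projections, multiple heads and positional information on top of this.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Simplifying assumption: queries, keys and values are all x itself
    (no learned projection matrices, single head).
    """
    d = len(x[0])
    outputs = []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        )
    return outputs

# Three positions with 2-dimensional features.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(sequence)
```

Every output position mixes information from all positions in one step, which is the part that replaces the recurrence of an RNN/LSTM.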

mirekrusin 1232 days ago

The guy who said “I don’t understand all of this, can we just throw more machines at it?” should get the award.

seydor 1232 days ago

I remember an interview with one of the founders of OpenAI, saying that if it hadn't been the transformer architecture, it would have been something else. What really matters is the scale of the model. The transformer is only one of the possible configurations that work well with text. It seems they stuck with it because it works so well, so why break things?

mattcaldwell 1232 days ago

Came here expecting a Haiku.

maxbond 1232 days ago

The authors who wrote

"Attention is all you need" -

Turing candidates?

fastball 1232 days ago

The people behind

"Attention is all you need"

Are often cited

andrelaszlo 1232 days ago

Attention.

Attention.

Attention.

- Ikkyū

qwertyforce 1232 days ago

Neural nets advance,

Attention is all you need,

Computing ascends.

#by chatgpt

PartiallyTyped 1231 days ago

Attention existed before that paper; until that point in time, it had been incorporated into LSTMs.

alan-stark 1232 days ago

Thanks for sharing. Cool to see someone from the Aachen NLP group. I'll be visiting the Aachen/Düsseldorf/Heidelberg area in spring. Do you know of any local ML meetups open to the general (ML engineer/programmer) public?

albertzeyer 1232 days ago

Unfortunately, not really. We used to have some RWTH-internal meetups, although those were somewhat interrupted by Corona and have not really recovered since.

Aachen has quite a few companies active in NLP or speech recognition, mostly due to my professor Hermann Ney. E.g. there is Apple, Amazon, Nuance, eBay, and the lesser-known AppTek. And in Cologne, you have DeepL. In all those companies, you find many people from our group. And then, at RWTH Aachen University, you have our NLP/speech group, and also the computer vision group.

alan-stark 1231 days ago

Sounds like an "NLP valley" with Prof. Ney as Aachen's own Fred Terman :)