by abetusk 1142 days ago
I'm not sure I understand it well enough to say, but after watching a video on it [0], I think there were a few key points:

* "Attention Is All You Need" introduced positional encoding, which lets the model keep track of where each word sits in the sentence. That allows for more complex translation (and thus generative/ChatGPT-like tasks?) because words now have context relative to each other. Contrast this with "bag of words" models, which only tell you whether a word is present or not.

* I don't quite understand why, but transformers (which "AiaYN" introduced) can be made fully parallel, compared with RNN/LSTM networks, which have to process tokens serially. Being fully parallel allows for GPU optimization, which means you can take advantage of Moore's law for training.
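
To make the two bullets a bit more concrete, here is a rough toy sketch in plain Python. It is my own illustration, not code from the paper or the video: the embedding table `EMB`, the helper names, and the stripped-down attention (no learned weights, a single head, Q = K = V) are all assumptions made for brevity. The only part taken from the paper is the sinusoidal formula in `positional_encoding`. It tries to show that attention alone cannot tell "dog bites man" from "man bites dog" until positions are added, and that each attention output reads only the inputs, while the toy RNN needs the previous step's state.

```python
import math

def positional_encoding(pos, d_model):
    # Sinusoidal scheme: sin on even dimensions, cos on odd ones.
    return [
        math.sin(pos / 10000 ** (2 * (i // 2) / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** (2 * (i // 2) / d_model))
        for i in range(d_model)
    ]

def with_positions(xs):
    # Add the position vector to each token vector.
    d = len(xs[0])
    return [[xk + pk for xk, pk in zip(x, positional_encoding(pos, d))]
            for pos, x in enumerate(xs)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(i, xs):
    # Simplified self-attention output for position i (assumption: Q = K = V).
    # It reads only the inputs xs, never another position's output.
    scale = math.sqrt(len(xs[i]))
    scores = [dot(xs[i], x) / scale for x in xs]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w / total * x[k] for w, x in zip(weights, xs))
            for k in range(len(xs[i]))]

def self_attention(xs):
    # No position depends on another position's result, so these calls
    # could run in any order, or all at once on a GPU.
    return [attend(i, xs) for i in range(len(xs))]

def rnn(xs, decay=0.5):
    # Toy recurrence: step t needs the hidden state from step t-1,
    # so this loop has to run one token at a time.
    h = [0.0] * len(xs[0])
    states = []
    for x in xs:
        h = [math.tanh(decay * hk + xk) for hk, xk in zip(h, x)]
        states.append(h)
    return states

def close(u, v):
    return all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(u, v))

# Hypothetical 4-dimensional embeddings, made up for this example.
EMB = {
    "dog": [1.0, 0.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.0],
    "man": [0.0, 0.0, 1.0, 0.0],
}
a = [EMB[w] for w in ("dog", "bites", "man")]
b = [EMB[w] for w in ("man", "bites", "dog")]

# Without positions, "dog" gets the same output in both sentences.
order_blind = close(self_attention(a)[0], self_attention(b)[2])

# With positions added, the two outputs for "dog" differ.
order_aware = not close(self_attention(with_positions(a))[0],
                        self_attention(with_positions(b))[2])
```

Real transformers add learned projections, multiple heads, and masking on top of this, so treat it only as a sketch of the idea.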

I'm always a bit suspicious when people claim a breakthrough of this sort. There's no doubt that better algorithms give better results, but how much is due to just faster computers, cheaper compute, more memory, etc.?

[0] https://youtu.be/S27pHKBEp30