by abetusk
1142 days ago
I'm not sure I understand it well enough to say, but after watching a video on it [0] I think there were a few key points:

* "Attention Is All You Need" used positional encoding, which keeps track of where each word sits in the sequence. Words then have context relative to each other, allowing for more complex translation (and thus generative, ChatGPT-like tasks?). Contrast this with "bag of words" models, which only tell you whether a word is present or not.

* I don't quite understand why, but transformers (which "AiaYN" introduced) can be made fully parallel across tokens during training, whereas RNN/LSTM networks have to process tokens serially. Full parallelism allows for GPU optimization, which means training can take advantage of Moore's law.

I'm always a bit suspicious when people claim a breakthrough of this sort. There's no doubt that better algorithms give better results, but how much is due to just faster computers, cheaper compute, more memory, etc.?

[0] https://youtu.be/S27pHKBEp30
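To make the bag-of-words contrast concrete, here is a rough sketch as I understand it, not a definitive implementation. The sinusoidal formula is the one given in the paper; the helper names (`bag_of_words`, `positional_encoding`, `encode`) and the tiny `d_model=4` are my own assumptions for illustration. A real transformer adds the position vector to a learned embedding, whereas this toy just pairs each token with its position vector.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    # Only counts which words occur; word order is thrown away.
    return Counter(tokens)


def positional_encoding(position, d_model):
    # Sinusoidal encoding: sin on even dimensions, cos on odd dimensions,
    # with each sin/cos pair sharing one frequency.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe


def encode(tokens, d_model=4):
    # Hypothetical toy encoding: pair each token with its position vector.
    # (A real model would add this vector to a learned word embedding.)
    return [(tok, positional_encoding(pos, d_model))
            for pos, tok in enumerate(tokens)]


a = "dog bites man".split()
b = "man bites dog".split()

print(bag_of_words(a) == bag_of_words(b))  # True: same words, order lost
print(encode(a) == encode(b))              # False: position is preserved
```

The two sentences mean very different things, but a bag-of-words model sees them as identical. Once position is attached to each token, they become distinguishable.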