by sjg007
2104 days ago
When I realized that what transformers do is transform input into output that then becomes input again, I was amazed, but it makes sense. It works much like a Markov chain: the next output depends only on the current input. Think of a snake eating itself. What’s important is that the output is basically a probability distribution. You can post-process that output to get a single concrete value, but what you really want is to put it back in and turn the wheel again. But you are right, they are trained on next-word prediction, so there’s no long-term memory. I imagine people are working on transformers with a memory bank. RNNs seem to be the brute-force solution here... my guess is that you need to maintain some kind of index to decide where to backprop. If it hasn’t been discovered yet, I bet it will be some kind of Bloom filter.
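
To make the "output goes back in as input" loop concrete, here is a minimal sketch of the idea. It is not a transformer. `TOY_MODEL` is a made-up lookup table standing in for the network, and `next_token_distribution`, `sample`, and `generate` are hypothetical names chosen for illustration. The toy only looks at the last token, whereas a real model would condition on the whole context window.

```python
import random

# Hypothetical toy "model": maps the last token to a probability
# distribution over the next token. This table is invented for
# illustration; a real transformer computes the distribution from
# the whole context window.
TOY_MODEL = {
    "the": {"snake": 0.6, "wheel": 0.4},
    "snake": {"eats": 0.9, "turns": 0.1},
    "eats": {"the": 1.0},
    "wheel": {"turns": 1.0},
    "turns": {"the": 1.0},
}


def next_token_distribution(context):
    """Return P(next token | context); here only the last token matters."""
    return TOY_MODEL[context[-1]]


def sample(distribution, rng):
    """Post-process the distribution into one concrete token."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(prompt, steps, seed=0):
    """Feed each sampled output back in as input, `steps` times."""
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(steps):
        dist = next_token_distribution(context)  # output is a distribution
        context.append(sample(dist, rng))        # output becomes input
    return context


print(" ".join(generate(["the"], steps=6)))
```

Each pass through the loop is one turn of the wheel: the model emits a distribution, one token is sampled from it, and the longer context is fed back in. Nothing persists between steps except the context itself, which is the "no long-term memory" point.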