by omeze
780 days ago
This is a really cool paper; it reminds me of the simple exercise Karpathy goes through in his NN video series with a bigram predictor. It looks like in practice there are still some grounding issues when attempting to use them for instruction-tuned applications, but it's a clever direction to explore!