Hacker News
by omeze 780 days ago
This is a really cool paper; it reminds me of the simple exercise Karpathy goes through in his NN video series with a bigram predictor. It looks like in practice there are still some grounding issues when attempting to use them for instruction-tuned applications, but it's a clever direction to explore!