HN Mirror


	by tuned 148 days ago
	Thanks for linking. Yes, the paper compares the new architecture (which is also a fork of my implementation of nanoGPT) with Karpathy's nanoGPT. There are also links to the code and the benchmark used.


Herring 148 days ago

Note that I didn't say Karpathy's nanoGPT; I said to use the speedrun.

Transformers are universal function approximators. When well-tuned, they often start to approximate other innovations. Not always, thank god, but often enough that you have to be careful.


tuned 145 days ago

OK, thanks. I'll take it slow, then.
