HN Mirror



	by aquafox 258 days ago
	Having gone through the explanations in the Transformer Explainer [1], I now have a good intuition for GPT-2. Is there a resource that gives intuition for what has changed since then to improve things like approaching a problem more conceptually, being better at coding, or suggesting next steps when wanted? I have a feeling this is the result of more than just increasing the number of transformer blocks, heads, and the embedding dimension. [1] https://poloclub.github.io/transformer-explainer/

1 comment

ACCount37 258 days ago

Most improvements like this don't come from the architecture itself, scale aside. They come down to training, which is a hair away from being black magic.

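The exceptions are improvements in context length and inference efficiency, as well as modality support. Those are architectural. But behavioral changes are almost always down to scale, pretraining data, supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and reinforcement learning with verifiable rewards (RLVR).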
