Hacker News
by valine 386 days ago
My personal theory is that it’s an emergent property of many attention heads working together. If each attention head is a bird, reasoning would be the movement of the flock.