Hacker News
by jcims 1595 days ago
My initial guess is they have nothing to do with each other. It would be like explaining why the next idea pops into your head. You can create a rational explanation but there's no way to test it.
1 comment

my thoughts too, based on my limited understanding of GPT. but the more pressure you apply toward compressing the neural network during training, the more circuitry these paths are likely to share. it would be interesting to see how much, and which parts, could be folded together before you start to lose significant fidelity (though unfortunately the fidelity seems too low today to even try that).