Hacker News
by measured_step 946 days ago
How would this scale for a use case like writing code? I could imagine that some inputs would require a large number of neurons. Would this architecture be able to handle those inputs if it were scaled up?

I'm also curious whether this architecture would grok more complex concepts at scale.