Hacker News
by overfeed 103 days ago
> I wonder what the underlying cause is

It responds with the statistically most probable text given its training data, and the most probable continuation happens to differ depending on whether the errors are present. I suspect high-fidelity diagramming requires a different attention architecture from the common ones used in sentence-optimized models.
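
To make the "most probable text" point concrete, here is a toy sketch. It is only a counting model over a made-up three-sentence corpus, nothing like a real transformer, and the names `train` and `most_probable_next` are hypothetical. It shows how one typo in the context can flip which continuation is most likely.

```python
from collections import Counter, defaultdict

def train(corpus, n=2):
    """Count which token follows each n-token context (toy n-gram model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            counts[context][tokens[i + n]] += 1
    return counts

def most_probable_next(counts, context):
    """Return the most frequent continuation for a context, or None if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up corpus: the misspelled context co-occurs with a different continuation.
corpus = [
    "the diagram is correct",
    "the diagram is correct",
    "the diagrm is broken",
]
model = train(corpus)

clean = most_probable_next(model, ["diagram", "is"])
noisy = most_probable_next(model, ["diagrm", "is"])
print(clean, noisy)
```

The model has no notion of what a diagram is. It only tracks which tokens followed which contexts, so the typo alone is enough to change the output.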