by A-Train 1059 days ago
The architecture is something like an ensemble, but there is also a control network that chooses 2 experts to generate the text.
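
Roughly what I mean, as a toy sketch rather than the actual model: the control network scores every expert, only the 2 highest-scoring ones run, and their outputs get mixed. The names `top2_route` and `moe_forward`, the scalar "experts", and the fixed gate scores are all made up for illustration. Weighting the two outputs by a softmax over their scores is my assumption; the description above only says that 2 experts are chosen.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(gate_scores):
    # Pick the indices of the 2 highest gate scores (hypothetical helper).
    top2 = sorted(range(len(gate_scores)),
                  key=lambda i: gate_scores[i], reverse=True)[:2]
    # Assumption: mixing weights are a softmax over just the 2 chosen scores.
    weights = softmax([gate_scores[i] for i in top2])
    return list(zip(top2, weights))

def moe_forward(x, experts, gate):
    # Only the 2 selected experts are evaluated; the rest are skipped.
    routes = top2_route(gate(x))
    return sum(w * experts[i](x) for i, w in routes)

# Toy setup: 4 "experts" that just scale the input, and a fixed gate.
experts = [lambda x, k=k: k * x for k in range(4)]
gate = lambda x: [0.1, 2.0, 0.3, 1.0]

routes = top2_route(gate(2.0))
output = moe_forward(2.0, experts, gate)
print(routes, output)
```

Unlike a plain ensemble, the experts that are not picked never run, so the cost per input stays close to that of 2 experts no matter how many exist.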