by danielhanchen 115 days ago
Unsure, but yes, most likely they use YaRN, and maybe they trained a bit more on long context (or maybe not).
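For anyone unfamiliar with YaRN, here is a minimal Python sketch of its RoPE frequency rescaling ("NTK-by-parts" interpolation) plus its attention temperature. It assumes the defaults the YaRN paper reports for LLaMA (α=1, β=32, RoPE base 10000, original context 4096) and a hypothetical 4x extension factor. The function names are made up for illustration, and none of this says which settings, if any, the model being discussed actually uses.

```python
import math

def yarn_inv_freq(dim, base=10000.0, scale=4.0, orig_ctx=4096, alpha=1.0, beta=32.0):
    """Per-pair RoPE frequencies after YaRN-style NTK-by-parts interpolation.

    Defaults are assumptions (YaRN paper's LLaMA settings, 4x extension).
    """
    inv_freq = []
    for i in range(dim // 2):
        theta = base ** (-2 * i / dim)       # original RoPE frequency for pair i
        wavelength = 2 * math.pi / theta     # tokens per full rotation
        r = orig_ctx / wavelength            # rotations within the original context
        if r < alpha:
            gamma = 0.0                      # long wavelength: fully interpolate (theta / scale)
        elif r > beta:
            gamma = 1.0                      # short wavelength: keep original frequency
        else:
            gamma = (r - alpha) / (beta - alpha)  # linear ramp in between
        inv_freq.append((1 - gamma) * theta / scale + gamma * theta)
    return inv_freq

def yarn_attention_scale(scale):
    """Multiplier applied to the RoPE cos/sin (i.e. to q and k), per the YaRN paper."""
    return 0.1 * math.log(scale) + 1.0

freqs = yarn_inv_freq(128)
print(freqs[0], freqs[-1], yarn_attention_scale(4.0))
```

The design point is that high-frequency dimensions (which complete many rotations inside the original window) are left alone, low-frequency ones are linearly interpolated by the scale factor, and a ramp blends the two; the extra attention scale compensates for the softer attention distribution at longer contexts.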