Hacker News
by m101 637 days ago
Perhaps the smaller model used in o1 is overtrained on arXiv and code relative to 4o (or undertrained on legal text).