Hacker News
by m101 637 days ago
Perhaps the smaller model used in o1 is overtrained on arXiv and code relative to 4o (or undertrained on legal text).