Hacker News
by WiSaGaN 546 days ago
It still fails my private physics test question about half the time, whereas Claude 3.5 Sonnet and OpenAI o1 (both web versions) pass it most of the time. So I'd say it's close to SOTA but not quite there. However, given that DeepSeek already has the R1 Lite preview, and that they can achieve comparable performance for much less compute (assuming the API prices of closed models roughly reflect their inference cost), it's not unreasonable to believe DeepSeek may be close to releasing a very good test-time compute scaling model similar to o3 at high effort.