Hacker News
by gigatexal 60 days ago
Interesting. I would love your test, but for code. If I were to forgo my Claude subscription for a Chinese cloud-hosted model or local models running on my own hardware, I'd use them mostly for code.

The thing is, I've tried to come up with a good test of my own and spent countless hours just tweaking it instead of saying "this is good enough" and benchmarking.