by cyberbiosecure 836 days ago
For the most reliable answer, check out e.g. the "HumanEval" LLM benchmark on the HuggingFace website. It is one of the best objective sources of info on this issue. Currently Claude 3 is the best, far ahead of other models: about 85% of coding tasks solved correctly, versus approx. 65% for ChatGPT 4. That is a giant difference (35% errors vs 15%, so Claude 3 fails roughly 2.3x less often).
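The error-rate comparison can be checked with quick arithmetic. A minimal sketch, taking the pass rates quoted above (85% and roughly 65%) as given rather than re-verified; the function and variable names are made up for illustration:

```python
# Pass rates as quoted in the comment (approximate, not re-verified).
claude_pass = 0.85
gpt4_pass = 0.65


def error_ratio(pass_a: float, pass_b: float) -> float:
    """Return how many times more often model B fails than model A."""
    return (1 - pass_b) / (1 - pass_a)


ratio = error_ratio(claude_pass, gpt4_pass)
print(f"{ratio:.2f}x")  # prints "2.33x"
```

Comparing error rates (35% vs 15%) rather than pass rates (85% vs 65%) is what makes the gap look this large; on pass rates alone the ratio is only about 1.3x.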