Hacker News new | ask | show | jobs
by SomaticPirate 53 days ago
This seems to be testing the models on leetcode style prompts that also require the model to implement TCP calls to send the results. Interesting but probably not a apples to apples comparison. The fact only Grok qualified for the first one seems suspect