by amoshaviv 81 days ago
I wanted to make sure "thinking" and "planning" features were not what this comparison tested, but I definitely tested "simply phrased" tasks as well: https://www.flowtester.ai/shared/ce1c8ef9-f387-48be-93f0-938...