Hacker News new | ask | show | jobs
by jehna1 667 days ago
At the moment I haven't found good ways of measuring the quality between different models. Please share if you have any ideas!

For small scripts I've found the output to be very similar between small local models and GPT-4o (judging by a human eye).