
Hacker News

by m-dot-reviews 3 days ago

So, this may not be precisely what you're looking for, but it may come close. I've put together a simple site for sharing ratings and opinions on models at task-level granularity: https://model.reviews/

The idea is that benchmark score comparisons are useful for a broad cross-product comparison across models and their settings, but less useful if you're looking for the best model for <your-specific-task>. So on this site, each model gets its own page listing the tasks people have rated it on, along with a score out of 10 for each task. Common tasks, like coding, will likely appear on most or all models, while more niche tasks may only show up on a few. It's human-moderated (just by me for now).

The corpus is pretty sparse right now, so please spread the word if this seems like a useful idea!