by harlanlewis 759 days ago

I found myself making these types of comparisons over and over again to keep up with new models and pricing changes, so I made a Google Sheet for it, with some charts to visualize outliers and trends in model capability, throughput, and token cost. It's sorted by the Artificial Analysis Index [1], which incorporates both static benchmarks (MMLU) and dynamic crowdsourced Elo ratings (Chatbot Arena). Hope someone finds it useful:

https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...

Specific to Groq - the service is an incredible outlier for speed and cost, but even for quite small projects I've run into rate limits and similar errors. Great option for something like a bring-your-own-keys personal chatbot, but I wouldn't build a service on it. Groq Llama 3 70B and Groq Mixtral 8x7B are in the sheet for comparison but omitted from charts.

[1]: https://artificialanalysis.ai/leaderboards/providers

1 comment

nycdatasci 759 days ago

This is great - thanks!