by ursaguild 558 days ago
I like the idea of more comparisons of models. Are there plans to add independent analyses of these models or is it only an aggregation of input limits?

How do you see this differing from or adding to other analyses such as:

https://artificialanalysis.ai

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena

Great work on all the aggregation. The website is nice to navigate.

3 comments

I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side by side comparison between them. I ran each prompt 4 times for different temperature values and that's available as a toggle.

I was going to add reviews on each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful to them in getting a sense of how different models respond to the same prompt and how temperature affects the same model's output on the same prompt.
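
For anyone curious what that kind of temperature sweep looks like, here is a rough sketch, not the site's actual code. It assumes "4 times" means four runs per temperature value, and `generate` and `fake_generate` are made-up stand-ins for whatever client calls the model.

```python
from typing import Callable, Dict, List, Sequence


def sweep_temperatures(
    generate: Callable[[str, float], str],
    prompts: Sequence[str],
    temperatures: Sequence[float],
    runs_per_setting: int = 4,  # assumption: four runs per temperature value
) -> Dict[str, Dict[float, List[str]]]:
    """Collect repeated outputs for every (prompt, temperature) pair."""
    results: Dict[str, Dict[float, List[str]]] = {}
    for prompt in prompts:
        results[prompt] = {}
        for temperature in temperatures:
            # Repeat the same call so run-to-run variation is visible.
            results[prompt][temperature] = [
                generate(prompt, temperature) for _ in range(runs_per_setting)
            ]
    return results


def fake_generate(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in for a real model call; returns a labeled string.
    return f"[t={temperature}] reply to: {prompt}"


if __name__ == "__main__":
    table = sweep_temperatures(fake_generate, ["Explain recursion."], [0.0, 0.7, 1.0])
    for temperature, outputs in table["Explain recursion."].items():
        print(temperature, len(outputs))
```

Swapping `fake_generate` for a real API call would give the per-temperature outputs that a side-by-side toggle could display.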

Hey, this is pretty insightful! I wonder if, in the course of researching to build this website, you reached any conclusions as to which AI assistant is currently ahead.
I can confirm, it's still very helpful, thank you!
The Gradio UI looks ugly imo, that's why I used shadcn and Next.js to make the website look good.

I'll try to make it as user-friendly as possible. Most of the websites are ugly + too technical.

I want to point out you dodged the data question, and there's a reason for it.

I like your work visually at first glance. God knows you're right about Gradio, even if it's irrelevant.

But peddling extremely limited, out-of-date versions of other people's data trumps that, especially with this tagline: "A website to compare every AI model: LLMs, TTSs, STTs"

It is a handful of LLMs, then one TTS model, then one STT model, both with 0 data. And it's worth pointing out, since this endeavor is motivated by design trumping all: all the columns are for LLM data.

Now imagine going one step further and actually running a prompt across every AI model and showing you the best answer and the AI model that generated it.
Who decides what the best answer is?
The user who runs the prompt?
Those tools exist; they do not need to be imagined. Look at the related comments. Also, they do little but increase the labor of getting an answer. Having the user spend more time reviewing AI answers is not exactly an improvement.