

by ursaguild 558 days ago

I like the idea of more comparisons of models. Are there plans to add independent analyses of these models or is it only an aggregation of input limits?

How do you see this differing from or adding to other analyses such as:

https://artificialanalysis.ai

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena

Great work on all the aggregation. The website is nice to navigate.

3 comments

botro 558 days ago

I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side by side comparison between them. I ran each prompt 4 times for different temperature values and that's available as a toggle.

I was going to add reviews on each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful to them in getting a sense of how different models respond to the same prompt and how temperature affects the same model's output on the same prompt.
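
For readers wondering why temperature changes a model's output on the same prompt, here is a minimal sketch, assuming the common softmax-with-temperature sampling scheme. The logits are made-up numbers and the function name is hypothetical; this is not how aimodelreview.com or any particular model is implemented.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the scores.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability lands on the top-scoring token, so repeated runs tend to agree; at higher temperature the distribution flattens and repeated runs diverge more, which is presumably why running each prompt several times per temperature is informative.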


adrianomartins 557 days ago

Hey, this is pretty insightful! I wonder if, in the course of researching to build this website, you reached any conclusions as to which AI assistant is currently ahead.


rtsil 558 days ago

I can confirm, it's still very helpful, thank you!


ahmetd 558 days ago

the gradio ui looks ugly imo, that's why I used shadcn and next.js to make the website look good.

I'll try to make it as user-friendly as possible. Most of the websites are ugly + too technical.


refulgentis 558 days ago

I want to point out you dodged the data question, and there's a reason for it.

I like your work visually at first glance, and god knows you're right about gradio, even if it's irrelevant.

But peddling extremely limited, out-of-date versions of other people's data trumps that, especially with this tagline: "A website to compare every AI model: LLMs, TTSs, STTs"

It is a handful of LLMs, then one TTS model, then one STT model, both with 0 data. And it's worth pointing out, since this endeavor is motivated by design trumping all: all the columns are for LLM data.


vivzkestrel 558 days ago

now imagine going one step further and actually running a prompt across every AI model and showing you the best answer and the AI model that generated it


alternatex 557 days ago

Who decides what the best answer is?


vivzkestrel 557 days ago

the user who runs the prompt?


alternatex 553 days ago

Those tools exist; they do not need to be imagined. Look at the related comments. Also, they do little but increase the labor of getting an answer. It's not exactly an improvement of AI for the user to spend more time reviewing AI answers.
