Hacker News
by archerx 467 days ago
That's why I like giving it a real-world test. For example, take a podcast transcript and ask it to write show notes and a summary. With a temperature of 0, different models will tackle the problem in different ways, and you can infer whether they really understood the transcript. The transcripts I give it usually come from about an hour of audio with two or more people talking.

Good test. I'm slowly accumulating private tests that I use to rate LLMs, and this one was missing... Thanks.