by karmakaze 538 days ago
This is the interesting part of the experiment. Since these LLMs are general and not specifically trained on historical (and current) stock prices and (business) news stories, it isn't a measure of how good they could be today.
1 comment

My first thought after seeing this post was that it's a real-world eval. We are running out of evals lately (the ARC-AGI test, then the sudden jump on FrontierMath, etc.), so it's good to have real-world tests like this that show how far along we are.