Hacker News
by s-macke 911 days ago
In fact, the performance differences between the models are so large that even a micro-benchmark is enough to show them.

For example, see my analysis [0], which is based on observing how Large Language Models (LLMs) progress in a single text adventure.

[0] https://github.com/s-macke/AdventureAI#evaluation-of-other-m...