| Even though it was told that it MUST quote users directly, it still outputs: > "It’s already a game changer for many people. But to have so many names like o1, o3-mini, GPT-4o, & GPT-4o-mini suggests there may be too much focus on internal tech details rather than clear communication." (paraphrase based on multiple similar sentiments) It also hallucinates quotes. For example: > "I’m pretty sure 'o3-mini' works better for that purpose than 'GPT 4.1.3'." – TeMPOraL But that comment is not in the user TeMPOraL's comment history. Sentiment analysis is also faulty. For example: > "I’d bet most users just 50/50 it, which actually makes it more remarkable that there was a 56% selection rate." – jackbrookes
– This quip injects humor into an otherwise technical discussion about evaluation metrics. It's not a quip, though; that comment was meant in earnest. |
"The model naming all around is so confusing. Very difficult to tell what breakthrough innovations occurred." – patrickhogan1