>No, they generally do not compete on accuracy benchmarks afaik.

"Get Answers" is literally at the top of ChatGPT's landing page. Do you think the average person interprets that to mean "Get inaccurate answers"? Google "AI benchmark" and almost every result is an assessment of the accuracy of various models. What do you think they compete on? How do you think they measure the improvement from one model to the next? Here's OpenAI's "Optimizing LLM Accuracy" guide: https://platform.openai.com/docs/guides/optimizing-llm-accur... Pop this into Google and see the pages of results about accuracy: site:openai.com "accuracy". To claim that they don't optimize for accuracy confirms to me that you are not discussing this in good faith. Perhaps you are just trying to be contrarian or something, I don't know.

>and also what led you to earlier claim that anyone typing in the complainant's name saw the same hallucination.

Well, it says right in the article that different people received the same result. Why are the goalposts moving? Actually, never mind, I don't care to continue the conversation.
You'll see that AI companies, including OpenAI, are generally not competing on accuracy benchmarks.
For example, here are the benchmarks on which OpenAI seems to be trying to compete:
MMLU: Measuring Massive Multitask Language Understanding,
MATH: Measuring Mathematical Problem Solving With the MATH Dataset,
GPQA: A Graduate-Level Google-Proof Q&A Benchmark,
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs,
MGSM: Multilingual Grade School Math Benchmark (MGSM), Language Models are Multilingual Chain-of-Thought Reasoners,
HumanEval: Evaluating Large Language Models Trained on Code,