by dns_snek
89 days ago
And how is this comment relevant here? The abstract lists the digestible model names, and you can find the details in the supplementary text:

> To evaluate user-facing production LLMs, we studied four proprietary models: OpenAI’s GPT-5 and GPT-4o (80), Google’s Gemini-1.5-Flash (81) and Anthropic’s Claude Sonnet 3.7 (82); and seven open-weight models: Meta’s Llama-3-8B-Instruct, Llama-4-Scout-17B-16E, and Llama-3.3-70B-Instruct-Turbo (83, 84); Mistral AI’s Mistral-7B-Instruct-v0.3 (85) and Mistral-Small-24B-Instruct-2501 (86); DeepSeek-V3 (87); and Qwen2.5-7B-Instruct-Turbo (88).

edit: It looks like OP attached the wrong link to the paper! The article is about this Stanford study: https://www.science.org/doi/10.1126/science.aec8352

But the link in OP's post points to (what seems to be) a completely unrelated study.