by kgeist
189 days ago
> the GPT-OSS models are also quite good

I recently pitted gpt-oss 120b against Qwen3-Next 80b on a lot of internal benchmarks (for production use). For me, gpt-oss was slightly slower (vLLM, both fit in VRAM), much worse at multilingual tasks (33 languages evaluated), and worse at instruction following (e.g., Qwen3-Next was able to reuse the same prompts I used for Gemma3 perfectly, while gpt-oss struggled and RAG benchmarks suddenly went from 90% to 60% without additional prompt engineering). And that's with Qwen3-Next being a random unofficial 4-bit quant (compared to gpt-oss having native support), plus I had to disable multi-token prediction in Qwen3-Next because vLLM crashed with it.

Has anyone here tried both gpt-oss 120b and Qwen3-Next 80b? Maybe I was doing something wrong, because I've seen a lot of people praise gpt-oss.
> We trained the models on a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge.
https://openai.com/index/introducing-gpt-oss/