| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Topfi 251 days ago

Gorilla is a great resource and it isn't unreasonable to suspect Z.AI has it in their data sets. I'd suspect most other frontier labs as well (pure speculation, but why not use it as a resource).

Problem is, while Gorilla was an amazing resource back in 2023 and continues to be a great dataset to lean on, but most ways we use LLMs in multi step tasks have since evolved greatly, not just with structured JSON (which GorillaOpenFunctionsV2, v4 eval does multi too), but more with the scaffolding around models (Claude Code vs Codex vs OpenCode, etc.). Likely why good performance with Gorilla doesn't necessarily map onto multiple step workloads with day-to-day tooling, which I tend to go for and reason why, despite there being FOSS options already, most labs either built their own coding assistant tooling (and most open source that too) or feel the need to fork others (Qwen with Geminis repo).

Purely speculative, but GLM-4.6 I evaluated using the same tasks as other models via Claude Code with their endpoint as that is what they advertise as the official way to use the model, same reason I use e.g. Codex for GPT-5. More focused on results in the best case, over e.g. using opencode for all models to give a more level playing field.