Hacker News

by devashish86 3 days ago

Author here. Quick context the post doesn't quite spell out:

The tool_choice="auto" failure on Qwen3-Next isn't a parser issue — the model reasons inside <think>, decides, and never emits the tool call. No error, just empty tool_calls. The fix was swapping the backbone from Thinking to Instruct, not tuning any parser flag.
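
Because the failure is silent, it helps to classify turns explicitly instead of waiting for an exception. The sketch below assumes an OpenAI-style assistant message dict (`content`, `tool_calls`). It also assumes the server may split reasoning into a separate `reasoning_content` field; that field name depends on the serving stack and is an assumption. `classify_tool_turn` is a hypothetical helper, not part of any library.

```python
from typing import Any


def classify_tool_turn(message: dict[str, Any]) -> str:
    """Classify one assistant message from an OpenAI-style chat completion.

    "tool_call"        -> tool_calls is populated (the model actually called)
    "reasoned_no_call" -> reasoning is present but tool_calls is empty
                          (the silent failure: no error, just no call)
    "text_only"        -> plain text answer, no reasoning, no call
    """
    if message.get("tool_calls"):
        return "tool_call"
    content = message.get("content") or ""
    # Assumption: some servers move <think> output into a separate field.
    reasoning = message.get("reasoning_content") or ""
    if reasoning or "<think>" in content:
        return "reasoned_no_call"
    return "text_only"
```

Counting `reasoned_no_call` turns across a fixed prompt set with `tool_choice="auto"` is one way to compare a Thinking backbone against an Instruct one before touching any parser flags.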

The "load the bigger model first, size the smaller against actual residency" playbook generalizes to anything with shared CUDA framework overhead. The ~5 GiB framework floor shows up even at small gpu_memory_utilization values — plan against actuals, not targets.
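
The sizing step is simple arithmetic, sketched below under stated assumptions. It assumes the second server takes its limit as a fraction of total device memory (which is how vLLM documents `gpu_memory_utilization`). It also assumes the bigger model's actual residency has already been measured after loading, for example from `nvidia-smi` or `torch.cuda.mem_get_info()`. The 1 GiB headroom and the example card and model sizes are made up. `size_second_model` is a hypothetical helper.

```python
import math


def size_second_model(
    total_gib: float,
    first_resident_gib: float,
    framework_floor_gib: float = 5.0,
    headroom_gib: float = 1.0,
) -> tuple[float, float]:
    """Size a second model against the first model's *measured* residency.

    Returns (utilization_fraction, usable_gib): the fraction is of total
    device memory; usable_gib is what is left for weights and KV cache
    after the framework floor is paid.
    """
    # Budget is what is actually free once the bigger model is loaded,
    # minus a safety margin (assumed 1 GiB) for allocator spikes.
    budget_gib = total_gib - first_resident_gib - headroom_gib
    if budget_gib <= framework_floor_gib:
        raise ValueError(
            f"only {budget_gib:.1f} GiB left, below the "
            f"~{framework_floor_gib} GiB framework floor"
        )
    # Round down so the second instance never requests more than the budget.
    fraction = math.floor(budget_gib / total_gib * 1000) / 1000
    # The floor is paid regardless of the fraction requested.
    usable_gib = budget_gib - framework_floor_gib
    return fraction, usable_gib


if __name__ == "__main__":
    # Hypothetical: 80 GiB card, bigger model measured at 52 GiB resident.
    print(size_second_model(total_gib=80.0, first_resident_gib=52.0))  # (0.337, 22.0)
```

Using the first model's configured target instead of its measured 52 GiB would overstate the free memory whenever actual residency exceeds the target, which is the mismatch the floor causes.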

2 comments

edg5000 6 hours ago

From the Codex system prompt (verbatim):

```
(...) - Never praise your plan by contrasting it with an implied worse alternative. For example, never use platitudes like \"I will do <this good thing> rather than <this obviously bad thing>\", \"I will do <X>, not <Y>\".
- Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. (...)
```

It seems the OpenAI people added that first bullet specifically to address this tendency of the model, as seen in the parent comment. The goblin stuff happens to appear right after it in the system prompt, so I included it as a bonus.


dofm 3 hours ago

FWIW the mere fact that the goblin stuff is necessary and is in the system prompt suggests to me that OpenAI's approach of training ridiculously large models that can do everything for everyone is hopelessly cooked.

Though I concede it is not that much different than straightening the tie of your most valuable employee before you unwisely put them in front of a client and saying "please don't tell them about the regressions they didn't notice and remember, they don't want things explained in allegories drawn from the Silmarillion".


barrkel 8 hours ago

Can you try to tune your Claude, or whatever LLM you're using for your text, to phrase things in plain English? Way less use of antithesis, at least. You can probably find a skill for it; if not, get an LLM to write your own.
