| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by throwa356262 5 days ago
	Are 0.6b models useful without fine tuning? Half of the times I ask qwen 0.6b "what is 1 + 2?" it ends up in a thinking loop of "but wait, the user is asking me to ..."

2 comments

rhdunn 4 days ago

If you don't want the thinking, you can pass `enable_thinking: false` to the `chat_template_kwargs`. If using promptfoo, this can be done via:

    providers:
      - # llama-server
        id: openai:chat:qwen
        config:
          apiBaseUrl: http://localhost:7876
          apiKey: "..."
          passthrough:
            chat_template_kwargs:
              enable_thinking: false

The looping may be due to quantization -- I've seen it on locally quantized Q6_K Qwen 3.5/3.6 models. I recall seeing somewhere (here or r/LocalLlama) that Qwen models are sensitive to quantization of the keys, though I haven't yet experimented with/looked into fixing this. (I've been building up my promptfoo tests/infrastructure to detect looping, etc. on Qwen and other models.)

link

kamranjon 4 days ago

A fun thing I do with Qwen 3.5 0.8b is to take a screenshot of the Hackernews homepage and ask it to give me a JSON representation of the data and it does surprisingly well. With a well structured prompt I think it could be made to be pretty reliable tool for that type of task out of the box.

link

Zambyte 4 days ago

While a fun poc, surely it would be better to just use the API (see the footer)? Or just `curl | x2j | jq` and map the HTML directly to JSON?

link

kamranjon 4 days ago

Yes apologies, Hackernews was just an example, you can do this with any website - it’s just a simple benchmark I like to use for testing vision models.

link