Hacker News
by smallerize 473 days ago
From https://huggingface.co/Qwen/QwQ-32B

Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required.

1 comment

Sorry, could you please explain what this means? I'm not into machine learning, so I don't get the jargon.
Well, I can't be positive, but it looks like the scaling factor that supports a long context length stays fixed in vLLM no matter how long the input is, so it may be set wrong for shorter texts. Background on YaRN: https://blog.eleuther.ai/yarn/
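To make the "static" part of that note a bit more concrete, here's a rough sketch of the difference between a fixed scaling factor and one that adapts to input length. This is a simplified illustration, not vLLM's code and not the full YaRN method (which also treats different rotary frequencies differently). The constants and both function names are assumptions picked for the example.

```python
# Simplified illustration only: not vLLM's implementation and not full YaRN.
# Both constants are assumed values chosen for this example.
ORIGINAL_MAX_POSITIONS = 32768  # assumed context window before scaling
STATIC_FACTOR = 4.0             # assumed configured scaling factor


def static_factor(input_len: int) -> float:
    """Static scaling: the same factor is applied whatever the input length."""
    return STATIC_FACTOR


def dynamic_factor(input_len: int) -> float:
    """Hypothetical dynamic scaling: stretch only as far as the input needs."""
    return max(1.0, input_len / ORIGINAL_MAX_POSITIONS)


if __name__ == "__main__":
    for n in (1_000, 32_768, 100_000):
        print(n, static_factor(n), round(dynamic_factor(n), 2))
```

With static scaling, a 1,000-token prompt gets the same stretch as a 100,000-token one, which is why the model card suggests only adding rope_scaling when you actually need long contexts.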