by smallerize
473 days ago
From https://huggingface.co/Qwen/QwQ-32B:

"Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required."
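To make the "only when long contexts are required" advice concrete, here is a rough sketch of gating the rope_scaling entry on expected input length. The helper name, the threshold, and the factor values are my own placeholders, not something stated in the quote, so check the model card for the real numbers before using them.

```python
import copy

# Placeholder values for illustration only (assumptions, not from the quote).
ASSUMED_NATIVE_CONTEXT = 32768   # context length the model handles without scaling
ASSUMED_YARN_FACTOR = 4.0        # static scaling factor applied to every request


def with_optional_yarn(config: dict, max_expected_tokens: int) -> dict:
    """Return a copy of `config` that has rope_scaling only if long inputs are expected."""
    updated = copy.deepcopy(config)
    if max_expected_tokens > ASSUMED_NATIVE_CONTEXT:
        # Static YaRN: the same factor applies regardless of actual input length,
        # so it is only worth enabling when long inputs are really expected.
        updated["rope_scaling"] = {
            "type": "yarn",
            "factor": ASSUMED_YARN_FACTOR,
            "original_max_position_embeddings": ASSUMED_NATIVE_CONTEXT,
        }
    else:
        # Short-context deployment: leave scaling off to avoid penalizing short texts.
        updated.pop("rope_scaling", None)
    return updated


short = with_optional_yarn({"model_type": "qwen2"}, max_expected_tokens=4096)
long = with_optional_yarn({"model_type": "qwen2"}, max_expected_tokens=100_000)
print("rope_scaling" in short, "rope_scaling" in long)
```

Since the factor is fixed at load time, this is a per-deployment decision rather than a per-request one: you would run it once when writing out the config for a given server.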