by lhl
632 days ago
BTW, I got a chance to read through the model card, and there's a section that shows their speculative decoding (SD) gains: https://huggingface.co/amd/AMD-Llama-135m#speculative-decodi...

- 1.75x-2.80x on MI250
- 2.83x-2.98x on NPU
- 3.57x-3.88x on CPU

Note that they tested AMD-Llama-135m-code as the draft model for CodeLlama-7b, and both do similarly badly on HumanEval Pass@1 (~30%). So if they used a similarly trained 135m model as the draft for, say, Qwen2.5-Coder (88.4% on HumanEval), the perf gains would probably be much worse, since the draft's guesses would match the much stronger target less often.
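To put rough numbers on why a weaker draft/target match hurts, here's a minimal sketch using the expected-speedup estimate from the original speculative decoding analysis (Leviathan et al.). It assumes each drafted token is accepted independently with probability `alpha`, that the draft proposes `gamma` tokens per step, and that `cost_ratio` is the draft model's per-token time divided by the target's. The specific values below are made up for illustration; none of them come from the AMD model card, and real acceptance rates would have to be measured.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model pass.

    alpha: assumed probability that a drafted token is accepted (i.i.d.).
    gamma: number of tokens the draft model proposes per step.
    """
    if alpha >= 1.0:
        # Every draft token accepted, plus the one token the target adds.
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated wall-clock speedup over plain autoregressive decoding.

    cost_ratio: draft time per token / target time per token (assumed).
    Each step costs gamma draft passes plus one target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1)


if __name__ == "__main__":
    gamma = 5          # hypothetical draft length
    cost_ratio = 0.05  # hypothetical: draft is 20x cheaper per token
    for alpha in (0.9, 0.8, 0.6, 0.4, 0.2):
        s = expected_speedup(alpha, gamma, cost_ratio)
        print(f"alpha={alpha:.1f}  estimated speedup={s:.2f}x")
```

Under these assumptions the estimate falls off quickly as `alpha` drops, and at an acceptance rate of zero it goes below 1x, because the draft passes become pure overhead. That's the mechanism behind the guess above: a 135m draft paired with a much stronger target would probably see a lower acceptance rate, though how much lower is something you'd need to benchmark.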