Hacker News
by lhl 632 days ago
BTW, I got a chance to read through the model card, and there's a section that shows their speculative decoding (SD) gains: https://huggingface.co/amd/AMD-Llama-135m#speculative-decodi...

- 1.75x-2.80x on MI250

- 2.83x-2.98x on NPU

- 3.57x-3.88x on CPU

Note that they were testing AMD-Llama-135m-code as the draft model for CodeLlama-7b, and both do similarly badly on HumanEval Pass@1 (~30%). If they used a similarly trained 135m model as the draft for, say, Qwen2.5-Coder (88.4% on HumanEval), the speedups would probably be much smaller, since a much stronger target model would likely reject more of the weak draft model's proposed tokens.
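To make the acceptance-rate argument concrete, here's a minimal Python sketch of the idealized speedup formula from Leviathan et al. (2023), "Fast Inference from Transformers via Speculative Decoding". It assumes each drafted token is accepted independently with probability `alpha`. `gamma` (draft tokens per step) and `c` (cost of one draft forward pass relative to one target pass) are illustrative parameters, not AMD's settings. The `alpha` values below are made up purely for illustration, and `c=0.02` is only a crude proxy based on the 135M/7B parameter ratio.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model forward pass when the draft
    proposes `gamma` tokens, each accepted independently with prob. `alpha`."""
    if alpha == 1.0:
        return float(gamma + 1)  # every draft token accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Idealized walltime speedup over plain autoregressive decoding.
    `c` = cost of one draft forward pass / cost of one target forward pass."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


if __name__ == "__main__":
    # Hypothetical acceptance rates, NOT measured values from the model card.
    for alpha in (0.9, 0.5):
        s = expected_speedup(alpha, gamma=4, c=0.02)
        print(f"alpha={alpha}: ~{s:.2f}x")  # prints ~3.79x, then ~1.79x
```

With these made-up inputs, dropping `alpha` from 0.9 to 0.5 roughly halves the speedup, which is the mechanism behind the concern above. The real acceptance rate for any particular draft/target pair would have to be measured, and the i.i.d. assumption is itself a simplification.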