Hacker News
by zyl1n 1023 days ago
I got prefill: 26.9719 tokens/sec, decoding: 18.8827 tokens/sec on an M1 Max 32GB laptop with Llama 2 7B Chat f32. Not bad.