Hacker News
by jazzyjackson 546 days ago
I'm returning my 96GB M2 Max. It can run unquantized Llama 3.3 70B, but tokens per second is slow as molasses, and I still couldn't find any use for it. I just kept going back to Perplexity when I actually needed to find an answer to something.
1 comment

Interesting. You're using the FP8 version, I'm guessing? How many tokens/s are you getting, and with which software? MLX?
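
For context on the FP8 guess, here's a rough back-of-the-envelope sketch of weight sizes at different precisions. It assumes a nominal 70e9 parameters and counts weights only (no KV cache, activations, or OS overhead), so treat the numbers as ballpark figures, not measurements. The names `weight_gb` and `estimates` are just made up for illustration.

```python
# Rough weight-only memory estimate for a "70B" model.
# Assumptions: nominal 70e9 parameters; ignores KV cache, activations,
# and whatever the OS and runtime reserve out of the 96 GB.
PARAMS = 70e9
RAM_GB = 96

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


def estimates(params: float = PARAMS) -> dict:
    """Weight size in GB for each precision listed above."""
    return {name: weight_gb(params, b) for name, b in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in estimates().items():
        verdict = "fits" if gb < RAM_GB else "does not fit"
        print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in {RAM_GB} GB")
```

By this estimate the 16-bit weights alone come to roughly 140 GB, which is why an 8-bit or smaller variant seems more likely on a 96GB machine. In practice less than the full 96 GB is probably usable for the model, so the real headroom is tighter than this suggests.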