by Gracana 117 days ago
There's a Reddit comment at https://www.reddit.com/r/LocalLLaMA/comments/1r4m4it/comment... that says:

my system is running GLM-5 MXFP4 at about 17 tok/s. That’s with a single RTX Pro 6000 on an EPYC 9455P with 12 channels of DDR5-6400. Only 16k context though, since it’s too slow to use for programming anyway and that’s the only application where I need big context.
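
For a rough sense of scale, here is a back-of-envelope sketch of the memory bandwidth implied by the quoted setup. It assumes the standard 64-bit (8-byte) data width per DDR5 channel and takes the theoretical peak, which real systems don't reach. The function names are just illustrative. Dividing the peak by the quoted 17 tok/s gives a loose upper bound on how much data could be streamed from system RAM per token. It says nothing about weights held in VRAM or about how the model is actually split between GPU and CPU.

```python
# Back-of-envelope numbers for the quoted setup.
# Taken from the quote: 12 channels, DDR5-6400, about 17 tok/s.
# Assumption (not in the quote): 8 bytes moved per transfer per channel.

CHANNELS = 12
TRANSFERS_PER_SEC = 6400e6   # DDR5-6400 means 6400 megatransfers per second
BYTES_PER_TRANSFER = 8       # assumed 64-bit data width per channel
TOKENS_PER_SEC = 17          # figure reported in the quoted comment


def peak_bandwidth_gb_s(channels, transfers_per_sec, bytes_per_transfer):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


def max_gb_per_token(bandwidth_gb_s, tokens_per_sec):
    """Upper bound on GB read from system RAM per generated token."""
    return bandwidth_gb_s / tokens_per_sec


peak = peak_bandwidth_gb_s(CHANNELS, TRANSFERS_PER_SEC, BYTES_PER_TRANSFER)
per_token = max_gb_per_token(peak, TOKENS_PER_SEC)

print(f"Theoretical peak bandwidth: {peak:.1f} GB/s")
print(f"Upper bound on RAM traffic per token: {per_token:.1f} GB")
```

So the theoretical ceiling is roughly 614 GB/s, and at 17 tok/s that leaves at most about 36 GB of system RAM traffic per token, before accounting for any real-world inefficiency.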