by formalsystem 596 days ago
Please ignore my previous comments - I double-checked with the model developers and here's the correction. Vanilla PTQ means that no fancy quantization algorithm such as SpinQuant or AWQ was applied. The model was simply quantized with the same scheme mentioned in the post (4-bit per-group symmetric weights with g_size=32, 8-bit dynamic per-token activations).
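
For anyone wondering what that scheme roughly amounts to, here is a minimal NumPy sketch. This is not the model developers' code: the function names are made up for illustration, and it assumes plain absmax scaling and symmetric activation quantization, neither of which the post spells out.

```python
import numpy as np

def quantize_weights_4bit_per_group(w, group_size=32):
    """Symmetric 4-bit quantization with one scale per group of `group_size`
    consecutive input channels (assumption: absmax scaling, no zero point)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must divide into groups"
    groups = w.reshape(out_features, in_features // group_size, group_size)
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    # Map the largest magnitude in each group to 7; avoid dividing by zero.
    scale = np.where(max_abs == 0, 1.0, max_abs / 7.0)
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_weights(q, scale):
    """Recover an approximate float weight matrix from groups and scales."""
    return (q.astype(np.float32) * scale).reshape(q.shape[0], -1)

def quantize_activations_8bit_per_token(x):
    """Dynamic 8-bit quantization: one scale per token (row), computed at
    runtime from that token's values (assumption: symmetric, absmax)."""
    max_abs = np.abs(x).max(axis=-1, keepdims=True)
    scale = np.where(max_abs == 0, 1.0, max_abs / 127.0)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale
```

"Vanilla" here just means the scales come straight from the tensor values as above, with no rotation, calibration, or learned adjustment step beforehand.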