by sharms 61 days ago
This is because the "thinking" you see is a summary written by a highly quantized model, not by the actual model. The summary is there to mask these tokens.