Y
Hacker News
new
|
ask
|
show
|
jobs
by
sharms
61 days ago
This is because the "thinking" you see is a summary by a highly quantized model - not the actual model, to mask these tokens