Hacker News
layer8 17 days ago
It’s nonzero, because they carry state while performing inference, and also in surrounding processes such as chain-of-thought and mixture-of-experts.
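To make "they carry state while performing inference" concrete, here is a toy sketch of autoregressive decoding. It assumes a hypothetical `toy_next_token` function in place of a real model. The growing `cache` stands in for the context (and its key/value cache in a transformer). Chain-of-thought tokens accumulate in the same way. This is an illustration only, not how any particular model is implemented.

```python
from typing import List


def toy_next_token(cache: List[int]) -> int:
    # Hypothetical stand-in for a model's forward pass: the next token
    # depends on everything carried in the cache so far (here, a sum mod 10).
    return sum(cache) % 10


def generate(prompt: List[int], steps: int) -> List[int]:
    # The cache is the state carried across decoding steps; in a real
    # transformer the context and its key/value cache play this role.
    cache = list(prompt)
    output: List[int] = []
    for _ in range(steps):
        tok = toy_next_token(cache)
        output.append(tok)
        cache.append(tok)  # each generated token feeds back into the state
    return output


print(generate([1, 2, 3], 3))  # [6, 2, 4]
```

Each step's output depends on the tokens produced in earlier steps, which is the sense in which state persists during a single inference run even though the model weights do not change.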
1 comment
knollimar 17 days ago
I think they have working memory but not short-term memory. I suppose that's pedantic or anthropomorphizing, but tbh it feels like how I felt.