by kgeist 41 days ago
> Experiments at Cactus showed that MLPs can be completely dropped from transformer networks, as long as the model relies on an external knowledge source.

Heh, what a coincidence, just today one of my students presented research results which also confirmed this. He removed MLP from Qwen and the model still could do transformation tasks on input but lost knowledge.
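For anyone wondering what "removing the MLP" means structurally, here is a toy sketch in plain Python. It is not the student's code and not Qwen's architecture: the `attention`, `mlp`, and `block` functions are hypothetical stand-ins with no learned weights, no layer norm, and no causal mask. It only shows that each block adds two things to the residual stream, and that the ablation skips the second one.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(tokens):
    # Toy single-head self-attention with identity Q/K/V projections
    # (assumption: a real model uses learned projections and a causal mask).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def mlp(tokens):
    # Toy position-wise MLP: each token is transformed on its own,
    # with no information flowing between positions.
    return [[2.0 * max(0.0, x) for x in tok] for tok in tokens]

def block(tokens, use_mlp=True):
    # Residual connection around attention: mixes information across positions.
    attn_out = attention(tokens)
    h = [[x + y for x, y in zip(t, a)] for t, a in zip(tokens, attn_out)]
    if not use_mlp:
        # Ablated block: the MLP sub-layer adds nothing to the residual stream.
        return h
    mlp_out = mlp(h)
    return [[x + y for x, y in zip(t, m)] for t, m in zip(h, mlp_out)]
```

Calling `block(tokens, use_mlp=False)` keeps the part that moves information between input positions and drops the per-token transformation, which is one way to picture a model that can still rearrange its input but has lost stored knowledge.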


How does that work? Don't you need knowledge to understand the meaning of the inputs?

Or is it the difference between recognizing something and recalling it, with recall being much more difficult? (Classification vs. generation?)

> He removed MLP from Qwen and the model still could do transformation tasks on input but lost knowledge.

But not deterministic?

That sounds giant! Any unformatted unfiltered preliminary records of said findings?
Bullseye!
Sounds very interesting!
Can knowledge then be queried via a tool? :)
grep knowledge

I'm thinking more like some kind of local wiki with an inverted index. Has anyone tried that?
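To make the idea concrete, here is a minimal sketch of an inverted index over a local wiki, assuming pages are held in a plain dict of title to text. The names `tokenize`, `build_index`, and `search` are made up for illustration, and the lookup is a simple AND over query terms with no ranking.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    # Map each term to the set of page titles that contain it.
    index = defaultdict(set)
    for title, body in pages.items():
        for term in tokenize(title + " " + body):
            index[term].add(title)
    return index

def search(index, query):
    # Return titles of pages containing every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A model without stored knowledge could call something like `search(index, "capital france")` as a tool and read the matching pages from its context. A real setup would likely want ranking (e.g. BM25) and stemming on top of this.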

I know RAG isn't cool anymore and now we just do markdown files, but has anyone converted the useful parts of Common Crawl into .md?