The opinion I formed in the first few months after GPT-4's release was that the "maximalist" approach some people were taking to build a true AGI was disproving the society of mind hypothesis. It turned out that composing many LLMs into a cognitive architecture, each with a specific purpose (memory, planning, etc.), wasn't scaling.

In the same vein, I'd suggest the following experiment: train a transformer "sliced" into groups of layers, forcing it to emit and receive tokens at each group boundary. My expectation: passing text rather than neural activations between the groups should degrade performance.
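A toy sketch of that setup, purely illustrative: the sizes, the residual tanh "layers" (stand-ins for real attention blocks), and the single shared embedding matrix used at every boundary are all hypothetical choices, not a worked-out design. It only covers the forward pass; actually training through the argmax would need something like a straight-through or Gumbel-softmax estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V, N_GROUPS, LAYERS_PER_GROUP = 16, 50, 3, 2  # hypothetical toy sizes

# Hypothetical shared embedding/unembedding matrix used at every group boundary.
embed = rng.normal(size=(V, D)) / np.sqrt(D)

def make_layer():
    W = rng.normal(size=(D, D)) / np.sqrt(D)
    return lambda h: np.tanh(h @ W) + h  # toy residual layer, not real attention

groups = [[make_layer() for _ in range(LAYERS_PER_GROUP)] for _ in range(N_GROUPS)]

def boundary(h):
    # Emit one token per position (argmax over vocab logits), then re-embed it,
    # so only a discrete token id crosses into the next group.
    token_ids = np.argmax(h @ embed.T, axis=-1)
    return embed[token_ids]

def forward(h, discrete_boundaries):
    for i, layers in enumerate(groups):
        for layer in layers:
            h = layer(h)
        if discrete_boundaries and i < len(groups) - 1:
            h = boundary(h)  # text bottleneck between groups
    return h

x = embed[rng.integers(0, V, size=8)]  # a toy sequence of 8 input tokens
continuous = forward(x, discrete_boundaries=False)
text_bottleneck = forward(x, discrete_boundaries=True)
```

The bottleneck is the whole point: with V = 50, each position crossing a boundary carries at most log2(50) ≈ 5.6 bits, versus a full 16-dimensional float vector when activations pass through directly.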

You can observe the same thing in human societies: intelligence doesn't compose. Doubling the number of members doesn't double a group's overall intelligence; at best you get diminishing returns, at worst the group's intelligence goes down.