Hacker News new | ask | show | jobs
by com2kid 1211 days ago
> "The general hypothesis of the project is: Consciousness, or something resembling consciousness, emerges not from the capability of a single task model like GPT or Stable Diffusion, but from the oscillations between the inputs and outputs of different instances of different models performing different tasks."

This is the underlying theory of classical liberal education, stemming back thousands of years.

We learn different ways of thinking, different lens through which we view the world, and we can apply those lens as needed to solve different problems.

Indeed when conversing with someone who has over-indexed on just one type of learning, we take notice, we say that person's worldview is limited. (For example, an engineer trying to sell a new product, but who doesn't understand that people aren't willing to toss away all their old skills for what is an incremental improvement in workflow, they should take a few courses in psychology! :) )

Take any famous work of architecture. An engineer can appreciate it for the eloquence of its construction, an artist can appreciate its beauty, the shapes, the shading, colors, textures. A historian can appreciate how it incorporates elements of the region's history and cultures.

Someone trained in all three (as anyone who graduated from a good university should have been, to at least some extent) will be to switch between modalities of thought at will, and also integrate those modalities together, and thus hopefully, derive more pleasure from their experiences of the world.

Of course AIs will need to have multiple models!

1 comments

This becomes a semantic debate if we do not define the boundaries between models. If models are "integrated" to an extreme, then they are really just the same model. ...the tradeoff of having one model vs two models is often driven by resources used to hold and serve content from a model, but there are also mathematical constraints as, for example, the size of the model grows in proportion to the quadratic of input data, which means that separate models which can communicate with one another are more efficient.

...but the trick is defining that inter-model communication and establishing a "controller" model with appropriate training data.