|
You are making a metaphysical claim when a physical one will do. Beauty, awe, grief, the rush of holding a newborn, the sting of a breakup, the warmth of a summer evening at golden hour. All of it is patterns of atoms in motion under lawful dynamics. Neurons fire. Neurotransmitters bind. Circuits synchronize. Bodies and environments couple. There is no extra ingredient that floats outside physics. Once you grant that, the rest is bookkeeping. Any finite physical process has a finite physical trace. That trace is measurable to some precision. A finite trace can be serialized into a finite string of symbols. If you prefer bits, take a binary code. If you prefer integers, index the code words. The choice of alphabet does not matter. You can map a movie, a symphony, a spike train, a retina’s photon counts, or a full brain-body sensorium collected at some temporal resolution into a single long string. You lose nothing by serialization because the decoder knows the schema. This is not a “text only” claim. It is a claim about representation. Your high dimensionality objection collapses under the same lens. High dimensional just means many coordinates. There is a well known result that any countable description can be put in one dimension by an invertible code. Think Gödel numbering or interleaving bits of coordinates. You do not preserve distances, but you do preserve information. If the thing you care about is the capacity to carry structure, the one dimensional string can carry all of it, and you can recover the original arrangement exactly given the decoding rule. Now take the 4E point. Embodiment matters because it constrains the data distribution and the actions that follow. It does not create a magic type of information that cannot be encoded. A visual scene is photons on receptors over time. Proprioception is stretch receptor states. Affect is the joint state of particular neuromodulatory systems and network dynamics. Attention and working context are transient global variables implemented by assemblies. All of that can be logged, compressed, and restored to the degree your sensors and actuators allow. The fact that a bottle with a genome inside does not make a child on a beach tells you reproduction needs a decoder and an environment. It does not tell you the code fails to specify the organism. Likewise, an LLM plus a diffusion decoder can take a text latent and unfold it into an image distribution that matches world statistics because the bridge model plays the role of the environment for that domain. “LLMs cannot experience beauty” simply reasserts the thing you want to prove. We have no privileged readout for human qualia either. We infer it from behavior, physiology, and report. We do not understand human brains at the level of complete causal microphysics because of scale and complexity, not because there is a non-physical remainder. We likewise do not fully understand why a large model makes a given judgment. Same reason. Scale and complexity. If you point to mystery on one side as a defect, you must admit it on the other. The map versus territory line also misses the target. Of course a representation is not the thing itself. No one is claiming a jpeg is a sunset. The claim is that the structure necessary to act as if about sunsets can be encoded and learned. A system that takes in light fields, motor feedback, language, and reward and that updates an internal world model until its predictions and actions match ours to arbitrary precision will meet every operational test you have for meaning. If you reply that something is still missing, you have stepped outside evidence into stipulation. So let’s keep the ground rules clear. Everything we are and feel is physically instantiated. Physical instantiations at finite precision admit lossless encodings as strings. Strings can be learned over by generic function approximators that optimize on pattern consistency, regardless of whether the symbols came from pixels, pressure sensors, or phonemes. That makes the “text inside, image outside” complaint irrelevant. The substrate is a detail. The constraint is data and objective. We cannot yet build a full decoder for the human condition. That is a statement about engineering difficulty, not impossibility. And it cuts both ways. We do not know how to fully read a person either. But we do not conclude that people lack experience. We conclude that we lack understanding. |
Back of the envelope math puts an estimate of 10^42 bits to capture the information present in your current physical brain state. Thats just a single brain, a single state. Now you need to build your mythical decoder device, which can translate qualia from this physical state. Where does it live? What’s its output look like? Another 10^40 bitstring?
Again, these arguments are fun on paper. But they’re completely removed from reality.