
I realized after sleeping on it that I should clarify the core idea. With computer architecture, it's easy to get hung up on things like I/O, buses, caching, etc. But I think the future is distributed content-addressable computing, where the state is just "out there" on the internet somewhere, and the runtime handles all the data locality issues so that it looks like one continuous memory to the application. If we have that, then it's trivial to add processors, either in the local mesh or even out on the internet somewhere.

Thinking in terms of cores, each connected to its 4 neighbors, with a routing protocol like "do you have the data for hash ABC? Otherwise ask your neighbors" or "store this data with hash XYZ", reduces the problem space to a key-value store similar to Redis, so you don't have to worry about hardware caching.
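A minimal sketch of that routing idea, assuming a rectangular mesh, SHA-1 hex digests as keys, and hypothetical names (`Mesh`, `store`, `lookup`). It models the "ask your neighbors" search as a breadth-first search run from one place rather than as real messages between cores, so the hop count it returns is the best case a flooding protocol could achieve.

```python
import hashlib
from collections import deque


class Mesh:
    """Hypothetical width x height mesh; each core has a local key-value store."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Each core maps content hash (hex string) -> data bytes.
        self.stores = {(x, y): {} for x in range(width) for y in range(height)}

    def neighbors(self, core):
        """Yield the up-to-4 cores directly connected to `core`."""
        x, y = core
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.width and 0 <= ny < self.height:
                yield (nx, ny)

    def store(self, core, data):
        """'Store this data with hash XYZ': keep it on `core`, return its hash."""
        key = hashlib.sha1(data).hexdigest()
        self.stores[core][key] = data
        return key

    def lookup(self, core, key):
        """'Do you have the data for hash ABC? Otherwise ask your neighbors.'

        Searches outward from `core` one ring of neighbors at a time.
        Returns (data, hops), or (None, None) if no core has the hash.
        """
        seen = {core}
        frontier = deque([(core, 0)])
        while frontier:
            current, hops = frontier.popleft()
            if key in self.stores[current]:
                return self.stores[current][key], hops
            for n in self.neighbors(current):
                if n not in seen:
                    seen.add(n)
                    frontier.append((n, hops + 1))
        return None, None
```

For example, on a 4×4 mesh, data stored on core (3, 3) is found from core (0, 0) after 6 hops, the Manhattan distance between them.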

For now, the network protocol could probably just be something like multicast. Packets would go to all neighbors, and they would be connectionless and unreliable, like UDP. Each packet would either be a REQUEST with an N-bit hash, or a RESPONSE with the N-bit hash and the data associated with it. There might also be a TTL field set to roughly sqrt(# of cores) so that packets diffuse through the network once but don't bounce around endlessly. I think that cores would broadcast new [hash][data] packets for every RAM write, and other cores would save all of them and evict their oldest hashes (which lets the cores pre-cache data that's likely to be used soon). Then, after sending a REQUEST for a certain hash, a core would block until the RESPONSE arrived from the cluster in roughly sqrt(# of cores) cycles.
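Here is one possible encoding of those packets plus the "save everything, evict the oldest" cache, as a sketch only. The byte layout is an assumption (1-byte kind, 1-byte TTL, then a 20-byte hash, so N = 160 to match SHA-1), and `encode`, `decode`, `forward`, `initial_ttl`, and `PageCache` are hypothetical names. "Oldest" is taken to mean least recently heard, so a re-broadcast of a known hash refreshes it.

```python
import math
import struct
from collections import OrderedDict

# Assumed wire format: kind (1 byte), TTL (1 byte), hash (20 bytes), then data.
REQUEST, RESPONSE = 0, 1
HEADER = struct.Struct("!BB20s")


def encode(kind, ttl, digest, data=b""):
    """Build a packet; REQUEST packets carry no data."""
    return HEADER.pack(kind, ttl, digest) + data


def decode(packet):
    """Split a packet into (kind, ttl, digest, data)."""
    kind, ttl, digest = HEADER.unpack_from(packet)
    return kind, ttl, digest, packet[HEADER.size:]


def initial_ttl(num_cores):
    """Starting TTL of roughly sqrt(# of cores)."""
    return math.ceil(math.sqrt(num_cores))


def forward(packet):
    """Return the packet to re-broadcast with TTL - 1, or None when no hops remain."""
    kind, ttl, digest, data = decode(packet)
    if ttl <= 1:
        return None
    return encode(kind, ttl - 1, digest, data)


class PageCache:
    """Per-core cache that keeps every RESPONSE it hears, evicting the oldest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # front = oldest, back = newest

    def on_packet(self, packet):
        kind, _ttl, digest, data = decode(packet)
        if kind != RESPONSE:
            return  # REQUESTs carry no data to cache
        self.entries[digest] = data
        self.entries.move_to_end(digest)  # hearing it again makes it newest
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest hash
```

One thing the sketch makes visible: on a square k×k mesh the farthest core is 2(k − 1) hops away, about 2·sqrt(N), so a TTL of sqrt(N) reaches only the cores within that many hops of the sender. If every core must be reachable, the TTL may need to be closer to 2·sqrt(N).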

From a code standpoint, a page in the 32-bit address space would have a real pointer, but it would map through the virtual memory manager (VMM) to one of these [hash][data] pages. So two processes in the cluster wouldn't share the same address for the same piece of data. This can be seen as a feature, not a bug, because it enforces process separation similar to Erlang. When a page's data is updated, the page gets swapped out by the VMM and re-hashed to a new content-addressable page out in the cluster. So it might make sense to have a per-CPU coprocessor that hashes pages on every RAM write. Or maybe the hashing could happen in the RAM write interrupt and we'd accept the speed hit, not sure.

It might also be good to choose a hash that lets you recompute a partial range instead of rehashing the whole page when a single byte changes. Maybe a custom hasher could use a divide-and-conquer strategy: find which half changed, then which half of that half, down to some level, and combine the pieces. That's a bit like the MD5 appending flaw, where someone makes a document that looks like another document but has different data in the middle, both hash to the same value, and so the two documents behave very differently. What I mean is, maybe there is a hashing algorithm that lets you recalculate a half, or a quarter, or an eighth, down to some granularity, so the hashing could be optimized from O(n) to O(n/64) or whatever.

Also, after writing this out, I realized that the VMM would store (for SHA-1) a 20-byte hash for every 4 KB block, so roughly 1/200 of each core's RAM (20/4096 ≈ 0.5%) would go to this storage scheme.
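The divide-and-conquer rehashing described above is essentially a hash tree (Merkle tree): hash fixed-size leaves, then hash pairs of child hashes up to a single root. The sketch below is an illustration under assumptions, not a design: SHA-1, 4096-byte pages, 64-byte leaves (64 leaves per page), and hypothetical names `MerklePage` and `ContentAddressedVMM`. A plain dict named `cluster` stands in for the network, and the page's content address is the tree root, which is not the same value as a plain SHA-1 of the page.

```python
import hashlib

PAGE_SIZE = 4096
CHUNK = 64  # assumed leaf size: 4096 / 64 = 64 leaves per page


def h(data):
    return hashlib.sha1(data).digest()


class MerklePage:
    """A page hashed as a binary hash tree over 64-byte leaves."""

    def __init__(self, data=bytes(PAGE_SIZE)):
        assert len(data) == PAGE_SIZE
        self.data = bytearray(data)
        self.n = PAGE_SIZE // CHUNK
        # Heap layout: tree[1] is the root, leaves are tree[n .. 2n-1].
        self.tree = [b""] * (2 * self.n)
        for i in range(self.n):
            self.tree[self.n + i] = h(self.data[i * CHUNK:(i + 1) * CHUNK])
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = h(self.tree[2 * i] + self.tree[2 * i + 1])

    @property
    def root(self):
        return self.tree[1]

    def write(self, offset, value):
        """Write one byte, rehashing only its leaf and the leaf-to-root path."""
        self.data[offset] = value
        leaf = offset // CHUNK
        node = self.n + leaf
        self.tree[node] = h(self.data[leaf * CHUNK:(leaf + 1) * CHUNK])
        node //= 2
        while node >= 1:  # log2(64) = 6 parent nodes
            self.tree[node] = h(self.tree[2 * node] + self.tree[2 * node + 1])
            node //= 2


class ContentAddressedVMM:
    """Hypothetical per-process VMM: virtual page number -> root hash."""

    def __init__(self):
        self.page_table = {}  # vpn -> current root hash
        self.pages = {}       # vpn -> local working copy (MerklePage)
        self.cluster = {}     # stand-in for the cluster: root hash -> page bytes

    def write(self, address, value):
        vpn, offset = divmod(address, PAGE_SIZE)
        page = self.pages.setdefault(vpn, MerklePage())
        page.write(offset, value)
        # New content means a new content address for this page.
        self.page_table[vpn] = page.root
        self.cluster[page.root] = bytes(page.data)

    def read(self, address):
        """Read one byte; raises KeyError if the page was never written."""
        vpn, offset = divmod(address, PAGE_SIZE)
        return self.cluster[self.page_table[vpn]][offset]
```

With these sizes, a one-byte write rehashes one 64-byte leaf plus six 40-byte parent inputs, a few hundred bytes instead of all 4096. Old roots stay in `cluster`, which matches the idea that an updated page moves to a new content address rather than being overwritten.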

Would this VMM be easier to build than an MMU? I don't know, but I do know that software is cheaper than hardware, so it might be possible.