by chaboud
2909 days ago
"...across geographic regions, such that one could treat a set of virtual machines as one ultra-wide-bus CPU with a 1 GHz clock speed." I'm not entirely sure what you're trying to say here, but I am entirely sure that it's wrong. A precise clock isn't the same thing as the removal of latency, and the operations of a CPU are ordered. That is, I can't start working on the multiplication of A * (B + C) until the addition result is available. Furthermore, if the elements of the operation, B and C, or parts of those elements, were separated by miles (or even feet), the latency of that operation would increase by orders of magnitude. I doubt that even a 1MHz distributed processor would be achievable as a large distributed bit field computer as you've laid out here. If you're worried about overhead in computing, it is critical to remember that a foot is a nanosecond. I'd much rather break my data down to register size (and I often do) than ship my data over a wire or fiber (which I also often do). |
|
The only hard part requiring serialized synchronization is the carry bit across compute nodes. Share the carry bits between nodes: while relaying a sentence to a cluster of synchronized nodes, the pipeline can shoot the sentence into the cluster as a unit, then proxy and chain together the carry bits with a coordinated execution plan. On the other side of the pipe you get your well-timed 4096-bit result, all at 1 GHz, because the service is designed and produced to handle input at nanosecond intervals.
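As a minimal sketch of what chaining the carry bits looks like, here is a 4096-bit add split into 64-bit limbs, with each loop iteration standing in for one node. The limb width and all names are assumptions for illustration, not part of any real service, and everything runs in one process, so it shows only the data dependency and none of the network hops.

```python
LIMB_BITS = 64                     # assumed width handled by one node
TOTAL_BITS = 4096
LIMBS = TOTAL_BITS // LIMB_BITS    # 64 hypothetical nodes
MASK = (1 << LIMB_BITS) - 1

def split(value):
    """Break a 4096-bit integer into 64-bit limbs, least significant first."""
    return [(value >> (i * LIMB_BITS)) & MASK for i in range(LIMBS)]

def join(limbs):
    """Reassemble limbs (least significant first) into one integer."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))

def chained_add(a, b):
    """Add two 4096-bit values limb by limb, passing the carry along.

    Each iteration needs the carry produced by the previous one,
    which is the serialized part of the operation.
    Returns (sum modulo 2**4096, final carry out).
    """
    carry = 0
    out = []
    for x, y in zip(split(a), split(b)):
        total = x + y + carry
        out.append(total & MASK)
        carry = total >> LIMB_BITS
    return join(out), carry

if __name__ == "__main__":
    all_ones = (1 << TOTAL_BITS) - 1
    # Worst case: the carry ripples through every limb in order.
    print(chained_add(all_ones, 1))
```

Adding 1 to an all-ones value is the worst case: the carry has to pass through every limb in sequence before the top limb is final.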
What are the advantages? Predictability, and expanded throughput.
Now you can look at an entire passage of text and make a determination about it in less time. Or stack many passages and composite them to assess or intuit variation. Designing the product this way makes it easy to reason about, and thus easier to market and sell. Is it possible to make a profitable system that works like this? Gee, great question! There's no obvious answer.
But anyway, from the perspective of a subscriber, it's on them to marshal their data. Then, if they have operations for which the scale of 4096-bit chunks improves results, they can get their granular operations done at 1 GHz, which lets them predict time spent and overall cost more easily.
(e.g. I have all these toots of up to 4096 bits each marshalled in a single data store, from a shit ton of Mastodon instances (I did all the crawling and retrieving, and saved them in one place as a standardized data set), and I think a certain fact might be true about some of them. Here is the rule set to interpret; please give me back the members of the toot array for which the function of this rule set returns true.)
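From the subscriber's side, that request might look something like the sketch below. It is purely illustrative: `matching_toots`, the `rule` callable, and the UTF-8 size check are assumptions, and it runs locally instead of calling any actual service.

```python
MAX_BYTES = 4096 // 8  # a 4096-bit chunk holds 512 bytes

def fits_in_chunk(toot):
    """True if the toot's UTF-8 encoding fits in one 4096-bit chunk."""
    return len(toot.encode("utf-8")) <= MAX_BYTES

def matching_toots(toots, rule):
    """Return the toots that fit in one chunk and satisfy the rule.

    rule is a hypothetical stand-in for the subscriber's rule set:
    any function that takes a toot's text and returns True or False.
    """
    return [toot for toot in toots if fits_in_chunk(toot) and rule(toot)]

if __name__ == "__main__":
    toots = [
        "just set up my own instance",
        "coffee first, federation later",
        "x" * 600,  # too large for a single 4096-bit chunk
    ]
    print(matching_toots(toots, lambda text: "instance" in text))
```

Oversized toots are simply dropped here; splitting them across chunks would be a separate design decision.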
BTW, don't get hung up on 4096 as "the best number". I just chose it because it's a nice square number.