However, I find it hard to see how Turing Pi can make good use of the PCIe connectivity.
Breaking it out into an independent PCIe slot for each of the modules would be absurd; it would complicate the board layout significantly.
Yet the speed the modules currently offer (PCIe x1 Gen2) isn't fast enough for RDMA to effectively "chain" the compute modules together.
So I expect they will have to make some sacrifice. For example, they could fall back to the smaller and more common mini-PCIe slot, like most recent IoT boards. Or they could remove PCIe expansion entirely and offer something else instead, e.g. an embedded SATA host or multiple NICs that only the master node controls, leaving the other nodes to rely on RDMA, even though that will be slow and painful.
Implementing a NUMA-like architecture wouldn't be economical for Turing Pi, so I would rule that out.
Turing Pi is mostly useful for educational purposes. For anything performance critical, a cheap 8-core x86 box will run circles around 7 Raspberry Pis in pretty much every way, no matter how they are networked.
As ARM is gaining ground in the cloud market, as you can see with Graviton, we want to invest in the future. I don't think 7 Pi 4s would fare particularly badly against an E5-2670 v2 in any case. You can also listen to [this guy](https://youtu.be/HUamq0ey8_M?t=797) for a briefing.
From my perspective it's 4 × 7 = 28 weak ARM cores vs 8 strong x86 cores. ARM actually has more cores, which gives it an advantage over x86 on parallelized workloads.
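Whether 28 weak cores beat 8 strong ones depends on how fast each core is and how much of the job actually parallelizes. A rough way to see this is Amdahl's law. The sketch below is only an illustration: the ~3x per-core speed ratio and the parallel fractions are hypothetical placeholders, not benchmark results, and the model ignores the cost of moving data between Pis over the network.

```python
# Back-of-the-envelope comparison of 28 weak cores vs 8 strong cores using
# Amdahl's law. ARM_SPEED and the parallel fractions are hypothetical
# placeholders, not measured numbers; interconnect overhead is ignored.

def amdahl_throughput(cores: int, per_core_speed: float, parallel_fraction: float) -> float:
    """Relative throughput of `cores` cores, each running at `per_core_speed`,
    on a job where `parallel_fraction` of the work spreads across all cores
    and the rest runs on a single core."""
    serial = 1.0 - parallel_fraction
    time = serial / per_core_speed + parallel_fraction / (per_core_speed * cores)
    return 1.0 / time

ARM_CORES, X86_CORES = 4 * 7, 8
X86_SPEED = 1.0
ARM_SPEED = 1.0 / 3.0  # hypothetical: one x86 core ~3x one Cortex-A72 core

for p in (0.5, 0.9, 0.99, 1.0):
    arm = amdahl_throughput(ARM_CORES, ARM_SPEED, p)
    x86 = amdahl_throughput(X86_CORES, X86_SPEED, p)
    print(f"parallel={p:.2f}  ARM={arm:5.2f}  x86={x86:5.2f}")
```

Under these assumptions the ARM cluster only pulls ahead when almost all of the work is parallel, which is the kind of workload (FaaS, many independent replicas) where the core-count advantage matters.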
Hell, maybe we can mix them together so that x86 runs heavy applications like GitLab, Prometheus and Postgres, while ARM runs massively parallel workloads that GPUs can't handle: Function as a Service (Lambda in AWS terms, OpenFaaS in CNCF terms), Linkerd handlers (though the service mesh needs some kind of scheduling), and microservice replicas.
In the end, CPUs are all going to have designated purposes, even though they were supposed to be "general purpose".
I was thinking you could use a PCIe Non-Transparent Bridge (NTB) IC to connect all the Pis together. However, those sorts of chips aren't cheap, and it would only give a ~5x speed increase over Gigabit Ethernet.
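The ~5x figure matches the raw signalling rate of a PCIe Gen2 x1 link. A quick check, assuming Gen2's 5 GT/s per lane with 8b/10b encoding and ignoring protocol overhead on both links (TLP headers, Ethernet framing, TCP/IP), puts the usable gain closer to 4x:

```python
# Rough line-rate comparison: PCIe Gen2 x1 vs Gigabit Ethernet.
# Assumptions: PCIe Gen2 signals at 5 GT/s per lane and uses 8b/10b encoding;
# protocol overhead on both links is ignored.

PCIE_GEN2_GT_PER_S = 5.0        # raw transfer rate per lane, GT/s
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b line code used by PCIe Gen1/Gen2
GIGABIT_ETHERNET_GBPS = 1.0     # nominal Ethernet line rate, Gbit/s

def pcie_gen2_gbps(lanes: int = 1, encoded: bool = True) -> float:
    """Return PCIe Gen2 bandwidth in Gbit/s per direction."""
    rate = PCIE_GEN2_GT_PER_S * lanes
    return rate * ENCODING_EFFICIENCY if encoded else rate

raw = pcie_gen2_gbps(encoded=False)
effective = pcie_gen2_gbps()
print(f"raw ratio:    {raw / GIGABIT_ETHERNET_GBPS:.1f}x")
print(f"after 8b/10b: {effective / GIGABIT_ETHERNET_GBPS:.1f}x")
```

Either way, a single Gen2 lane is still a modest step up from Gigabit Ethernet, which is why an expensive NTB chip is hard to justify here.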