I am not sure where you get this information. Parallelization is everything in this space, which is why we have highly interconnected supercomputers to model the most difficult engineering problems. Typical runs use 30k+ cores for a single problem for weeks on end [0]. There are some special cases, such as Boltzmann/DVM [1], where individual partitions of cells have millions of degrees of freedom and memory bandwidth is the primary concern. Even then, decomposing the domain across a larger number of cores takes care of the issue.
A poorly optimized code that does not scale is not evidence that all simulation tools behave the same. Most of the tools in use by NASA, DOD, DOE, and research institutions, as well as most commercial codes, scale very well, in both the weak and the strong sense. In fact, scalability is one of the primary requirements for simulation tools used in mission-critical environments. I've been the technical lead for many of these types of projects, and I have experience with most of the largest commercial, research, and open source simulation codes. The vast majority of parallel tools scale linearly well beyond tens of thousands of cores. I do agree that at some point adding cores can cause a bottleneck, but for a significant portion of engineering applications that point is usually far past the few hundred cores from your link.
It is common practice in the field to size a simulation for the largest number of cores it can use efficiently, so cases are not just blindly given more cores without the scaling data to justify it. Your initial claim that parallelization doesn't get you much is flawed: parallelization is the only thing that enables scalable engineering analysis.
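For anyone unfamiliar with how that sizing is usually done, here is a minimal sketch of picking a core count from strong-scaling measurements. The timings are made-up illustrative numbers, not results from any real code, and the function names and the 75% efficiency cutoff are my own assumptions rather than a standard.

```python
def strong_scaling_efficiency(timings):
    """Parallel efficiency relative to the smallest measured core count.

    timings maps core count -> wall-clock time for the SAME fixed problem.
    Efficiency = (t_base * n_base) / (t_n * n); 1.0 means ideal scaling.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    return {
        cores: (base_time * base_cores) / (time * cores)
        for cores, time in sorted(timings.items())
    }


def largest_efficient_core_count(timings, threshold=0.75):
    """Largest measured core count whose efficiency is still >= threshold."""
    efficiency = strong_scaling_efficiency(timings)
    return max(cores for cores, eff in efficiency.items() if eff >= threshold)


# Hypothetical wall-clock seconds per core count (illustrative only).
timings = {128: 1000.0, 256: 510.0, 512: 270.0, 1024: 160.0, 2048: 120.0}

for cores, eff in strong_scaling_efficiency(timings).items():
    print(f"{cores:5d} cores: efficiency {eff:.2f}")

best = largest_efficient_core_count(timings)
print("run at", best, "cores")  # 1024 for the made-up numbers above
```

With these numbers the run is still about 78% efficient at 1024 cores and drops to about 52% at 2048, so 1024 is where you would stop adding cores.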
It's very, very hard to be memory-bandwidth bottlenecked if your program is not parallelized, even on consumer hardware. A 5 GHz CPU core with dual-channel DDR5 might get 75 GB/s of memory bandwidth. That's 15 bytes per clock cycle. Prosumer hardware? Maybe 60 bytes per clock cycle. No way one CPU core can keep up with that.
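The arithmetic behind those figures, as a quick sketch. The 75 GB/s and 5 GHz values are the ones quoted above; the prosumer bandwidth is simply what 60 bytes per cycle would imply if you assume the same 5 GHz clock, which is my assumption and not a measured number.

```python
def bytes_per_cycle(bandwidth_gb_s, clock_ghz):
    """Bytes of memory bandwidth available per clock cycle of one core."""
    return (bandwidth_gb_s * 1e9) / (clock_ghz * 1e9)


def implied_bandwidth_gb_s(bytes_per_clock, clock_ghz):
    """Inverse: bandwidth in GB/s implied by a bytes-per-cycle figure."""
    return bytes_per_clock * clock_ghz


# Consumer example from the comment: dual-channel DDR5, one 5 GHz core.
consumer = bytes_per_cycle(75.0, 5.0)
print(consumer)  # 15.0

# Prosumer figure, ASSUMING the same 5 GHz clock.
prosumer_bandwidth = implied_bandwidth_gb_s(60.0, 5.0)
print(prosumer_bandwidth)  # 300.0
```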
Memory bandwidth can be increased by parallelization, though, since every additional machine brings its own memory channels. MPI (Message Passing Interface), for example, is one of the major standards in parallel programming and supercomputing, and it handles parallelization across multiple machines.
[0] https://www.nas.nasa.gov/SC22/research/project12.html
[1] https://www.sciencedirect.com/science/article/abs/pii/S00219...