There are a lot of large-scale optimization problems in industry that are still compute bound. Currently available solvers are either single threaded on the CPU, or offer "parallelism" by running copies of the same problem on multiple threads, but with different initial conditions in hopes that one happens to converge faster.
I'll contend that this depends. For example, in mixed-integer linear programs, the function evaluations are trivial, but the combinatorial search is expensive. Alternatively, for nonlinear, continuous optimization problems with constraints, the primary cost is often the preconditioning or factorization of the systems associated either with an augmented or KKT systems. That said, for unconstrained or bound constrained, nonlinear, continuous optimization problems, I largely agree. And, even with general constraints, it's sometimes the case it's the function evaluations. Mostly, I wanted to contend that the linear system solves can often be the limiting factor and better large scale factorization codes are needed.