This is exactly the situation for a well-balanced parallel work queue. You want to start as many threads as there are cores and run them full tilt pulling work off the queue until it is empty. If you're running a large scale cluster that is dedicated to a particular task (e.g. like servicing a special kind of query, or encoding videos, rendering, etc), this is very common, or even a parallel Photoshop filter.
For this technique you generally dedicate some core(s) to those miscellaneous threads and flag the rest as unscheduable unless a thread is specifically assigned to them.
If you’re not sharing cores between VMs it’s typical to do the same at the hypervisor layer.
Yes, in practice you have to dedicate the whole machine for a specific application, but the one thread per isolated core is a proven one for high performance/low latency applications.