If you use numba (or Cython, C extensions, etc.), you can write functions that run without holding the GIL, so several threads can execute them in parallel. Here's an example that should keep your CPU pegged at 100% utilization for a while:
import numba as nb
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

# nogil=True lets the compiled function run without holding the GIL
@nb.jit(nogil=True)
def slow_calculation(x):
    out = 0
    for i in range(x):
        out += i**0.01
    return out

# one worker thread and one long-running task per CPU
ex = ThreadPoolExecutor(max_workers=cpu_count())
futures = [ex.submit(slow_calculation, 100_000_000_000 + i) for i in range(cpu_count())]
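If you want to check that the threads really do run in parallel, one rough way is to time the same batch of work serially and on a thread pool. The sketch below uses only the standard library instead of numba. It assumes that CPython releases the GIL inside `hashlib.pbkdf2_hmac`, and the names `hash_work`, `run_serial` and `run_threaded` are made up for illustration.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count


def hash_work(seed):
    # CPU-bound work done in C; assumed to release the GIL while it runs.
    salt = seed.to_bytes(4, "big")
    return hashlib.pbkdf2_hmac("sha256", b"password", salt, 200_000)


def run_serial(n_tasks):
    # Run every task one after another on the calling thread.
    start = time.perf_counter()
    results = [hash_work(i) for i in range(n_tasks)]
    return results, time.perf_counter() - start


def run_threaded(n_tasks, workers):
    # Run the same tasks on a thread pool and wait for all of them.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(hash_work, range(n_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n = cpu_count() or 1
    serial, t_serial = run_serial(n)
    threaded, t_threaded = run_threaded(n, n)
    assert serial == threaded  # same answers either way
    print(f"serial: {t_serial:.2f}s, threaded: {t_threaded:.2f}s")
```

If the threaded time comes out well below the serial time on a multi-core machine, the threads must have been running on more than one core at once. The exact ratio will depend on your hardware and on what else is running.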
Even without requiring the GIL, these are still child threads of the main process, correct? And because of that, wouldn't the OS keep them all on the same core? And if that's the case, would ProcessPoolExecutor solve that problem?