by jabl 1719 days ago
> OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications.

Can you explain what you mean by this? Are you saying there's a correctness issue here? I only recall running into issues with MPI, where you (typically) run one MPI rank (process) per CPU core. If you combine that with a multi-threaded BLAS library, you suddenly have N^2 BLAS threads fighting over the CPUs and performance goes down the drain. The solution, as you say, is to use a single-threaded OpenBLAS, or else the OpenMP build of OpenBLAS with OMP_NUM_THREADS=1.

I guess with threads you'll have the same issue if you launch N CPU-bound threads that all call BLAS, resulting in the same N^2 oversubscription you see with MPI.
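To put rough numbers on the N^2 point, here is a minimal sketch of the arithmetic. It assumes the BLAS library sizes its thread pool to the number of cores by default, which may not hold for every build; the helper name `blas_threads_in_flight` is made up for illustration.

```python
import os

def blas_threads_in_flight(n_workers, blas_threads_per_worker):
    """Total BLAS threads if every worker is inside a BLAS call at once."""
    return n_workers * blas_threads_per_worker

n_cores = os.cpu_count() or 1

# Assumption: one worker (MPI rank or application thread) per core, and each
# worker's BLAS pool defaults to one thread per core, giving N * N threads.
oversubscribed = blas_threads_in_flight(n_cores, n_cores)

# With OMP_NUM_THREADS=1 (or a single-threaded build): N workers x 1 thread.
capped = blas_threads_in_flight(n_cores, 1)

print(f"{n_cores} cores: {oversubscribed} BLAS threads by default, {capped} when capped")
```

With the cap in place, the number of runnable threads matches the number of cores again, which is why the single-threaded setting is the usual fix.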

1 comment

> Can you explain what you mean by this?

There is a nice description of this:

https://github.com/xianyi/OpenBLAS/issues/2543

At a previous employer, we saw various issues, including crashes and non-determinism. Usually, these issues would go away when switching to MKL.

One of the more painful issues is a hang (lockup) at full CPU usage. At my workplace, we initially introduced a timeout to work around the hang while trying to determine its cause. It happened within multithreaded R code. We tried various build flags for OpenBLAS, to no avail. Setting OPENBLAS_NUM_THREADS=1 does make the problem go away, at the expense of performance.

That R code has since been ported to Python, but we faced the same issue again when using ThreadPoolExecutor, so we had to change it into ProcessPoolExecutor instead.
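For anyone wanting to try the same switch, this is roughly the shape of it. It is only a sketch, not our actual code: `work` is a placeholder for the BLAS-heavy task, `run_all` is a made-up helper, and capping the BLAS threads per worker is my assumption for avoiding oversubscription once the parallelism comes from processes. The environment variables have to be set before the BLAS library is first loaded (for example, before importing numpy) to have any effect.

```python
import os

# Assumption: cap BLAS threads per worker process. Must happen before the
# BLAS library is loaded; setdefault keeps any value the user already set.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("OMP_NUM_THREADS", "1")

from concurrent.futures import ProcessPoolExecutor

def work(n):
    # Placeholder for a BLAS-heavy task (e.g. a matrix product via numpy).
    return sum(i * i for i in range(n))

def run_all(sizes, max_workers=2):
    # Each task runs in its own process, so application threads never call
    # into a shared multi-threaded BLAS concurrently.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, sizes))

if __name__ == "__main__":
    print(run_all([10, 100, 1000]))
```

The trade-off is the usual one for processes: arguments and results are pickled between parent and workers, so large matrices cost more to pass around than they did with ThreadPoolExecutor.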