I'm trying to evaluate the parallel efficiency of openTSNE (with FFT), as in https://opentsne.readthedocs.io/en/stable/benchmarks.html, but I don't see a significant speedup in the optimization step when increasing n_jobs.
I used Claude to dig a bit deeper, and here is what it found:
- If compiled with FFTW3, the code never calls fftw_init_threads()/fftw_plan_with_nthreads(). Without those calls, FFTW3 always runs single-threaded, regardless of OMP_NUM_THREADS or n_jobs.
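- If compiled with NumPy FFT (numpy.fft), threading depends on which FFT backend the NumPy build uses:
  - OpenBLAS: FFT stays single-threaded (OpenBLAS only parallelizes BLAS/LAPACK and does not provide an FFT)
  - MKL (Intel): FFT can be multi-threaded via MKL_NUM_THREADS, but only in NumPy builds that route numpy.fft through MKL (e.g. via mkl_fft)
  - NumPy's built-in implementation (FFTPACK in older releases, pocketfft since NumPy 1.17): single-threaded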
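To check the NumPy side of this independently of openTSNE, a minimal sketch like the following times numpy.fft.rfft2 at several thread counts. It assumes only that NumPy is installed; the array shape, repetition count, and the helper names thread_env, time_fft and report are arbitrary illustrations, not part of openTSNE. Each measurement runs in a fresh interpreter, since thread-count environment variables are generally read when the numerical library is loaded.

```python
import os
import subprocess
import sys

# Program run in a child interpreter: times repeated 2-D real FFTs using
# whatever backend this NumPy build uses for numpy.fft.
CHILD = r"""
import time
import numpy as np

x = np.random.default_rng(0).standard_normal((32, 512, 512))
start = time.perf_counter()
for _ in range(5):
    np.fft.rfft2(x)  # transforms the last two axes of each of the 32 slices
print(time.perf_counter() - start)
"""

# Common thread-count knobs; which one (if any) affects numpy.fft depends on the build.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def thread_env(n_threads, base=None):
    """Return a copy of `base` (default: os.environ) with every THREAD_VARS entry set to n_threads."""
    env = dict(os.environ if base is None else base)
    for var in THREAD_VARS:
        env[var] = str(n_threads)
    return env


def time_fft(n_threads):
    """Time CHILD in a fresh interpreter so the variables are set before NumPy is imported."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=thread_env(n_threads),
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


def report(thread_counts=(1, 2, 4, 8)):
    """Print wall time per thread count and speedup relative to the first entry."""
    baseline = time_fft(thread_counts[0])
    for n in thread_counts:
        elapsed = time_fft(n)
        print(f"{n} thread(s): {elapsed:.2f} s, speedup {baseline / elapsed:.2f}x")
```

Calling report() prints one line per thread count. If the list above is right, the times should stay roughly flat with the built-in pocketfft backend and only drop for an MKL-backed numpy.fft. This sketch does not exercise openTSNE's FFTW code path.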
Just to confirm: was the scaling plot at https://opentsne.readthedocs.io/en/stable/benchmarks.html produced with NumPy FFT + MKL in order to achieve the demonstrated performance?