Using `n_jobs=-1` for faster parallel CPU processing does not increase speed

Many MNE functions have the keyword argument `n_jobs` to parallelize processing. My machine (an M1 Mac) has 10 cores, but no matter how I set `n_jobs`, the processing speed stays the same.

What am I doing wrong? Please check this Jupyter notebook for a reproducible example.

Please note that this does not seem to be an M1 issue: I ran the same notebook on an old Intel iMac and likewise observed no performance improvement: Jupyter Notebook Intel Chip.

  • MNE version: 1.3.0
  • operating system: macOS 13.2.1 (arm64)

Hello @moritz-gerster,

generally, it’s important to consider that parallelizing data processing often comes with considerable initialization costs: chunks of data need to be copied to new memory segments, new worker processes need to be spawned, and after processing is done, the results need to be collected and glued back together. For operations that finish quickly on a single core (in less than a few seconds, or sometimes even tens of seconds), trying to parallelize may therefore even lead to a slowdown.
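To make the overhead argument concrete, here is a small standalone sketch (plain Python, not MNE code) that times a trivially cheap task run serially versus through a process pool. The function names and sizes are made up for illustration; the point is that the results are identical, but the parallel version pays a fixed cost for spawning workers and shipping data to them:

```python
import time
from concurrent.futures import ProcessPoolExecutor


def cheap_task(x):
    # Finishes in microseconds; far too cheap to be worth parallelizing.
    return x * x


def run_serial(data):
    return [cheap_task(x) for x in data]


def run_parallel(data, n_jobs=2):
    # Spawning workers and pickling data back and forth is the
    # "initialization cost" described above.
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(cheap_task, data))


if __name__ == "__main__":
    data = list(range(1000))

    t0 = time.perf_counter()
    serial = run_serial(data)
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    parallel = run_parallel(data)
    t_parallel = time.perf_counter() - t0

    # Same results either way; the parallel run is typically slower here.
    assert serial == parallel
    print(f"serial: {t_serial:.4f} s, parallel: {t_parallel:.4f} s")
```

Only when each task takes substantially longer than the per-worker setup cost does the parallel version come out ahead.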

In the examples you posted, we’re talking about a processing time of roughly a quarter of a second, which I would say is not suitable for parallelization (at least not with the machinery we currently use in MNE-Python).

Personally, I use parallelization especially when doing MVPA, and there it leads to significant speedups on my M1 Mac.

Best wishes,
Richard


Thank you very much for this instructive answer @richard!

I will try to re-run the script with much larger data.

Short follow-up question: What is MVPA?

“Decoding” / machine learning (MVPA stands for multivariate pattern analysis).

See for example this temporal generalization example:
https://mne.tools/stable/auto_tutorials/machine-learning/50_decoding.html#temporal-generalization
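To illustrate why this kind of decoding parallelizes so well, here is a toy sketch using only NumPy and synthetic data: in temporal-generalization decoding, one classifier is fit per time point, and those fits are completely independent of each other, so they can be distributed across cores. A simple nearest-class-mean classifier stands in for MNE’s real estimators; all names and array shapes here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 40, 8, 20

# Synthetic epochs: (epochs, channels, time points), two classes.
X = rng.normal(size=(n_epochs, n_channels, n_times))
y = rng.integers(0, 2, size=n_epochs)
X[y == 1, :, 10:] += 1.0  # class effect appears after time point 10


def fit_time_point(t):
    """Fit a nearest-class-mean classifier at time t; return train accuracy."""
    Xt = X[:, :, t]
    means = np.stack([Xt[y == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(Xt[:, None, :] - means[None, :, :], axis=2)
    pred = dists.argmin(axis=1)
    return float((pred == y).mean())


# Serial loop over time points. Because each fit is independent,
# setting n_jobs > 1 lets MNE hand these fits to worker processes,
# and each fit is expensive enough that the spawning overhead pays off.
scores = [fit_time_point(t) for t in range(n_times)]
print(scores)
```

Each fit here involves real numerical work per time point, so unlike the quarter-second filtering example above, the per-worker setup cost is amortized and parallelization yields a genuine speedup.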

That’s an operation where we’d usually go parallel for a huge speedup. See the (admittedly not super easy-to-read) code in MNE-BIDS-Pipeline: