How to make good use of CUDA

Hello,

Our lab got new toys: 3 nice servers, each running 4 NVIDIA A30 GPUs. So, for the first time, and since the dataset for my ongoing study is growing larger and larger, I am trying to set up CUDA and use these new resources as efficiently as possible.

I enabled CUDA with mne.utils.set_config('MNE_USE_CUDA', 'true'); it created the JSON configuration file, and the test pytest mne/tests/test_filter.py -k cuda passes. But the test does not seem to be much faster than the CPU-based computation.
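In other words, something like this (the get_config call is just an extra check to verify the setting was stored):

import mne

# Store the CUDA flag in MNE's JSON config file (by default ~/.mne/mne-python.json)
mne.utils.set_config('MNE_USE_CUDA', 'true')
print(mne.get_config('MNE_USE_CUDA'))  # should print 'true'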


What is the best way to use the available resources?

I have a dataset with 1k short raw recordings (4 minutes each), sampled at 512 Hz or 1 kHz, to which I apply the following steps (a rough sketch follows the list):

  • Resampling to 512 Hz if sampled at 1 kHz
  • Bandpass filters
  • Re-referencing
  • ICA decomposition
  • Bad channel interpolation
  • PSD with Welch's method
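For reference, the pipeline looks roughly like this (a sketch, not my exact code; the PSD call in particular depends on the MNE version):

import mne
from mne.preprocessing import ICA

fname = '...'  # placeholder: path to one recording
raw = mne.io.read_raw_fif(fname, preload=True)
if raw.info['sfreq'] == 1000:
    raw.resample(512, n_jobs='cuda')  # resampling accepts n_jobs='cuda'
raw.filter(l_freq=1., h_freq=40., n_jobs='cuda')  # FIR filtering does too
raw.set_eeg_reference('average')
ica = ICA(n_components=15)  # arbitrary number of components for this sketch
ica.fit(raw)  # runs on CPU only
raw.interpolate_bads()
psd = raw.compute_psd(method='welch')  # recent MNE; older versions use mne.time_frequency.psd_welch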

For now, my approach is to spawn e.g. 40 worker processes and give them the files to process one by one; that way I at least process 40 files at a time.
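Something like this, with process_file being a hypothetical wrapper around the steps above:

from concurrent.futures import ProcessPoolExecutor

def process_file(fname):
    """Hypothetical wrapper: run the full pipeline on one file."""
    ...

fnames = []  # list of the ~1000 file paths

with ProcessPoolExecutor(max_workers=40) as pool:
    list(pool.map(process_file, fnames))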

Now, with CUDA, if I use it for the operations above that support it (I guess only resampling and bandpass filtering; is there a way to make any of the others use it, especially the ICA decomposition?), does it really make a difference, considering:

  • the low sampling rate and the short duration?
  • the number of files to resample is very small, so the only step that could really benefit from CUDA is the bandpass filtering.

Moreover, I guess the CUDA session must be initialized for every new process (possibly for every job?), and it seems like this operation takes a significant amount of time.

And a final point: I guess that for each new process spawned, I should also assign it a different GPU to work on, via mne-python/cuda.py at 091da8f01aeeecd7d583ba596cf5a85cd649f192 · mne-tools/mne-python · GitHub.
There is no shortcut to distribute the load between different CUDA compatible GPUs, right?
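Something like this is what I had in mind (untested sketch; the round-robin assignment by PID is just the simplest scheme I could think of):

import os
from concurrent.futures import ProcessPoolExecutor

N_GPUS = 4  # A30s per server

def init_worker():
    # Must run before the first CUDA call in the process.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(os.getpid() % N_GPUS)
    import mne
    mne.cuda.init_cuda()  # pay the slow initialization once per worker

with ProcessPoolExecutor(max_workers=40, initializer=init_worker) as pool:
    ...  # pool.map(process_file, fnames) as before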


I'm very new to CUDA, any tips on how to properly benefit from it would be appreciated 🙂


Whether or not CUDA speeds things up might be very system-dependent. I would first try the simplest use case (e.g., a single worker, comparing n_jobs=1 to n_jobs='cuda') to see if it helps in the first place. It used to make a big difference a couple of years ago when NumPy and SciPy used fftpack as their FFT backend, but now that they use pocketfft under the hood, there is probably much less benefit.
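For example, with a quick check along these lines (untested sketch; fname is a placeholder for one of your files):

import time
import mne

fname = '...'  # placeholder: path to one of your recordings
raw = mne.io.read_raw_fif(fname, preload=True)
for n_jobs in (1, 'cuda'):
    tic = time.perf_counter()
    raw.copy().filter(l_freq=1., h_freq=40., n_jobs=n_jobs)
    print(n_jobs, f'{time.perf_counter() - tic:.3f} s')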

For what it's worth, locally at some point I didn't observe much if any benefit anymore, so I stopped bothering to use PyCUDA.


Am I correct that only the filtering and resampling steps could benefit from CUDA among the steps I listed?
Now my problem is that I am not familiar with CUDA, and I don't know what to expect, or what is 'normal'.

I ran this small function, with fname set to one of my EEG files (67 channels, 4 minutes @ 512 Hz), 100 times with different n_jobs:

import mne

def f(n_jobs):
    """Filter one raw file, to compare CPU and CUDA timings."""
    fname = r''  # path to the EEG file (redacted)
    raw = mne.io.read_raw_fif(fname, preload=True)
    raw.filter(
        l_freq=1.,
        h_freq=40.,
        picks=['eeg', 'eog', 'ecg'],
        method="fir",
        phase="zero-double",
        fir_window="hamming",
        fir_design="firwin",
        pad="edge",
        n_jobs=n_jobs,
    )

Turns out…

----------
n_jobs = 1
Mean: 0.41 s
STD: 0.108 s
----------
n_jobs = 2
Mean: 0.78 s
STD: 0.362 s
----------
-- CUDA --
Mean: 3.25 s
STD: 29.26 s
----------

How weird is that?

The CUDA mean and STD are very large because of the first call, which took more than 5 minutes to complete (341 seconds…). Subsequent calls are way faster, at 0.294 +/- 0.1386 s (mean +/- STD).

I also have the impression that CUDA is not worth it for basic applications like FFT and ICA, because there are pretty efficient algorithms/libraries available for CPU. But I haven't tried this in a while, so I'm interested in what you find. Unfortunately, since the GPUs are not in your local machine, you can't even use them for gaming in case there are no convincing speed-ups for your code 😄.
