Hello,
Our lab got new toys: three servers, each with four NVIDIA A30 GPUs. Since the dataset for my ongoing study is growing larger and larger, I am trying for the first time to set up CUDA and use these new resources as efficiently as possible.
I enabled CUDA with `mne.utils.set_config('MNE_USE_CUDA', 'true')`; it created the JSON configuration file, and the test `pytest mne/tests/test_filter.py -k cuda` passes. However, the test does not seem to run noticeably faster than the CPU-based computation.
What is the best way to use the available resources?
I have a dataset of ~1000 short raw recordings (4 minutes each), sampled at 512 Hz or 1 kHz, to which I apply:
- Resampling to 512 Hz if sampled at 1 kHz
- Bandpass filtering
- Re-referencing
- ICA decomposition
- Bad-channel interpolation
- PSD with the Welch method
For now, my approach is to spawn e.g. 40 worker processes and hand each of them files to process one by one, so at least 40 files are processed in parallel.
Now, if I enable CUDA for the operations above that support it (I guess only resampling and bandpass filtering — is there a way to accelerate any of the others, especially the ICA decomposition?), does it really make a difference, considering:
- the low sampling rate and the short duration of each recording?
- that very few files need resampling, so the only step that could really benefit from CUDA is the bandpass filtering?
Moreover, I guess the CUDA session must be initialized for every new process (possibly even for every job?), and this initialization seems to take a significant amount of time.
And a final point: I guess that for each new process spawned, I should also assign it a different GPU to work on (see mne-python/cuda.py at 091da8f01aeeecd7d583ba596cf5a85cd649f192 · mne-tools/mne-python · GitHub).
There is no shortcut for distributing the load between different CUDA-compatible GPUs, right?
I’m very new to CUDA; any tips on how to properly benefit from it would be appreciated.