EEG pre-processing ICA failure due to memory load

Python version: 3.9.5
MNE version: 0.23.0
Windows 10, Anaconda, Jupyter Notebook
Intel i7-5600U CPU @ 2.60GHz, 8 GB RAM

Hi all,

I am trying to pre-process some EEG .bdf files with MNE. The data has 32 EEG channels (plus 4 EOG channels); the trials are long, at approximately 3800 seconds, and include 1440 events. I have successfully filtered (0.05 Hz – 30 Hz band-pass), downsampled to 256 Hz, re-referenced, and epoched. However, I want to perform ICA for artefact and blink detection using @richard 's Pybrain pipeline, with a dedicated ICA epoch array band-pass filtered at 2 Hz – 30 Hz and downsampled to 256 Hz. But I am running into memory issues, I think because of the number of time points, and it crashes my process even when decimating 5, 10, or 20 times. I have tried Jupyter notebooks, VS Code, and Google Colab with the same result. A colleague has previously processed this data in EEGLAB, so I am following the same pipeline in the hope of extracting similar ERPs, but I cannot get past this point.

Do you have any thoughts/ideas please? I am truly stuck!

# bandpass filter 2Hz to 30Hz for improved ICA

raw_ica = raw.copy().filter(l_freq=2, h_freq=30)

Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 2 - 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 2.00
- Lower transition bandwidth: 2.00 Hz (-6 dB cutoff frequency: 1.00 Hz)
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 423 samples (1.652 sec)

# Epoch raw_ica

def CreateICAEpochs(raw_ica, tmin=-0.2, tmax=1200, baseline=(None, 0)):
    # events and event_id come from the enclosing scope;
    # decim=10 to reduce memory load, preload=False because
    # the files are too big for memory
    epochs = mne.Epochs(raw_ica,
                        events=events,
                        event_id=event_id,
                        tmin=tmin,
                        tmax=tmax,
                        baseline=baseline,
                        decim=10,
                        preload=False)
    return epochs

epochs_ica = CreateICAEpochs(raw_ica)
epochs_ica.info

Not setting metadata
Not setting metadata
1440 matching events found
Setting baseline interval to [-0.1953125, 0.0] sec
Applying baseline correction (mode: mean)
0 projection items activated
<ipython-input-21-93a84fcd77b1>:4: RuntimeWarning: The measurement information indicates a low-pass frequency of 30 Hz. The decim=10 parameter will result in a sampling frequency of 25.6 Hz, which can cause aliasing artifacts.

|Measurement date|November 21, 2018 10:34:27 GMT|
| --- | --- |
|Experimenter|Unknown|
|Participant|Unknown|
|Digitized points|35 points|
|Good channels|0 magnetometer, 0 gradiometer, and 32 EEG channels|
|Bad channels||
|EOG channels|blink_1, blink_2, blink_3, blink_4|
|ECG channels|Not available|
|Sampling frequency|25.60 Hz|
|Highpass|2.00 Hz|
|Lowpass|30.00 Hz|
n_components = 32  # One for each channel, no bads

method = 'picard'

max_iter = 1000  # high iteration cap; can be changed

fit_params = dict(fastica_it=5)  # run 5 FastICA iterations before fitting Picard

random_state = 36

ica = mne.preprocessing.ICA(n_components=n_components,
                            method=method,
                            max_iter=max_iter,
                            fit_params=fit_params,
                            random_state=random_state)

ica.fit(epochs_ica)

Fitting ICA to data using 32 channels (please be patient, this may take a while)
Loading data for 1440 events and 307252 original time points …
:12: RuntimeWarning: The epochs you passed to ICA.fit() were baseline-corrected. However, we suggest to fit ICA only on data that has been high-pass filtered, but NOT baseline-corrected.
501 bad epochs dropped

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
MemoryError: Unable to allocate 6.88 GiB for an array with shape (28851714, 32) and data type float64

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-79f412e57723> in <module>
     10                             fit_params=fit_params,
     11                             random_state=random_state)
---> 12 ica.fit(epochs_ica)

~\anaconda3\envs\mne\lib\site-packages\mne\preprocessing\ica.py in fit(self, inst, picks, start, stop, decim, reject, flat, tstep, reject_by_annotation, verbose)
    572         else:
    573             assert isinstance(inst, BaseEpochs)
--> 574             self._fit_epochs(inst, picks, decim, verbose)

~\anaconda3\envs\mne\lib\site-packages\mne\preprocessing\ica.py in _fit_epochs(self, epochs, picks, decim, verbose)
    635         data = np.hstack(data)
--> 636         self._fit(data, 'epochs')

~\anaconda3\envs\mne\lib\site-packages\mne\preprocessing\ica.py in _fit(self, data, fit_type)
    706         pca = _PCA(n_components=self._max_pca_components, whiten=True)
--> 707         data = pca.fit_transform(data.T)

~\anaconda3\envs\mne\lib\site-packages\mne\utils\numerics.py in fit_transform(self, X, y)
    816         X = X.copy()
--> 817         U, S, _ = self._fit(X)

~\anaconda3\envs\mne\lib\site-packages\mne\utils\numerics.py in _fit(self, X)
    853         X -= self.mean_
--> 855         U, S, V = _safe_svd(X, full_matrices=False)

~\anaconda3\envs\mne\lib\site-packages\mne\fixes.py in _safe_svd(A, **kwargs)
     68     try:
---> 69         return linalg.svd(A, **kwargs)

~\anaconda3\envs\mne\lib\site-packages\scipy\linalg\decomp_svd.py in svd(a, full_matrices, compute_uv, overwrite_a, check_finite, lapack_driver)
--> 127     u, s, v, info = gesXd(a1, compute_uv=compute_uv, lwork=lwork,
    128                           full_matrices=full_matrices, overwrite_a=overwrite_a)

TypeError: __init__() missing 1 required positional argument: 'dtype'

Hello,

I would say that a computer with this little memory is unsuitable for the EEG processing you have in mind; is there any way you can upgrade the RAM to at least 12 GB, ideally 16 GB?

Otherwise the only other proposals I have are:

  • decimate more (probably not really advisable though)
  • make the epochs shorter (probably not suitable for your analysis?)
  • only use a subset of epochs for fitting ICA (I guess this would be my preferred approach!), e.g.:
     ica.fit(epochs_ica[:100])  # only use the first 100 epochs for fitting
    
  • try using SSP instead of ICA for artifact rejection (might work well for EOG removal)
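
For a sense of scale, here is the back-of-the-envelope arithmetic behind the `MemoryError` in your traceback (the epoch, sample, and channel counts are taken from your log output, so treat this as an approximation, not MNE's exact internals):

```python
# Rough size of the float64 array ICA concatenates before PCA/SVD
n_epochs_kept = 1440 - 501   # epochs remaining after the drops reported in the log
n_times = 307252 // 10 + 1   # samples per epoch after decim=10 -> 30726
n_channels = 32

n_bytes = n_epochs_kept * n_times * n_channels * 8  # float64 = 8 bytes
print(round(n_bytes / 2**30, 2))  # -> 6.88 (GiB), matching the MemoryError
```

Fitting on only 100 epochs would cut that by roughly an order of magnitude, which is why the subset approach is my preferred option here.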

You’re also dropping a lot of “bad” epochs, is that intentional?

Thanks for the quick reply Richard.

Unfortunately, no, my RAM is not upgradeable. I assumed it was OK; not great, but OK.

I will look at the reduced ICA fit and SSP options, thanks.

I’m not sure why 501 epochs are being dropped. Is there a way to see where in the processing that happens, or why they are being dropped? It is not intentional.

epochs.drop_log tells you the reason why each epoch was considered for dropping. A summary is available via epochs.plot_drop_log().

You’d want to call

epochs_ica.drop_bad()

and then proceed as @drammock suggested:

epochs_ica.plot_drop_log()
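
If you want a quick textual summary instead of the plot, something like this works on the drop log (the entries below are made up for illustration; the real ones come from epochs_ica.drop_log after drop_bad(), one tuple of reasons per epoch, empty meaning the epoch was kept):

```python
from collections import Counter

# Hypothetical drop_log; real one: epochs_ica.drop_log
drop_log = ((), ('TOO_SHORT',), ('blink_1',), ('blink_1', 'blink_2'), ())

# Count how often each drop reason occurs across all epochs
reasons = Counter(reason for entry in drop_log for reason in entry)
print(reasons)  # blink_1 appears twice; TOO_SHORT and blink_2 once each

# Total number of dropped epochs (non-empty entries)
print(sum(1 for entry in drop_log if entry))  # -> 3
```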

mugwuffin.

Regarding your hardware spec: you could also consider running your analysis on Google Colab.

Thanks, I did try Google Colab with this pipeline, but it again crashed with a memory failure. As far as I understand, the free tier of Colab includes 13 GB, and they have closed the hack of crashing the runtime to be upgraded to 25 GB. I guess I could pony up the £8.10 for Colab Pro.

Have you tried my suggestion to only use a subset of epochs for fitting ICA? Since your epochs are extremely long, even a handful of them should be sufficient to allow ICA to single out EOG and heartbeat artifacts. In fact, even less than a single epoch might be sufficient, as – if I’m understanding correctly – one epoch is 1200 seconds long, which is 20 minutes. Seriously, try feeding only a single epoch to ICA and see what happens – it might do the trick for you!

I couldn’t do it today unfortunately as I was at work, but I will absolutely try tomorrow and report back. But I think you have uncovered a mistake in your last comment @richard. My epochs are supposed to be -200 ms to 1200 ms, not 1200 s! :man_facepalming: Oops. What an idiot. That might help :laughing:

I was also trying to understand why my epoch .fif files are so big: nine files of 2 GB each.
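
Just to sanity-check the unit mix-up, here is a quick calculation of samples per epoch (a rough reconstruction of the inclusive-endpoint window arithmetic, not MNE's exact internal code; sfreq is from my downsampled data):

```python
sfreq = 256.0  # Hz, after downsampling

def n_samples(tmin, tmax, sfreq=sfreq):
    # Inclusive sample count for an epoch window from tmin to tmax (seconds)
    return int(round(tmax * sfreq)) - int(round(tmin * sfreq)) + 1

wrong = n_samples(-0.2, 1200)  # seconds, as I accidentally epoched
right = n_samples(-0.2, 1.2)   # the intended -200 ms .. 1200 ms window
print(wrong, right)  # -> 307252 359
```

307252 matches the "original time points" in the log above, so the 1200-second epochs fully account for the memory blow-up (and presumably the huge .fif files too).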

Back to the code.

:grin::see_no_evil::sunglasses::ok_hand:

Hehe, good luck!!


I can report back that everything is back to normal, it was that silly time mistake. Thanks for the help and some good tips for me going forward. :wink: Great forum.
