ica.find_bads_exg components does not return the same components if using raw used for fitting or same raw with components excluded.

Hello,

I was trying to assess how the adaptative z-scoring of ica.find_bads_eog and ica.find_bads_ecg was performing, and I now face a behavior that I can not explain.

I start by loading one of my -raw.fif file and I apply filters/rereferencing. The file include 64 EEG channels + 1 EOG channel and 1 ECG channel.
I fit an ICA with this raw instance.
I look for the components correlating with the EOG and ECG channel (ica.find_bads_exg methods) and I exclude them.
I copy the raw instance and apply the ICA to it.
I look for the components correlating with the EOG and ECG channel of the copied instance.

I was expecting to find exactly the same, as the EOG and ECG channels are not modified by ica.apply. Am I wrong in assuming the methods ica.find_bads_eog and ica.find_bads_ecg are correlating with the EOG and ECG channel?

Reproducible example:

import mne
import numpy as np


# Load
fname = 'sample-raw.fif'
raw = mne.io.read_raw_fif(fname, preload=True)
mne.rename_channels(raw.info, {"AUX7": "EOG", "AUX8": "ECG"})
raw.set_channel_types(mapping={"ECG": "ecg", "EOG": "eog"})
mapping = {
    "FP1": "Fp1",
    "FPZ": "Fpz",
    "FP2": "Fp2",
    "FZ": "Fz",
    "CZ": "Cz",
    "PZ": "Pz",
    "POZ": "POz",
    "FCZ": "FCz",
    "OZ": "Oz",
    "FPz": "Fpz",
}
for key, value in mapping.items():
    try:
        mne.rename_channels(raw.info, {key: value})
    except Exception:
        pass

# Filter and montage
raw.filter(l_freq=1., h_freq=40., picks=['eog', 'ecg'], phase="zero-double",
           pad="edge")
raw.notch_filter(np.arange(50, 151, 50), picks=['eog', 'ecg'])  # Yes, not necessary
raw.add_reference_channels(ref_channels='CPz')
raw.set_montage('standard_1020')
raw.set_eeg_reference(
    ref_channels="average", ch_type="eeg", projection=True)
raw.filter(l_freq=1., h_freq=40., picks='eeg', phase="zero-double",
           pad="edge")

# ICA
ica = mne.preprocessing.ICA(method='picard', max_iter='auto', random_state=101)
ica.fit(raw, picks='eeg', reject_by_annotation=False)  # No annotations

# Correlation
eog_idx_raw, eog_scores_raw = ica.find_bads_eog(raw)
ecg_idx_raw, ecg_scores_raw = ica.find_bads_ecg(raw)
ica.exclude = eog_idx_raw + ecg_idx_raw

# Apply ICA
raw_ica_applied = raw.copy()
ica.apply(raw_ica_applied)

# Correlation with applied ICA
ica.exclude = list()
eog_idx_raw_ica_applied, eog_scores_raw_ica_applied = ica.find_bads_eog(raw_ica_applied)
ecg_idx_raw_ica_applied, ecg_scores_raw_ica_applied = ica.find_bads_ecg(raw_ica_applied)

File to run: (available 24h, ask for re-upload).
https://file.re/2021/10/13/sample-raw/

Running 0.23.0. Output:

eog_idx_raw
Out[15]: [0]

ecg_idx_raw
Out[16]: [26, 19]

eog_idx_raw_ica_applied
Out[17]: [0, 26, 19]

ecg_idx_raw_ica_applied
Out[18]: []

Mathieu

This is true and therefore, mne.preprocessing.find_ecg_events() and find_eog_events() (which are both used under the hood by ICA.find_bads_eXg()) should yield the exact same event arrays for both the pre- and post-ICA data.

However, for the actual pattern matching, the EEG channels will be used. I’m actually surprised you still get somewhat similar results after cleaning the data with ICA – I would have assumed that the artifacts would have been reduced so much that the pattern matching would fail. My bet would be that the “eog” components you find in eog_idx_raw_ica_applied are actually mostly physiological brain data you’d want to keep.

Ok, so this is actually very surprising to me. I assumed that to determine the EOG and ECG related components, the algo would look for the correlation between each component and the EOG/ECG channel. Why is it using the EEG channels for pattern matching/correlation?


As you can see above, I excluded simultaneously both EOG and ECG components with ica.exclude = eog_idx_raw + ecg_idx_raw. I then saved to disk both the ICA and the raw_ica_applied instances, but not the intermediate filtered/rereferenced raw.

From these 2 saved objects, how can I recover which component was excluded and with which correlation (score)?

ica.exclude will include both EOG and ECG related components, but it won’t tell me which is which.
Using ica.find_bads_eXg with the raw_ica_applied instance will not return the same output as when it is called with raw instance.
Any ideas?

I am looking a bit more into the documentation. for ica.find_bads_ecg:

Detect ECG related components.

Cross-trial phase statistics (default) or Pearson correlation can be
used for detection.

.. note:: If no ECG channel is available, routine attempts to create
          an artificial ECG based on cross-channel averaging.

To me, this clearly indicates that the method correlates with an ECG channel. This docstring doesn’t include the role of the EEG (data to be more general) channels.

I just had a look at the code and it appears what I stated above is, in fact, incorrect. I was mislead by some similarly incorrect bits in the docstring, which I’ll fix.

Another remark that doesn’t answer your question :slight_smile:

This is not “how it’s usually done”.

Instead, the “good” practice is to create EOG and ECG epochs and run find_bads_eXG() on those epochs.

Like I said, it doesn’t answer your question, but I just wanted to leave it here anyway.

There is definitely some confusion coming from both the docstring and from the tutorial:

https://mne.tools/stable/auto_tutorials/preprocessing/40_artifact_correction_ica.html#using-an-eog-channel-to-select-ica-components

Btw, the tutorial does use the raw instance and not epochs created around EOG or ECG artifacts.

And if what you stated above is incorrect, do you mean it does not use the EEG data for the correlation? In which case, why the difference?

For my particular need, I found part of what I was looking for: ica.labels_
It is documented, I didn’t pay enough attention. It gives me at least which components was excluded because it was related to EOG, and which was excluded because it was related to ECG. However, I am still missing the scores…

My “good practices” reference is the MNE-BIDS-Pipeline. Here’s the relevant (granted, somewhat lengthy) code we use there: