Check for ICA (toward Epochs Rejection)

  • MNE-Python version: 0.21.0
  • operating system: Windows10

Hi,
my script below aims in sequence toperform the following actions:
import raw data > filtering > run the ICA and reconstruct the signal > apply the CAR > create epochs from a single event > find the epochs to reject because of artifacts. In my example, each epoch is equal to each second of the recording. Regarding the ICA, I used the ICA version suggested by @larsoner that generates the same decomposition each time the script is run.

MY QUESTIONS:

  1. Considering that I’ve only frontal channels (if you import the mne raw object through my script you’ll see all info about channels and more), do you think that my approach for the ICA is correct? Please see the “# ICA SECTION” in the script.
  2. The final result of the script is represented by the percentage of epochs to rejects. In general I would expect that if I use the ICA (versus not using the ICA) the percentage of rejected epochs would be reduced, but it is not always like that. I did several tests running the analysis with all the following combinations, where for each combination I report the percentage of epochs rejected. From my tests, the better combination in order to have less epoch rejected is the “D”, where I run the ICA with the reject parameter enabled in the ica.fit, along with the CAR. From your experience, do you think that such results are reasonable? Do you have some tips to improve my pipeline? In general, whatever suggestion will be really appreciated.

For the following 4 tests, if ICA was run, the “reject” ICA parameter WAS used:

  • A) without ICA, without CAR = 80.1% epochs rejected
  • B) ICA, without CAR = 40.3 % epochs rejected
  • C) without ICA, CAR = 80.9% epochs rejected
  • D) ICA, CAR = 2.2% epochs rejected

For the following 2 tests ICA was run excluding the “reject” ICA parameter from ica.fit :

  • E) ICA, without CAR = 40.3 % epochs rejected
  • F) ICA, CAR = 61% epochs rejected

N.B tests “B” and “E” led to same results, while “D” and “F” led to different results.

DATA AND SCRIPT

  • Here mne raw object to download (the raw object of one subject, 4 MB)
  • The entire script is shown below (the only change required is the path to the folder where you donwloaded the file)
import os
import mne

# LOAD THE EEG RECORDING
sample_data_folder = r"F:\folderContaningData" # change the path to the folder where you downloaded the file
sample_data_raw_file = os.path.join(sample_data_folder, 'exp_c3r1c1m4a1o_conc.fif') 
raw = mne.io.read_raw_fif(sample_data_raw_file, preload=True)

# FILTERING
lowPass = 30 # filtering
highPass = 1 # filtering
raw_filtered = raw.copy().filter(l_freq=highPass, h_freq=lowPass)

# --------------------- START ICA SECTION
from mne.preprocessing import (ICA)

n_components = None  # if float, select n_components by explained variance of PCA
method = 'fastica'  # for comparison with EEGLAB try "extended-infomax" here
decim = 3  # we need sufficient statistics, not all time points -> saves time

# we will also set state of the random number generator - ICA is a
# non-deterministic algorithm, but we want to have the same decomposition
# and the same order of components each time this tutorial is run
random_state = 23
ica = ICA(n_components=n_components, method=method, random_state=random_state)
reject = dict(mag=5e-12, grad=4000e-13) # ???
ica.fit(raw_filtered, decim=decim, reject=reject) 
# ica.fit(raw_filtered, decim=decim)  # enable this line and disable the above line, if you want to run ICA without reject parameter
print(ica)

# plot components
raw_filtered.plot(title='EEG before ICA')
ica.plot_components()
ica.plot_sources(raw_filtered)

# select and exclude component, reconstruct and plot
ICAcomponents2Exlude = [0]
ica.plot_overlay(raw_filtered, exclude=ICAcomponents2Exlude, picks='eeg')
ica.exclude = ICAcomponents2Exlude
reconst_raw = raw_filtered.copy()
ica.apply(reconst_raw) # Apply ICA for remove the components
reconst_raw.plot(title='EEG after ICA')
raw_filtered.plot(title='EEG before ICA')
# ---------------------  END ICA SECTION

# Enable the following line in case you don't run the ICA section
# reconst_raw = raw_filtered.copy()


# ---------------------  START CAR (common average reference) SECTION
reconst_raw.set_eeg_reference('average', projection=True)
reconst_raw.apply_proj()
reconst_raw.plot(title='EEG after CAR')
print(reconst_raw.info)
# ---------------------  END CAR SECTION

# Enable the following line in case you don't run BOTH the ICA section and the CAR section
# reconst_raw = raw_filtered.copy()

# --------------------- START EVENT-EPOCHS (ONLY ONE EVENT, see 'My Event': 1)
# each second of the recording will be equal to one epoch

import numpy as np
from autoreject import get_rejection_threshold  # noqa
sfreq = 250 # eeg data sample frequency
event_id = 1  # This is used to identify the events.
events_raw = []
total_sec = len(reconst_raw) / sfreq
for i in range(0, int(total_sec)):
    event_eachSecond = [i*sfreq, 0, event_id]
    events_raw.append(event_eachSecond)
events = np.array(events_raw)
mne.viz.plot_events(events, sfreq=reconst_raw.info['sfreq']) # plot events

# from events to epochs
event_id = {'My Event': 1}
epochs = mne.Epochs(reconst_raw, events, event_id, tmin=0, tmax=1,baseline=(0, 0), preload=True)
reject = get_rejection_threshold(epochs, decim=2)
print('The rejection dictionary is %s' % reject)
epochs.drop_bad(reject=reject)
epochs.average().plot()
mne.viz.plot_epochs(epochs)

# FINAL RESULTS (PERCENTAGE OF REJECTED EPOCHS)
epochs.plot_drop_log() # A summary of the causes of rejected epochs can be generated with the plot_drop_log() method. If it gives the name of channel such as ‘F8’ it means the epoch was rejected because ‘F8’ exceeded the peak-to-peak rejection limit.


:point_right: :point_right: :point_right: Thank you in advance :point_left: :point_left: :point_left:

1 Like

Hello, this is quite a bit of information you’re sharing here, which might be why you haven’t received a response yet. :slight_smile:

Generally I can say that it seems you’re running ICA on your continuous data, even though you’re epoching later on. This is not how it’s “usually” done. If you plan to use epochs, it’s a good idea to epoch the data first and run ICA on the epoched data. This will ensure ICA doesn’t get to see “irrelevant” portions of the data (and it will speed up computation significantly; you may then e.g. drop the decimate parameter too). On another side note, I suggest you try out the picard algorithm, which usually converges faster than fastica and infomax.

I hope this helped a little bit.

Hi @richard,
thank you, I’m agree with you maybe too much stuff :wink:
Regarding the ICA, I tried to replace the “fastica” with “picard” in my code but I received the error:
ModuleNotFoundError: No module named 'picard'

Thank you for the tip to run the ICA after epoching data (as “usually done”).
In general I applied ICA + CAR + Artifact rejection for epochs, hoping to get a signal cleaned by artifacts (at the end I get back the “seconds with no artifacts”).

Considering that I have only frontal channels, can I do something more to clean up my data or do you think is fine like that? By the ICA I try to reconstruct the signal in occurrence of eye blinks, while with the criteria added in the epoch section I just can obtain back the “seconds with no artifacts”.

Thank you,

If you’re using conda (as suggested in the MNE installation instructions), you can install picard via

conda install -c conda-forge python-picard

Alternatively, you can install it via pip:

pip install python-picard

To remove eye blinks, you’ll want to follow the following approach:

  • epoch your data and run ica.fit() on your epochs to get the components
  • go back to your raw data, identify eye blinks and create epochs around the blink events. MNE-Python can do this automatically for you using mne.preprocessing.create_eog_epochs()
  • using the ICA fit you created in the first step, run ica.find_bads_eog() on the EOG epochs. This will return a list of ICs that appear to be related to ocular activity
  • You can review the effects of excluding those components by adding them to ica.exclude and running ica.plot_overlay() on the averaged EOG epochs, i.e., ica.plot_overlay(eog_epochs.average())
  • Run ica.apply() on your epochs to remove ocular artifacts.

If you’re brave & curious, you can have a look at how we’re handling it in the MNE Study Template; but be warned, the code is slightly more complex than what I’ve explained above.

Hi @richard thank you for the tips.

  • Picard module now installed.I did some tests replacing the “fastica” method with the “picard” one.
  • I didn’t know about the existence of “mne templates”, that’s great. I will review them and try to understand how they works.

In general, please let me know if I’m wrong, I understood from your all your answers that :

  • my pipeline at the moment is not wrong or leading to uncorrect results. Nevertheless it can be improved in order to speed up the process (e.g. running the ICA on epoched data will reduce the time compared to the current situation).

What looks really weird to me from the beginning is why I get sometime a so huge amount of dropped epochs. From my short experience, generally the epochs containing artefacts are not higher than the 30% of the total epochs (even if in a ideal word I’d wish to stay below the 10%).

As you can see from my results listed below, and integrated with the results of the new '‘picard’ as well, most of the time the epochs rejected are higher than the 30% (even 80%). What do you think about that?

Furthermore, you can see how changing the sequence of methods like ICA and CAR, and changing inner parameters like “reject” or “fastica/picard”, led to tremendous different results in term of the percentage of rejected epochs. Looking at my results you can see that the smaller amount of rejected epochs is obtained through option C using the “picard” method (1.5% epochs rejected), but unfortunately this is not true for every subject I analyze (for another subject the best option is “D” ). What do you think about that?

thank you,

For the following 4 tests, if ICA was run, the “reject” ICA parameter WAS used:

A) without ICA, without CAR = 80.1% epochs rejected
B) ICA, without CAR = 40.3 % epochs rejected —> with picard = 1.5% epochs rejected
C) without ICA, CAR = 80.9% epochs rejected
D) ICA, CAR = 2.2% epochs rejected —> with picard = 61% epochs rejected
For the following 2 tests ICA was run excluding the “reject” ICA parameter from ica.fit :

For the following 2 tests ICA was run excluding the “reject” ICA parameter from ica.fit :
E) ICA, without CAR = 40.3 % epochs rejected —> with picard = 40.3% epochs rejected
F) ICA, CAR = 61% epochs rejected —> with picard = 5.7% epochs rejected

Hello,

did you ensure ICA always converged? I usually set ICA(..., max_iter=10000) just to ensure it won’t abort prematurely.

Then, of course different ICA algorithms may yield different solutions, so it may not always be the same components you should remove. You should manually inspect the ICA results and select which components to exclude.

Lastly, setting a reference before or after ICA, or setting a different reference before creating epochs shouldn’t make a difference (definitely will not make a difference for the latter; might make a slight difference for the former, esp. if it means that the rank of your data changes)

I’m still not really certain about your procedure, to be honest. Are you following my suggestions from above?

Thank you, I confirm that also integrating the parameter max_iter=10000 I get the same results as you can see below:

For the following 4 tests, if ICA was run, the “reject” ICA parameter WAS used:

A) without ICA, without CAR = 80.1% epochs rejected
B) ICA, without CAR = 40.3 % epochs rejected —> with picard = 1.5% epochs rejected —> with picard + max_iter(10000) = 1.5% epochs rejected
C) without ICA, CAR = 80.9% epochs rejected
D) ICA, CAR = 2.2% epochs rejected —> with picard = 61% epochs rejected —> with picard + max_iter(10000) = 61.0% epochs rejected

For the following 2 tests ICA was run excluding the “reject” ICA parameter from ica.fit :

For the following 2 tests ICA was run excluding the “reject” ICA parameter from ica.fit :
E) ICA, without CAR = 40.3 % epochs rejected —> with picard = 40.3% epochs rejected —> with picard + max_iter(10000) = 40.3% epochs rejected
F) ICA, CAR = 61% epochs rejected —> with picard = 5.7% epochs rejected —> with picard + max_iter(10000) = 5.7% epochs rejected

Yes, I confirm that I always checked the component to remove, that by the way it was always the same (the component “0”) each time I run the ICA.

My results show that if I run versus not run the CAR section, I get different results in terms of percentage of rejected epochs. In my first post I shared my data, as well as the entire script if you’d like to have a look. I don’t understand why this happen


I’m still working in restructuring the entire pipeline according to your tip (and the template you shared that I found really interesting but I takes a little more time to me to understand it), in the meanwhile I got back with those aspects I was able to quickly test and that look strange at the moment as described above.

I’m just wondering, but let me know if you are agree, maybe we could try to understand in this post why the weird results were provided in terms of the percentage of rejected epochs, while in another post I could describe the main objectives of my pipeline in order to let you understand what I would like to do and I would be really thankful if you could then share some ideas in order to achieve those objectives. This is just an idea to not overload of content this post as you said at the beginning, but please let me know what you think about that and I will follow your guidelines, thank you

I still don’t understand why you run and apply ICA on your continuous data if you’re going to create Epochs later anyway; this seems very inefficient to me both in terms of computing resources / time and your goal to remove artifacts affecting your signals of interest (i.e., the epochs)

I’m also not familiar with the way you set the rejection thresholds (autoreject.get_rejection_threshold()). Why not try something less fancy with explicit, manually specified values?

I have to admit I’m a bit overwhelmed by all the data and stats you provide, I don’t have enough time to really try and understand what you’re doing in detail, sorry. You need to be more concise so my brain can follow :slight_smile:

Hi @richard,
thank you, in the meanwhile I tried to restructure the pipeline according to your tips (that I included at the beginning of each operation in the code below):

Here the code:

# FILTERING
lowPass = 30 # filtering
highPass = 1 # filtering
raw_filtered = raw.copy().filter(l_freq=highPass, h_freq=lowPass)


#%% 
"""
epoch your data and run ica.fit() on your epochs to get the components
"""
 
import numpy as np
sfreq = 250 # eeg data sample frequency
event_id = 1  # This is used to identify the events.
events_raw = []
total_sec = len(raw_filtered) / sfreq
for i in range(0, int(total_sec)):
    event_eachSecond = [i*sfreq, 0, event_id]
    events_raw.append(event_eachSecond)
events = np.array(events_raw)
mne.viz.plot_events(events, sfreq=raw_filtered.info['sfreq']) # plot events

# from events to epochs
event_id = {'My Event': 1}
epochs = mne.Epochs(raw_filtered, events, event_id, tmin=0, tmax=1,baseline=(0, 0), preload=True)
epochs.average().plot()
mne.viz.plot_epochs(epochs)

# RUN ICA FIT ON EPOCHS
from mne.preprocessing import (ICA)
n_components = None
method = 'picard'
random_state = 23
ica = ICA(n_components=n_components, method=method, random_state=random_state, max_iter=10000)
decim = 3  # we need sufficient statistics, not all time points -> saves time
reject = dict(mag=5e-12, grad=4000e-13)
ica.fit(epochs, decim=decim, reject=reject) # ICA FIT ON EPOCHS + REJECT

#%%
"""
go back to your raw data, identify eye blinks and create epochs around the blink events. 
MNE-Python can do this automatically for you using mne.preprocessing.create_eog_epochs()
"""
eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, baseline=(0, 0))

#%%
"""
using the ICA fit you created in the first step, run ica.find_bads_eog() on the EOG epochs. 
This will return a list of ICs that appear to be related to ocular activity
"""
eog_indices, eog_scores = ica.find_bads_eog(eog_epochs)

#%%
"""
You can review the effects of excluding those components by adding them to ica.exclude 
and running ica.plot_overlay() on the averaged EOG epochs, i.e.,
"""
ica.exclude = eog_indices
ica.plot_overlay(eog_epochs.average())


#%%
"""
Run ica.apply() on your epochs to remove ocular artifacts.
"""
ica.apply(eog_epochs)


#%%
"""  
# Reject epochs using autoreject
"""

from autoreject import get_rejection_threshold  # noqa
reject = get_rejection_threshold(epochs, decim=2)
print('The rejection dictionary is %s' % reject)
epochs.drop_bad(reject=reject)
epochs.average().plot()
mne.viz.plot_epochs(epochs)

# FINAL RESULTS (PERCENTAGE OF REJECTED EPOCHS)
epochs.plot_drop_log() # A summary of the causes of rejected epochs can be generated with the plot_drop_log() method. If it gives the name of channel such as ‘F8’ it means the epoch was rejected because ‘F8’ exceeded the peak-to-peak rejection limit.

Running the line:
eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, baseline=(0, 0))

I get the following error:

`No EOG channels found
Trying with EEG 061 and EEG 062

RuntimeError: EEG 61 or EEG 62 channel not found !!`

I’ve no an EOG channel, my channels are “F7”, “AF7”, “Fp1”, “Fpz”, “Fp2”, “AF8”, “F8”.
I tried to run just for testing the same line including a specific channel (“Fp1”):
eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, ch_name="Fp1", baseline=(0, 0))
and it works fine
 but “Fp1” is not an EOG channel and however the following line:
eog_indices, eog_scores = ica.find_bads_eog(eog_epochs)
triggers again the same error of before:
RuntimeError: EEG 61 or EEG 62 channel not found !!

What should I do in this case? Am I getting closer to what you suggested me?

Yes I can try also another method that allows to get the percentage of epochs with artefacts that should be removed. I know only the autoreject approach at the moment, but I will explore also manual solution you suggested.

thank you

You should get it from the raw object instead:

sfreq = raw_filtered.info['sfreq']

This should ideally be a dictionary mapping event names to event IDs.

Use raw_filtered.tmax instead.

I have no idea why you are doing this? Don’t you have proper events in your data?

Why are you using a one-sample baseline?

I don’t think you need decim anymore if you’re working on epochs.

The reject parameter only has an effect when you pass raw data. For your epochs, it won’t do anything.
Also it doesn’t make sense anyway: you said you have EEG data, but your reject dict only contains information for MEG sensors.

This only removes the artifacts from eog_epochs. But you’ll want to remove them from epochs, so pass epochs here instead.

I would skip all of that for the time being, as it just adds complexity

Exactly because you don’t have a dedicated EOG channel, you must pass ch_name to both create_eog_epochs() and ica.find_bads_eog(): how else would MNE know which channels it should use to detect EOG artifacts?

You are right, but if I can only use one channel, which channel should I use (I’ve “F7”, “AF7”, “Fp1”, “Fpz”, “Fp2”, “AF8”, “F8” that are all more or less affected by blinks)? At the moment I added “Fpz” in the code below.

The code run smoothly now, but at end, the blinks don’t get removed, they are all still there. What’s wrong in my code below? Thanks

# FILTERING
lowPass = 30 # filtering
highPass = 1 # filtering
raw_filtered = raw.copy().filter(l_freq=highPass, h_freq=lowPass)

# ch_names = ["F7", "AF7", "Fp1", "Fpz", "Fp2", "AF8", "F8"]
#%%
# CREATE EPOCHS
"""
epoch your data and run ica.fit() on your epochs to get the components
"""
# EPOCH DATA
import numpy as np

sfreq = 250 # eeg data sample frequency
event_id = 1  # This is used to identify the events.
events_raw = []
total_sec = len(raw_filtered) / sfreq
for i in range(0, int(total_sec)):
    event_eachSecond = [i*sfreq, 0, event_id]
    events_raw.append(event_eachSecond)
events = np.array(events_raw)
mne.viz.plot_events(events, sfreq=raw_filtered.info['sfreq']) # plot events

# from events to epochs
event_id = {'My Event': 1}
epochs = mne.Epochs(raw_filtered, events, event_id, tmin=0, tmax=1,baseline=(0, 0), preload=True)
epochs.average().plot()
mne.viz.plot_epochs(epochs)


# RUN ICA FIT ON EPOCHS
from mne.preprocessing import (ICA)
n_components = None
method = 'picard'
random_state = 23
ica = ICA(n_components=n_components, method=method, random_state=random_state, max_iter=10000)
decim = 3  # we need sufficient statistics, not all time points -> saves time
reject = dict(mag=5e-12, grad=4000e-13)
ica.fit(epochs, decim=decim, reject=reject) # ICA FIT ON EPOCHS + REJECT 

#%%

#
"""
go back to your raw data, identify eye blinks and create epochs around the blink events. 
MNE-Python can do this automatically for you using mne.preprocessing.create_eog_epochs()
"""
# eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, baseline=(0, 0))
# eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, ch_name=ch_names, baseline=(0, 0))
eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, ch_name="Fpz", baseline=(0, 0))

#%%
"""
using the ICA fit you created in the first step, run ica.find_bads_eog() on the EOG epochs. 
This will return a list of ICs that appear to be related to ocular activity
"""
eog_indices, eog_scores = ica.find_bads_eog(eog_epochs, ch_name="Fpz")

#%%
"""
You can review the effects of excluding those components by adding them to ica.exclude 
and running ica.plot_overlay() on the averaged EOG epochs, i.e.,
"""
ica.exclude = eog_indices
ica.plot_overlay(eog_epochs.average())

 

"""
Run ica.apply() on your epochs to remove ocular artifacts.
"""
ica.apply(eog_epochs)

#%%
# other way to visualize epochs
epochs.average().plot()
mne.viz.plot_epochs(epochs)

I would definitely pick Fp1 or Fp2, I think they usually represent the VEOG quite well.

You’re removing the blinks from your EOG epochs here. This is a nice sanity check alright, but it won’t affect your “actual” (relevant) epochs. So you need to call:

ica.apply(epochs)

instead.

thank you, I updated Fp1 as EOG channel and called ica.apply(epochs).
Finally if I run:
mne.viz.plot_epochs(epochs)
I would expect to see that blinks get removed, but the image below looks to tell a different story (all blinks still there), what do you think?

Yes, the blinks are still there. Did you check that

ica.plot_overlay(eog_epochs.average())

actually showed an effect of your ICA cleaning? It should look something like this:

https://mne.tools/mne-study-template/examples/ds000248_ica/sub-01_task-audiovisual_proc-ica_report.html#6

Running
ica.plot_overlay(eog_epochs.average())
It looks that there were no effects produced by the ICA cleaning, below the image.

Yep, there is no effect at all. Check that you have set ica.exclude

Yes, it was set (ica.exclude = eog_indices). Nevertheless, I’ve just noted that “eog_indices” list is empty, while it should contain the ICs associated to the ocular activity.
This is how I calculated the “eog_indices”:
eog_indices, eog_scores = ica.find_bads_eog(eog_epochs, ch_name="Fp1")
Is it correct?

Below the full code related to the ICA section:

# RUN ICA FIT ON EPOCHS
from mne.preprocessing import (ICA)
n_components = None
method = 'picard'
random_state = 23
ica = ICA(n_components=n_components, method=method, random_state=random_state, max_iter=10000)
decim = 3  # we need sufficient statistics, not all time points -> saves time
reject = dict(mag=5e-12, grad=4000e-13)
ica.fit(epochs, decim=decim, reject=reject) # ICA FIT ON EPOCHS + REJECT

eog_epochs = mne.preprocessing.create_eog_epochs(raw_filtered, ch_name="Fp1", baseline=(0, 0))
eog_indices, eog_scores = ica.find_bads_eog(eog_epochs, ch_name="Fp1")
ica.exclude = eog_indices
ica.plot_overlay(eog_epochs.average())
ica.apply(epochs)

# Check
epochs.average().plot()
mne.viz.plot_epochs(epochs)

Which indicates that MNE didn’t manage to extract an IC related to ocular activity.

First you need to ensure that ICA decomposition worked correctly. You need to manually inspect all of your ICs and see if they “make sense”. How many components were produced? How much variance do they explain? Inspect the log output carefully to spot any potential issues.

If all is well, you can then try to play with the threshold parameter of ica.find_bads_eog(), be sure to check out the documentation to get an idea of what it’s doing.

Also you should drop this line – you only have EEG data, so specifying rejection thresholds on MEG channels doesn’t make much sense.

And decim shouldn’t be necessary either

Lastly, a one-sample baseline is probably also not really what you want?