Artifact Removal from EEG Data using ICA

  • MNE version: 1.8.0
  • operating system: Windows 10

Hi, I am currently trying to preprocess my EEG data from the CHB-MIT dataset (CHB-MIT Scalp EEG Database v1.0.0). After separating it into epochs, I applied bandpass filtering to my raw data, and then I applied ICA to remove artifacts such as eye blinks. The code ran without any errors, but when I plotted the new filtered data, I noticed that the information from 6 channels is missing: FZ-CZ, CZ-PZ, P7-T7, T7-FT9, FT9-FT10, FT10-T8. The code I used is provided at the end of this post. Also, the scale of the data has decreased considerably. What might be causing this, and could it affect the quality of the data? How can I solve this?

import mne
import os
from mne.preprocessing import ICA 
import matplotlib.pyplot as plt

data_directory = "/any/data/path/software/dataset"

for i in range(1, 47):
    filtered_path  = os.path.join(data_directory, f"chb01_{i:02}_filtered-epo.fif")
    if os.path.exists(filtered_path):
        print(f"Applying ICA on {filtered_path}...")

        epochs = mne.read_epochs(filtered_path, preload=True)
        '''
        print("Original Data:")
        epochs.plot(n_channels=10, block=True)
        '''
        print("Available channel names:", epochs.info['ch_names'])

        # Avoid setting a montage if you don't have electrode locations
        ica = mne.preprocessing.ICA(n_components=23, random_state=97, max_iter=800)
        ica.fit(epochs)
        epochs_clean = ica.apply(epochs.copy())

        clean_path = os.path.join(data_directory, f"chb01_{i:02d}_clean-epo.fif")
        epochs_clean.save(clean_path, overwrite=True)

        print("Cleaned data:")
        epochs_clean.plot(n_channels=10, block=True)

    else:
        print(f"{filtered_path} does not exist :(")

print("Artifact removal is completed.")

Hello @wosimidwa and welcome to the forum!

It’s not clear to me which problems you’re actually encountering. Please provide some output and figures to help us understand.

Best wishes,
Richard

Hi @richard, thank you for your welcome!

Here is the output file of my dataset, along with the code to visualize it. It is the output after applying band-pass filtering and ICA for artifact removal. My main issue is that I normally have recordings for the 6 channels I mentioned, but after applying ICA with the code provided earlier, the recordings of those channels are gone and I cannot figure out why. I am trying to preprocess raw EEG data. Thank you for your interest.

https://drive.google.com/drive/folders/1FSdD4Hi7Ww74zuGiEN0OhQtstaBqM8DP?usp=drive_link

# %%
import mne
import os
import matplotlib.pyplot as plt

data_path = "/any/data/path/software/dataset/chb01_03_clean-epo.fif"

raw = mne.read_epochs(data_path)
raw.plot(n_channels=10, block=True)

Thanks @wosimidwa. But could you please share a screenshot of the figure right here on the forum? That way it’s easier for everyone to see what’s going on, without having to download a file.

FWIW, channels shouldn’t just disappear when applying ICA. Are you sure those are there in the epochs object right before you fit and apply ICA?

Best wishes,
Richard

Yes I am sure they were.


Here are the problematic channels when the data is plotted.

What does it look like before ICA?
Have you tried ica = mne.preprocessing.ICA(n_components=None, …)?


This is a plot of the file before applying ICA. The scale is different and, as you can see, all channels contain data.
I couldn’t understand what the line of code you mentioned does. Can you please explain it briefly? Thank you.

This might be unrelated, but the code you posted does not zero out any ICs, so it does not remove any artifacts. Is this really the code you are using? Or do you manually perform ICA and then visually select specific components to remove?

Also, and this might be unrelated as well, but I recommend applying a bandpass filter on the continuous data and epoch afterwards to avoid edge artifacts.


Well, in your example, you do:

ica = mne.preprocessing.ICA(n_components=23, random_state=97, max_iter=800)

I’m suggesting to pass n_components=None here instead. MNE-Python will automatically pick the correct number of components.

Best wishes,
Richard

Well, yes I have tried to write this code using the documentation and thought it was removing the artifacts :no_mouth: It is my first time trying to preprocess data.
Can you please give me more information about removing artifacts from my data? And lastly, I will follow your advice. Thank you

You need to select components for exclusion before running ica.apply(). @cbrnr is right in that something is fishy here as the figures you get before and after ica.apply() without even removing a component look different. They should look identical. This implies that there’s a bug somewhere in your code, or the 23 components you create are not sufficient to capture the variability of the data (Edit: I’m actually not sure if this hypothesis actually makes sense, since MNE uses all PCA components to reconstruct the data by default). Try passing n_components=None. Aside from this and the fact that the snippet you shared doesn’t remove any components, I don’t see anything wrong with your code.


Thank you, @richard. I guess I will just start over the whole process because I noticed I couldn’t do the events part properly. I guess I will use annotations. My annotations file is seizure-based.
Later on, I will try SSP for artefact removal because I couldn’t do ICA (tried your suggestion but didn’t work out).
Thank you so much for everything.

Why did ICA not work for you? Did you read "Repairing artifacts with ICA" in the MNE 1.8.0 documentation? I’ve also written a shorter blog post on how to remove ocular activity with ICA; maybe it is useful for you.


Thank you @cbrnr. I have checked the documentation but still had some difficulties. I will check your blog, thank you.

I was following your documentation and I have found my main problem: it is with the channels and the montage type. My dataset uses the standard 10-20 montage, but I am still getting the following error:

{
	"name": "ValueError",
	"message": "DigMontage is only a subset of info. There are 23 channel positions not present in the DigMontage. The channels missing from the montage are:

['FP1-F7', 'F7-T7', 'T7-P7', 'P7-O1', 'FP1-F3', 'F3-C3', 'C3-P3', 'P3-O1', 'FP2-F4', 'F4-C4', 'C4-P4', 'P4-O2', 'FP2-F8', 'F8-T8', 'T8-P8-0', 'P8-O2', 'FZ-CZ', 'CZ-PZ', 'P7-T7', 'T7-FT9', 'FT9-FT10', 'FT10-T8', 'T8-P8-1'].

Consider using inst.rename_channels to match the montage nomenclature, or inst.set_channel_types if these are not EEG channels, or use the on_missing parameter if the channel positions are allowed to be unknown in your analyses.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File d:\\BELGELER 2021-2022\\Aybüke\\tübitak\\software\\icca.py:15
     13 info = mne.create_info(ch_names, 256, ch_types=[\"eeg\"]*23 )
     14 raw = mne.io.RawArray(data, info)
---> 15 raw.set_montage(\"standard_1020\")
     17 raw2 = raw.copy()
     18 raw2.filter(l_freq=1, h_freq=None)

File <decorator-gen-22>:12, in set_montage(self, montage, match_case, match_alias, on_missing, verbose)

File ~\\AppData\\Roaming\\Python\\Python311\\site-packages\\mne\\_fiff\\meas_info.py:422, in MontageMixin.set_montage(self, montage, match_case, match_alias, on_missing, verbose)
    419 from ..channels.montage import _set_montage
    421 info = self if isinstance(self, Info) else self.info
--> 422 _set_montage(info, montage, match_case, match_alias, on_missing)
    423 return self

File ~\\AppData\\Roaming\\Python\\Python311\\site-packages\\mne\\channels\\montage.py:1250, in _set_montage(***failed resolving arguments***)
   1239 are_is = \"are\" if pl else \"is\"
   1240 missing_coord_msg = (
   1241     f\"DigMontage is only a subset of info. There {are_is} \"
   1242     f\"{len(missing)} channel position{pl} not present in the \"
   (...)
   1248     f\"position{pl} {are_is} allowed to be unknown in your analyses.\"
   1249 )
-> 1250 _on_missing(on_missing, missing_coord_msg)
   1252 # set ch coordinates and names from digmontage or nan coords
   1253 for ii in missing:

File ~\\AppData\\Roaming\\Python\\Python311\\site-packages\\mne\\utils\\check.py:1191, in _on_missing(on_missing, msg, name, error_klass)
   1189 on_missing = \"warn\" if on_missing == \"warning\" else on_missing
   1190 if on_missing == \"raise\":
-> 1191     raise error_klass(msg)
   1192 elif on_missing == \"warn\":
   1193     warn(msg)

ValueError: DigMontage is only a subset of info. There are 23 channel positions not present in the DigMontage. The channels missing from the montage are:

['FP1-F7', 'F7-T7', 'T7-P7', 'P7-O1', 'FP1-F3', 'F3-C3', 'C3-P3', 'P3-O1', 'FP2-F4', 'F4-C4', 'C4-P4', 'P4-O2', 'FP2-F8', 'F8-T8', 'T8-P8-0', 'P8-O2', 'FZ-CZ', 'CZ-PZ', 'P7-T7', 'T7-FT9', 'FT9-FT10', 'FT10-T8', 'T8-P8-1'].

Consider using inst.rename_channels to match the montage nomenclature, or inst.set_channel_types if these are not EEG channels, or use the on_missing parameter if the channel positions are allowed to be unknown in your analyses."
}

Here is my code so far:

# %%
import mne
import os
import matplotlib.pyplot as plt

data_path = "/datapath/software/dataset/chb01/chb01_03.edf"

eeg = mne.io.read_raw_edf(data_path, preload=True)
ch_names = eeg.ch_names

data = eeg.get_data()

info = mne.create_info(ch_names, 256, ch_types=["eeg"] * 23)
raw = mne.io.RawArray(data, info)
raw.set_montage("standard_1020")

raw2 = raw.copy()
raw2.filter(l_freq=1, h_freq=None)

ica = mne.preprocessing.ICA(
    method = "picard",
    fit_params = {"extended": True, "ortho": False},
    random_state = 1    
)

ica.fit(raw2)

ica.plot_components(inst=raw2, picks = range(23))

Yes, as the error message suggests, you should rename your channels so that they match names in the 10-20 system. However, I’m not sure if this is possible, because you seem to have bipolar channels like FP1–F7. You don’t need a montage for ICA, but it is required if you want to plot IC maps (e.g. to identify and mark ocular components). However, in this case I’m not sure if bipolar channels are suitable, because usually these topomaps are computed with a single reference (or average reference).
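To see why a one-to-one rename is awkward, here is a small plain-Python sketch that splits each label from the error message into its two electrodes. It uses no MNE calls; the helper name electrode_pair is made up for this example. Grouping by electrode pair also shows that some labels appear to describe the same pair more than once.

```python
from collections import defaultdict

# Channel labels copied from the error message above.
ch_names = [
    "FP1-F7", "F7-T7", "T7-P7", "P7-O1", "FP1-F3", "F3-C3", "C3-P3", "P3-O1",
    "FP2-F4", "F4-C4", "C4-P4", "P4-O2", "FP2-F8", "F8-T8", "T8-P8-0", "P8-O2",
    "FZ-CZ", "CZ-PZ", "P7-T7", "T7-FT9", "FT9-FT10", "FT10-T8", "T8-P8-1",
]


def electrode_pair(label):
    """Return the two electrodes of a bipolar label, ignoring order and any '-0'/'-1' suffix."""
    first, second = label.split("-")[:2]
    return frozenset((first, second))


# Group labels that refer to the same pair of electrodes.
groups = defaultdict(list)
for label in ch_names:
    groups[electrode_pair(label)].append(label)

duplicates = [labels for labels in groups.values() if len(labels) > 1]
print(duplicates)
# [['T7-P7', 'P7-T7'], ['T8-P8-0', 'T8-P8-1']]
```

Every label names two electrodes, so none of them corresponds to a single position in standard_1020, and the repeated pairs are worth checking before running ICA.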


Well, I have tried to rename the channels with the following code but now I am facing the problem of overlapping channels. Do you know how to handle overlapping channels?

# %%
import mne
import matplotlib.pyplot as plt

data_path = "/BELGELER 2021-2022/Aybüke/tübitak/software/dataset/chb01/chb01_03.edf"

# Load the EEG data
eeg = mne.io.read_raw_edf(data_path, preload=True)

# Define a mapping that avoids duplicate target names by only keeping unique 10-20 equivalents
ch_mapping = {
    'FP1-F7': 'Fp1', 'F7-T7': 'F7', 'T7-P7': 'T7', 'P7-O1': 'P7', 
    'F3-C3': 'F3', 'C3-P3': 'C3', 'P3-O1': 'P3', 
    'FP2-F4': 'Fp2', 'F4-C4': 'F4', 'C4-P4': 'C4', 'P4-O2': 'P4', 
    'FZ-CZ': 'Fz', 'CZ-PZ': 'Cz'
}

# Rename the channels in the raw data
eeg.rename_channels(ch_mapping)

# Apply the standard 10-20 montage
eeg.set_montage("standard_1020", on_missing="ignore")

# Now continue with your ICA setup
eeg.filter(l_freq=1, h_freq=None)

ica = mne.preprocessing.ICA(
    method="picard",
    fit_params={"extended": True, "ortho": False},
    random_state=1
)
ica.fit(eeg)

# Plot ICA components to identify artifacts
ica.plot_components(inst=eeg, picks=range(ica.n_components_))
plt.show()

You could rename it to the channel that lies exactly between the two channels, e.g., FP1–F7 could be AF7, and so on.

However, it is generally not recommended to use bipolar references if you need topomaps, see here.


I understand, but I need to preprocess this dataset and, as I understand it, I need those topomaps for an effective ICA. What would you suggest I do? BTW, I have successfully plotted the topomaps with the following code:


# %%
import mne
import matplotlib.pyplot as plt

data_path = "/datapath/software/dataset/chb01/chb01_03.edf"

# Load the EEG data
eeg = mne.io.read_raw_edf(data_path, preload=True)

# Define a mapping that avoids duplicate target names by only keeping unique 10-20 equivalents
ch_mapping = {
    'FP1-F7': 'Fp1', 'F7-T7': 'F7', 'T7-P7': 'T7', 'P7-O1': 'P7', 
    'F3-C3': 'F3', 'C3-P3': 'C3', 'P3-O1': 'P3', 
    'FP2-F4': 'Fp2', 'F4-C4': 'F4', 'C4-P4': 'C4', 'P4-O2': 'P4', 
    'FZ-CZ': 'Fz', 'CZ-PZ': 'Cz', 'FP2-F8': 'P5', 'F8-T8': 'F8', 
    'T8-P8-0': 'T8', 'P8-O2': 'P8', 'FP1-F3': 'P2', 'P7-T7': 'P9',
    'T7-FT9': 'T9', 'FT9-FT10': 'FT9', 'FT10-T8': 'FT10', 'T8-P8-1': 'TP8'
}

# Rename the channels in the raw data
eeg.rename_channels(ch_mapping)

# Apply the standard 10-20 montage
eeg.set_montage("standard_1020", on_missing="ignore")

# Now continue with your ICA setup
eeg.filter(l_freq=1, h_freq=None)

ica = mne.preprocessing.ICA(
    method="picard",
    fit_params={"extended": True, "ortho": False},
    random_state=1
)
ica.fit(eeg)

# Plot ICA components to identify artifacts
ica.plot_components(inst=eeg, picks=range(ica.n_components_))
plt.show()

and here is the output:

As an expert, can you please tell me if this is okay?

As I mentioned, you can run ICA on bipolar data, but the EEGLAB wiki clearly mentions that topomaps (scalp maps) require “that all channels use either the same common reference or the same average reference”. Indeed, the topomaps you are showing do not contain the typical eye movement and blink components. To be honest, I’m not sure how to proceed here. Maybe removing ocular artifacts is not important for your analysis?