Order to pre-process data

Hi all,

We are currently working on a batch of EEG data, and we are wondering about the best way to pre-process it.

Currently we are doing:

  1. Identify bad channels
  2. Interpolate them
  3. Set the average as reference
  4. ICA to remove blink pattern
  5. Remove non-brain channels

Our questions are:
- Why interpolate in step 2 instead of dropping the bad channels? (This must compromise the average…?)
- Why not begin by removing the non-brain channels? (Keeping them must compromise the ICA and the average we are interested in…?)

Thanks a lot in advance!


What data type are you working with? I assume EEG?

In MNE, I’d suggest adding bad channels to info['bads'] and not interpolating until you really need to. This is of course to some extent a personal preference, but in my experience, interpolation is only really needed when you want to calculate e.g. a grand average and need to ensure all participants have the same set of channels.

You should not set the average reference before running ICA. I’ve seen cases where this would “smear” artifacts across all channels. So my advice is to first try to remove all artifacts (which includes use of ICA), and only after that switch to an average reference.

MNE is aware of the fact that some channels contain brain data (e.g., magnetometers, gradiometers, EEG) while others don’t (e.g., EOG, ECG) and will take this into consideration. Applying an average reference to EEG data, for example, will of course not include EOG channels in this calculation.

I hope this helped a bit!



Nice, very clear! Thanks a lot for your help :slight_smile:


Just another question: our setup has 129 electrodes, but no channels are listed as EOG (cf. picture), even though we saw that ‘E8’, ‘E14’, ‘E21’, ‘E25’, ‘E126’, ‘E126’, ‘E127’, ‘E128’ are the channels recording eye movements.

  1. Would you suggest labeling those channels as EOG from the beginning? (In order to exclude them from the average calculation and so on.)

  2. Or would you even recommend removing all the non-brain electrodes (including EOG, but also the jaw and ear ones) before the average referencing?

Thanks again for your help!



These sensors do contain brain signal too – it’s just so weak in comparison to the EOG artifacts that it’s often “hidden”.

The goal here should be, therefore, to remove those artifacts to recover the underlying brain signal.

Since you’re using ICA, you can do exactly that: use ICA to first extract the artifact-related component(s), and then remove those components from your data.

MNE can very well use EEG channels as “virtual” or “simulated” EOG channels. For example, I commonly use Fp1 and Fp2 for EOG artifact detection. All that MNE will do, then, is run its peak-finding routine on those channels. You don’t need to change the channel type or remove the channels from your data!

Please see this section of the ICA tutorial for more information:


Okay !

Another question about the ICA: we don’t know how strict we should be. In your opinion, what is the maximum number of components to exclude?

For instance, in the picture below, would you exclude the 7 underlined components?

We are selecting ICA components manually because we don’t have that much data, so it’s feasible to do it by hand.

Thanks again for your useful answers!

I’m concerned the ICA decomposition didn’t work well with your particular data.

Components 000, 001, 002, 004, and potentially also 005, 006, 009, 011, 012, 013, and 014 seem to have captured very similar artifacts. I wouldn’t trust this result, it’s rather unexpected.

How exactly did you pre-process the data you fed into ICA? How did you fit ICA?

Is that one channel in the right frontal/temporal (?) area maybe broken and should be marked as bad?

We’ll need much more information to be of help at this point.

The signal we are working on at this point is named “eeg_cropped_removed” because we cropped the beginning and/or the end if needed, and we removed bad channels.

Then, following your advice, we applied the ICA before average referencing:

ica = ICA(n_components=15, max_iter='auto', random_state=97)
ica.fit(eeg_cropped_removed)

Running this command returns:

Fitting ICA to data using 119 channels (please be patient, this may take a while)
Selecting by number: 15 components
Fitting ICA took 5.3s.

Method: fastica
Fit: 38 iterations on raw data (97501 samples)
ICA components: 15
Explained variance: 99.9 %
Available PCA components: 119
Channel types: eeg
ICA components marked for exclusion:

And then running these 2 cells:
ica.plot_sources(eeg_cropped_removed, show_scrollbars=False, stop=60)

gives the plot I attached to the previous message.

If you need more information, I would be very pleased to provide it, if you think it can help us elucidate what’s going on.

Thanks again and again!

Hello, so this sparks some new questions.

First, what is the reason you’re using raw data here and not epochs? Is this resting-state data?

Secondly, why are you limiting the number of ICA components to just 15? This seems strangely low to me, considering that you appear to have more than 100 good channels.

In EEG, I would always start with a number of components that is equal to the number of good channels, minus number of interpolated channels; and reduce this number by one more if using an average reference (but since you didn’t set one before ICA, you can ignore this)
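Written out as plain arithmetic, that rule of thumb looks like the sketch below. The function name and its parameters are made up for this example and are not part of MNE.

```python
def suggested_n_components(n_good_channels, n_interpolated=0,
                           average_reference=False):
    """Starting point for the number of ICA components (rule of thumb)."""
    n = n_good_channels - n_interpolated
    if average_reference:
        n -= 1  # an average reference removes one more degree of freedom
    return n


# 119 good channels, none interpolated, no average reference before ICA:
print(suggested_n_components(119))  # 119
```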

I’m surprised that

  • the method converges so quickly
  • and yet, explains almost the entire variance

with just 15 components

I’d think something is very wrong there, and the first place I’d look into is that one channel on the right side of the head.

Last question, did you high-pass filter your data before fitting ICA?


  1. Yes, it’s resting-state data collected from patients with a severe traumatic brain injury, which is why the signal quality is sometimes poor (there are artifacts due to machines, patients move a lot… so we have to drop some bad channels and bad epochs).
    We create epochs only at the end of the pre-processing, to keep only the ones that look good. Do you think we should do this earlier (e.g. before the ICA)?

  2. We limited the number of components to 15 because we followed the tutorial you mentioned earlier: Repairing artifacts with ICA — MNE 1.0.2 documentation
    What would you recommend for our situation with a 129-channel EGI system? The goal is to keep the signal as pure as possible while cleaning out the main artifacts.

  3. We filtered our data following the tutorial cited above, so we applied low- and high-pass filters (l_freq=1, h_freq=55) plus a notch filter (freqs=60).

Hi Richard,

We changed our pre-processing according to your advice:

  1. Drop bad channels
  2. Make epochs (and drop bad ones)
  3. Apply ICA

Here is what we obtained

And then after excluding components 1, 7, 10, 12, 14, we have these signals (before and after ICA)

And then the last steps:

  4. Average-reference the data
  5. Remove non-brain electrodes

What do you think about it?
Do you think we still need to run the ICA with more components?

Thanks again!