Preprocessing pipeline tips and tricks for EEG data

I have a dataset of raw EEG data that I want to clean before analyzing. Without going into too much detail, the dataset includes multiple participants, with several sessions at different dates and times for each participant.

Each session includes several recordings in multiple -raw.fif files, among which:

  • 1 recording with:
      • 1 minute during which the participant was asked to blink naturally, but at an increased rate. The goal was to create additional blink EOG epochs to work with.
      • Auditory stimuli: alternation of 1 s rest/1 s audio stimuli x75.
  • Several recordings of neurofeedback with an alternation of rest/regulation phases. The targets are alpha and delta waves.

I’m looking for the best way to clean the dataset semi-automatically. I will probably write a script to automate part of the pipeline, and supply some parameters based on observation for every participant/session/file. I will lay out my current pipeline below; all feedback and suggestions for improvement are welcome :slight_smile:

The recordings include 63 EEG channels referenced to CPz, 1 EOG channel, 1 ECG channel.

  • I start by adding back CPz to have the 64 channels stored; and I set the montage (standard 10/20).
  • Despite recording inside a shielded EEG booth, and with an amplifier/laptop on battery, powerline noise is present in the dataset (I haven’t figured out how I was picking it up…). As it impacts all EEG channels, I re-reference with a common average reference (CAR), which removes the powerline noise on the EEG channels. I apply a notch filter to the EOG and ECG channels to remove the powerline noise there.

At this point, I know I want to:

  • filter with a bandpass
  • annotate bad segments of data interactively
  • add my events as annotations programmatically
  • remove blink/EOG artifacts
  • remove heartbeat artifacts

For the bandpass filtering, it is obvious that I need to apply a high-pass filter to remove the drift. But should I start by annotating the bad segments of data without applying a low-pass filter, or should I apply the filter first? As bad segments of data are usually dominated by high frequencies, I am not sure if masking them with filtering is the correct call.

Adding the events is trivial. Only one question for the bad segments: the default naming is BAD_ and you can add whatever suffix you want. Is it possible to remove the underscore, and keep the name BAD to merge all kinds of bad segments together while retaining the ‘annotation-awareness’ of the MNE functions? Or is the underscore required to avoid using bad data spans with MNE functions?

Finally, the dataset is obviously artifacted by eye movements. But more surprisingly, for some participants, it’s also artifacted by heartbeats. I want to remove both, probably using an ICA.
For the online neurofeedback paradigm, I used SSP projection to remove blink artifacts from data. But I feel like an ICA will yield better results. I intend to use the excellent tutorial on repairing artifacts with ICA.
As per the discussion in this post, and since I target the delta waves, I will fit the ICA on a 1 Hz high-passed dataset and then apply it to a 0.1 Hz high-passed dataset.

For all those steps, I will not mark any channel as bad in info['bads']. First question, should I?
Should I do it at the end?
Should I try to interpolate bad channels?
And finally, what about autoreject, is it worth trying to repair channels with?

I know this is a lot of information, thanks to those who read it; and thanks for the guidelines!

Hello @MathieuSch-101!

I was wondering whether you’ve considered using the MNE-BIDS-Pipeline? It should offer everything you need, and was created to follow good practices and yield meaningful results with real-world data (that is, the limited set of real-world data it has been tested with so far :slight_smile: )

Sounds fair.

As for where the powerline noise comes from: lights, perhaps? Or an external monitor? Or the shielding of the chamber isn’t working properly?

Does the CAR really remove the powerline noise, though? I’d be surprised if that worked sufficiently well (but hey, I do like surprises!) From my experience, simply applying an average-referencing scheme does not sufficiently remove line noise. As you’re targeting delta frequencies anyway, why not simply apply a low-pass filter with an upper limit of, say, 40 Hz?
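To illustrate why an average reference can only go so far, here is a small NumPy toy example (not MNE code, and the gains are made up): if the 50 Hz interference has exactly the same amplitude on every channel, subtracting the channel mean cancels it, but as soon as the amplitude differs across channels a residual remains.

```python
import numpy as np

sfreq = 500.0
t = np.arange(0, 2.0, 1 / sfreq)
n_channels = 8
line = np.sin(2 * np.pi * 50.0 * t)  # unit-amplitude 50 Hz interference


def car(x):
    """Common average reference: subtract the mean across channels."""
    return x - x.mean(axis=0, keepdims=True)


def line_amplitude(x):
    """Mean amplitude across channels of the 50 Hz component."""
    spectrum = np.fft.rfft(x, axis=1)
    freqs = np.fft.rfftfreq(x.shape[1], 1 / sfreq)
    idx = np.argmin(np.abs(freqs - 50.0))
    return np.mean(np.abs(spectrum[:, idx])) * 2 / x.shape[1]


# Case 1: identical pickup on every channel (pure common-mode noise).
gains_equal = np.ones((n_channels, 1))
# Case 2: pickup that varies across channels (hypothetical gains).
gains_unequal = np.linspace(0.5, 1.5, n_channels)[:, None]

residual_equal = line_amplitude(car(gains_equal * line))
residual_unequal = line_amplitude(car(gains_unequal * line))
```

In the second case the residual on each channel is proportional to how far that channel's gain is from the average gain, so channels with unusually strong or weak pickup keep the most line noise.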

The notch filter on the EOG and ECG channels should not be necessary, as mne.preprocessing.create_ecg_epochs() and mne.preprocessing.create_eog_epochs() apply filters that default to upper frequencies far lower than line noise (16 and 10 Hz, respectively).

Personally, I tend to ignore high-frequency noise when first inspecting the data; for marking bad segments, I’d focus on extreme signal amplitude jumps and the like, as I’ll typically apply a low-pass filter at a later stage anyway. I suppose other users may have different strategies. @cbrnr, how do you typically approach this?

To my knowledge, all MNE functions that special-case “bad” annotations should happily work without the underscore too. If you come across a situation where this doesn’t work, it’s likely to be a bug.

YES! I’d always mark problematic channels as “bad” as early as possible.

It depends on your specific data and analysis approach. You said that each session produced multiple files. Oftentimes, sessions are split into blocks or runs, which are all stored in separate files. In EEG, the more a session progresses, the more sensors tend to get lost. This means that you may have different sets of bad channels in the first block than in the last block of a session. If a channel has been bad for only one out of, say, 6 blocks, you may want to interpolate it in the “bad” block. If, on the other hand, it was problematic throughout the entire experiment, you should probably just mark it as bad and not bother interpolating (but see below). You may want to take a look at the relatively new function mne.preprocessing.equalize_bads(), which makes this procedure easy and ensures all of your data instances have the same set of bads, which is required if you intend to concatenate them.

Aside from this, I’d only interpolate when calculating the grand average; but mne.grand_average() already takes care of this by default (see the interpolate_bads parameter).

Autoreject operates on epochs, and we found it useful to use its “local” variant before and after ICA. See also the respective example, “Preprocessing workflow with autoreject and ICA” in the autoreject 0.3.dev0 documentation (sorry, I couldn’t find another deployed version of the “development” docs of autoreject; @mainakjas, is there a proper deployment somewhere?)

Best wishes,


I find your pipeline very reasonable, and I agree with almost everything @richard has said (the main difference is that I recommend interpolating bad channels no matter what). Here are some comments:

  • It is very difficult to record EEG without any line noise, no matter how well you shield the room. Average reference usually attenuates line noise quite a bit, but not completely. All of this doesn’t matter in most cases though, because typically we are interested in frequencies below 35 Hz. For visual inspection, it can be useful to apply a notch or a lowpass filter. If you really want to remove line noise, a better option than a simple notch (which removes everything) might be mne.filter.notch_filter() with method='spectrum_fit'.

  • Regarding filtering before or after annotating, I usually apply only a high-pass filter (0.1 Hz) to remove offsets and drift. Then I mark bad channels (i.e. channels that have extremely low or high PSD compared to normal channels). After that, I re-reference to the average of all good channels, interpolate the bad channels, and manually mark artifact segments (mainly muscle). I find interpolating bad channels is useful later when you do grand averages and/or group statistics. Of course interpolating doesn’t add any information, but it makes many subsequent steps much easier, so I strongly recommend it. After marking bad segments I run ICA (on the data high-pass filtered at 1 Hz, using only the data segments of interest), identify ocular ICs, and remove these ICs from the data. I will make my pipeline available at (right now it’s still private). I think it is pretty similar to the MNE-BIDS pipeline, but I haven’t taken a look at that yet.


Thanks for all the tips. I’ll have a look at MNE-BIDS; I actually thought it was just a layer to support BIDS files. Anyway, this is more than enough information to keep me going in the correct direction! Thanks!

MNE-BIDS is a package to allow reading and writing BIDS data with the help of MNE-Python. The MNE-BIDS-Pipeline (which I often just call “BIDS Pipeline”) is a different project that uses MNE-BIDS to read BIDS-formatted datasets.