Questions about building a consistent EEG preprocessing pipeline (resting-state)

If you have a question or issue with MNE-Python, please include the following info:

  • MNE version: 1.11.0
  • operating system: Ubuntu 24.04.1 LTS

Hello MNE forum,

I am currently preprocessing resting-state EEG data recorded from patients with Parkinson’s disease.

Each subject performed a 2-minute resting-state recording. At this stage, I have loaded all raw datasets and I am trying to verify whether the recordings were properly acquired and whether the EEG data are usable.

In a previous post (Automatic EEG quality check & ICA for blink removal in 19-channel dry EEG), I asked about automatic EEG quality checking. After reading the replies, I also searched through many posts on the MNE forum as well as tutorials, examples, and API documentation. However, I am still not fully confident about what would be the most appropriate consistent preprocessing pipeline for my dataset.

My goal is to build a consistent preprocessing pipeline that can be applied to all subjects in the same way.

Currently, I am considering the following three preprocessing pipelines.

Pipeline 1

Load EDF
→ Crop & filtering (0.5–50 Hz band-pass + notch)
→ CAR reference
→ ICA (fit with 1–100 Hz band-pass filtered data)
→ RANSAC (global bad channel detection)
→ interpolate_bads()
→ CAR reference again
→ Epoching
→ AutoReject (local)

Pipeline 2

Load EDF
→ Crop & notch filtering
→ CAR reference
→ Copy data & apply 1 Hz high-pass filtering
→ Epoching
→ AutoReject (local)
→ ICA fit
→ ICA apply
→ AutoReject (local)
→ Final band-pass filtering (0.5–50 Hz)

Pipeline 3

Load
→ Notch + band-pass filtering (0.5–50 Hz)
→ Visual inspection to annotate large artifacts
→ Remove bad segments
→ ICA
→ Re-reference (CAR)
→ Fixed-length epoching
→ AutoReject to remove bad epochs

My questions

Q1.
Would the preprocessing in Pipeline 1 be considered too aggressive or overly complicated?

The motivation behind this pipeline is that I am not an EEG expert, so I tried to combine several existing tools (e.g., ICLabel, RANSAC, AutoReject) to create a semi-automatic and consistent preprocessing pipeline that can be applied uniformly across subjects.

Q2.
In Pipeline 3, when visually inspecting band-pass filtered continuous EEG data and marking bad segments:

Would it be better to first split the data into short fixed-length epochs (e.g., 1–2 seconds) and then drop bad epochs, rather than annotating artifacts directly in the continuous data?

Q3.
If AutoReject (local) is applied at the epoch level, would using RANSAC for bad channel detection followed by raw.interpolate_bads() still be meaningful?

Or would this be somewhat redundant?

I would greatly appreciate any advice on how to design a robust and consistent preprocessing pipeline for this type of dataset.

I have spent the last few days reading many MNE forum discussions and preprocessing examples, but I am still unsure which preprocessing strategy would be the most appropriate.

Thank you very much for your time and help!

@Ananta Have you looked into existing automatic processing pipelines, so that you don’t have to be saddled with all the decisions like which steps to run, in what order, etc.?

Hello @scott-huberty, thank you for your suggestion!

I have looked into several automated approaches, including the MNE-BIDS pipeline, as well as tools like ICLabel and AutoReject.

Based on those resources, I am currently building a preprocessing pipeline where I first run AutoReject, then pass the remaining epochs to ICA, and finally run AutoReject again after applying ICA. Among the pipelines I mentioned earlier, this would be closest to Pipeline 2.

However, I have encountered one issue that I am unsure about. When applying ICA, some datasets are decomposed into a very small number of components (e.g., 3 or 5 components) even though the recordings contain 19 channels.

I am wondering whether this might indicate poor data quality, and whether such datasets should be excluded from analysis.

When I visually inspect these recordings, the signal amplitudes often look somewhat abnormal, which makes me suspect that the data quality may indeed be problematic.

Thanks again for your advice. I really appreciate it!

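See mne.preprocessing.ICA. The default value for n_components is None, in which case MNE chooses the number of components for you: per the documentation, it keeps the PCA components needed to explain 0.999999 of the cumulative variance, so the count can fall well below the number of channels (maybe this is why you are getting 3 or 5 components for some datasets). What that means with respect to the quality of your data or ICA decomposition, I am not sure.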
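To make the effect of a cumulative-variance rule concrete, here is a small NumPy illustration. It is not MNE's implementation: the function name `n_components_for_variance` is hypothetical, the 0.999999 threshold is the value the MNE documentation gives for `n_components=None`, and the data are synthetic. It only shows that a recording dominated by a few high-amplitude sources can end up with far fewer components than channels.

```python
import numpy as np


def n_components_for_variance(data, threshold=0.999999):
    """Smallest number of principal components whose cumulative
    explained variance exceeds `threshold` (channels x samples input)."""
    centered = data - data.mean(axis=1, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    n_keep = int(np.sum(cumulative <= threshold)) + 1
    return min(n_keep, len(cumulative))


rng = np.random.default_rng(0)
n_channels, n_times = 19, 5000

# Case 1: every channel carries comparable, independent activity.
healthy = rng.standard_normal((n_channels, n_times))

# Case 2: three strong sources spread over all channels, plus noise
# that is four orders of magnitude smaller in amplitude.
strong_sources = rng.standard_normal((3, n_times))
mixing = rng.standard_normal((n_channels, 3))
weak_noise = 1e-4 * rng.standard_normal((n_channels, n_times))
dominated = mixing @ strong_sources + weak_noise

n_healthy = n_components_for_variance(healthy)
n_dominated = n_components_for_variance(dominated)
print(n_healthy, n_dominated)
```

After fitting, the number MNE actually used is stored in `ica.n_components_`, so it can be logged per subject and compared against the channel count.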

Just make sure that neither you nor AutoReject is interpolating channels prior to running ICA. Interpolation reduces the rank of the data, which can impact the ability of the decomposition to identify independent components.
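The rank reduction can be checked numerically. The sketch below uses plain NumPy on random data and replaces one channel with the average of three others as a stand-in for the spherical spline interpolation that `interpolate_bads()` uses for EEG. The channel indices and data are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_times = 19, 5000

# Full-rank placeholder data: 19 independent channels.
data = rng.standard_normal((n_channels, n_times))
rank_raw = np.linalg.matrix_rank(data)

# "Interpolate" channel 0 as the mean of channels 1-3. The channel is now
# a linear combination of the others, so one dimension is lost.
interpolated = data.copy()
interpolated[0] = interpolated[1:4].mean(axis=0)
rank_interpolated = np.linalg.matrix_rank(interpolated)

# A common average reference removes one more dimension.
car = interpolated - interpolated.mean(axis=0, keepdims=True)
rank_car = np.linalg.matrix_rank(car)

print(rank_raw, rank_interpolated, rank_car)
```

With 19 channels, each interpolated channel and the average reference leave fewer independent dimensions for ICA to work with, which is one reason to keep interpolation after the ICA step or to set `n_components` no higher than the remaining rank.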