I have a dataset of raw EEG data that I want to clean before analyzing. Without going into too much detail, the dataset includes multiple participants, with several sessions at different dates and times for each participant.
Each session includes several recordings stored in multiple `-raw.fif` files, among which:
- 1 recording with:
  - 1 minute during which the participant was asked to blink naturally, but at an increased rate. The goal was to create additional blink EOG epochs to work with.
  - Auditory stimuli: 75 alternations of 1 s rest / 1 s audio stimulus.
- Several recordings of neurofeedback alternating between rest and regulation phases. The targets are alpha and delta waves.
I'm looking for the best way to clean the dataset semi-automatically. I will probably write a script to automate part of the pipeline, and supply some parameters by hand, based on visual inspection, for each participant/session/file. Below is my current pipeline; all feedback and suggestions for improvement are welcome.
The recordings include 63 EEG channels referenced to `CPz`, 1 EOG channel, and 1 ECG channel.
- I start by adding back `CPz` so that all 64 channels are stored, and I set the montage (standard 10/20).
- Despite recording inside a shielded EEG booth, with the amplifier and laptop running on battery, powerline noise is present in the dataset (I haven't figured out how it was picked up...). As it affects all EEG channels, I re-reference with a common average reference (CAR), which removes the powerline noise from the EEG channels. I apply a notch filter to the EOG and ECG channels to remove the powerline noise there.
At this point, I know I want to:
- filter with a bandpass
- annotate bad segments of data interactively
- add my events as annotations programmatically
- remove blink/EOG artifacts
- remove heartbeat artifacts
For the bandpass filtering, it is obvious that I need to apply a high-pass filter to remove the drift. But should I annotate the bad segments of data before applying a low-pass filter, or should I apply the filter first? As bad segments of data are usually dominated by high frequencies, I am not sure that masking them with a filter is the correct call.
Adding the events is trivial. Only one question about the bad segments: the default name is `BAD_`, and you can add whatever suffix you want. Is it possible to drop the underscore and keep the name `BAD`, to merge all kinds of bad segments together while retaining the "annotation-awareness" of the MNE functions? Or is the underscore required for MNE functions to skip bad data spans?
Finally, the dataset is obviously contaminated by eye-movement artifacts. More surprisingly, for some participants, it's also contaminated by heartbeat artifacts. I want to remove both, probably using ICA.
For the online neurofeedback paradigm, I used SSP projection to remove blink artifacts from the data. But I feel like ICA will yield better results. I intend to follow the excellent tutorial on repairing artifacts with ICA.
As per the discussion in this post, and because I target the delta waves, I will fit the ICA on a copy of the data high-passed at 1 Hz, and then apply it to a copy high-passed at 0.1 Hz.
For all those steps, I do not plan to mark any channel as bad in `raw.info['bads']`. First question: should I?
Should I do it at the end?
Should I try to interpolate bad channels?
And finally, what about `autoreject`: is it worth trying to repair channels with it?
I know this is a lot of information, thanks to those who read it; and thanks for the guidelines!