Preprocessing cropped EEG data - best practices

:question: If you have a question or issue with MNE-Python, please include the following info:

  • MNE version: e.g. 1.7.0
  • operating system: macOS 14

Hello MNE community,

I have questions about best practices for cropping data during pre-processing, in conjunction with AutoReject and ICA.

The EEG data I am working with includes annotations for pairs of stimuli_start and stimuli_stop events, repeated over 100 times in the recording. Outside of these annotated segments, participants were free to relax, move about, and even remove the EEG device. Since these out-of-stimulus segments contain very poor-quality data of no analytical value, we thought it would be good to crop them out during pre-processing and stitch the stimulus segments back together. Here is an example of the cropping code:

import math

import mne

# assume data has been read into an mne.io.Raw object named `raw`

# Extract events from annotations (stimuli_start=2, stimuli_end=3,
# experiment_end=4 --> occurs only once at the end)
events, _ = mne.events_from_annotations(raw, event_id={'2': 2, '3': 3, '4': 4})

# Keep only the stimulus start/end events
start_event_code = 2
end_event_code = 3

start_end_events = events[(events[:, 2] == start_event_code)
                          | (events[:, 2] == end_event_code)]

# Iterate through stimulus start/end event pairs and extract data segments

# List to store the cropped Raw segments
stimulus_raw = []
sfreq = raw.info['sfreq']  # read the sampling rate instead of hard-coding 250

# Define a placeholder
start_idx = None
for ev in start_end_events:
    # Mark the start and end of each stimulus
    if ev[2] == start_event_code:
        start_idx = ev[0]
    elif ev[2] == end_event_code:
        end_idx = ev[0]

        # Event samples are absolute, so subtract raw.first_samp before
        # converting to seconds. Round tmin down so the start event is still
        # captured after the sample-to-time conversion.
        tmin = math.floor((start_idx - raw.first_samp) / sfreq * 10000) / 10000
        tmax = (end_idx - raw.first_samp) / sfreq  # no need to round
        segment = raw.copy().crop(tmin=tmin, tmax=tmax)
        stimulus_raw.append(segment)

# Concatenate the cropped segments into a single Raw object
raw_cropped = mne.concatenate_raws(stimulus_raw)

My questions are:

  1. What would be reasons to avoid cropping the data like this before subsequent pre-processing steps?
  2. Will typical MNE pre-processing steps such as AutoReject and ICA perform sub-optimally as a result of this cropping? Are there better approaches to this situation?
  3. If the cropping approach is sound but the code can be made more efficient, suggestions are welcome! (It is quite computationally heavy at the moment.)

Thank you!
Lek Hong

Hello @lek-hong and welcome to the forum!

Excluding those noisy periods from processing is the right thing to do. ICA etc. will perform better that way.

You may want to take a look at annotate_break(), which was developed to help in cases like yours:

https://mne.tools/stable/auto_tutorials/preprocessing/20_rejecting_bad_data.html#detecting-and-annotating-breaks

Note that you can run ICA on epochs too, and autoreject exclusively works on epochs.
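Regarding your third question: if you do stick with explicit cropping, the pairing loop can be vectorized with NumPy. A sketch with a toy events array (event codes and sampling rate assumed from your snippet, and assuming the events strictly alternate start/end):

```python
import numpy as np

# Toy events in MNE's (sample, previous, code) layout;
# codes as in your snippet: 2 = stimuli_start, 3 = stimuli_end
events = np.array([
    [1000, 0, 2], [1500, 0, 3],
    [3000, 0, 2], [3600, 0, 3],
    [5000, 0, 2], [5250, 0, 3],
])
sfreq = 250.0  # assumed sampling rate

starts = events[events[:, 2] == 2, 0]
ends = events[events[:, 2] == 3, 0]
assert starts.size == ends.size and np.all(starts < ends), 'unpaired events'

# One (tmin, tmax) row per stimulus, in seconds
spans = np.column_stack([starts, ends]) / sfreq
```

Each row of `spans` can then be passed to `raw.copy().crop(tmin=row[0], tmax=row[1])` without the manual state machine.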

Best wishes,
Richard

1 Like

Thank you Richard for the guidance!

annotate_break() is interesting. Since the stimuli are of various durations, I can programmatically define the break durations via the timestamps of adjacent annotations.

But might it be better to just exclude the breaks permanently from the data via cropping rather than just annotate them as breaks?

Yup, will run ICA and AutoReject on epochs generated from the cropped data. :grinning:

For MNE it’s the same really … Any data with an annotation starting with “bad” will be ignored automatically. No need to crop the data!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.