Reject and flat dropping a different number of epochs each time they are run?

I'm on MNE version 1.0.3.

I've just noticed that each time I use reject and flat to reject bad epochs, I get a slightly different number of remaining epochs despite identical settings. This seems to happen both when reject and flat are used during epoch creation and when they are passed to drop_bad.

Additionally, the output indicates that channels not included in picks (i.e., channels that should not be part of the epochs object at all) are being used to make those drop decisions.
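For example, one way to see which channels triggered each drop is to inspect the drop log (a sketch with a hypothetical epochs object and placeholder thresholds):

#Inspect which channels caused each rejection (placeholder thresholds)
epochs.drop_bad(reject={'eeg': 150e-6}, flat={'eeg': 1e-6})
for idx, reason in enumerate(epochs.drop_log):
    if reason:  #a non-empty tuple means the epoch was dropped; entries name the channels
        print(idx, reason)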

Is there something about the way reject and flat are implemented that would generate different results each time a cell is run? Has anyone else noticed something similar?

It should not happen. Once you have dropped bad epochs, calling the drop function again should have no effect.
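For example (a minimal sketch with a hypothetical epochs object and placeholder thresholds):

n_before = len(epochs)
epochs.drop_bad(reject={'eeg': 150e-6}, flat={'eeg': 1e-6})
n_after_first = len(epochs)
epochs.drop_bad(reject={'eeg': 150e-6}, flat={'eeg': 1e-6})
n_after_second = len(epochs)
assert n_after_first == n_after_second  #a second call should drop nothing new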

Are you sure you don't modify the epochs data in place?

Alex


Could you provide a small example as a code snippet?

Apologies for the long delay in responding to this thread - I had a very time-consuming project going, but now I have time to properly circle back here.

I'm using the dataset from this study to try to replicate and improve on their classification results: Individually Adapted Imagery Improves Brain-Computer Interface Performance in End-Users with Disability - PMC

The full dataset of EEG recordings collected in the study can be found on this page, listed under "13. Individual imagery (004-2015)": Data sets - BNCI Horizon 2020

I set up a grid search to find the percentage of trials that are dropped with different flat and reject settings - the full code is below. First, I'll demonstrate the issue.

When running the grid search, I noticed that the results change on some iterations even though nothing in the code changes (i.e., a different number of trials is dropped when the same code is re-run). For example, here are two screenshots - in both, the 'flat' rejection setting used on each row is shown in the middle column, with the percentage of trials dropped as a result in the right column:

[screenshot: grid search results, first run]

[screenshot: grid search results, second run, showing different drop percentages]

Here is the code that performs the gridsearch:

#Create list of epoch drop filters
reject_options_1 = [None]
reject_options_2 = [None, 
                  {'eeg': 40}, {'eeg': 50}, {'eeg': 60}, 
                  {'eeg': 70}, {'eeg': 80}, {'eeg': 90}, 
                  {'eeg': 105}, {'eeg': 140}, {'eeg': 175},
                   {'eeg': 135}, {'eeg': 145}, {'eeg': 150},
                   {'eeg': 155}, {'eeg': 160}, {'eeg': 167},
                   {'eeg': 146}, {'eeg': 147}, 
                   {'eeg': 130}, {'eeg': 125}, {'eeg': 120},
                   {'eeg': 115}, {'eeg': 110},
                   {'eeg': 180}, {'eeg': 185}, {'eeg': 190},
                   {'eeg': 143}, {'eeg': 110}, {'eeg': 115},
                   {'eeg': 120}, {'eeg': 125}, {'eeg': 130},
                   {'eeg': 83}, {'eeg': 86}, {'eeg': 78}, 
                   {'eeg': 100}, {'eeg': 95}, 
                   {'eeg': 210}, {'eeg': 183}, {'eeg': 200},
                   {'eeg': 205}]
flat_options_1 = [None]
flat_options_2 = [None, 
                {'eeg': 3}, {'eeg': 6}, {'eeg': 9}, 
                {'eeg': 11}, {'eeg': 13}, {'eeg': 15}, 
                {'eeg': 17}, {'eeg': 19}, {'eeg': 21},
                 {'eeg': 25}, {'eeg': 30}, {'eeg': 35},
                 {'eeg': 36}, {'eeg': 31}, {'eeg': 33}, 
                 {'eeg': 34}, {'eeg': 30.5}, {'eeg': 31.5},
                 {'eeg': 16}, {'eeg': 18}, {'eeg': 20},
                 {'eeg': 17.3}, {'eeg': 17.6}, {'eeg': 18.5},
                 {'eeg': 18.75}, {'eeg': 19.5}, {'eeg': 20.5},
                 {'eeg': 22}, {'eeg': 23}, {'eeg': 24}, 
                 {'eeg': 19.75},
                 {'eeg': 24.5}, {'eeg': 26}, {'eeg': 27},
                 {'eeg': 28}, {'eeg': 29}, 
                 {'eeg': 25.5}, {'eeg': 29.5}, {'eeg': 26.5},
                 {'eeg': 37}, {'eeg': 38}, {'eeg': 39},
                 {'eeg': 40}, {'eeg': 41}, {'eeg': 42}]

#Create dataframe - tests flat & reject filters independently
rejection_settings_df = pd.concat((pd.DataFrame(itertools.product(reject_options_1,
                                                                 flat_options_2), 
                                               columns=['reject', 'flat']),
                                  pd.DataFrame(itertools.product(reject_options_2,
                                                                 flat_options_1), 
                                               columns=['reject', 'flat'])),
                                 ignore_index=True)

#reindex
rejection_settings_df = rejection_settings_df.reindex(columns=['reject', 'flat'] + 
                                                      list(y_dict.keys()))

#Load raw dict with mne raw objects
raw_dict = {}
for key, value in data_dict.items():
    raw_dict[key] = mne.io.RawArray(value.T, info, verbose=0)

#Filter raw dict
for key, value in raw_dict.items():
    value.filter(l_freq=None, 
                 h_freq=40, 
                 method='fir', phase='zero', verbose=0)
            
#Compute percentage of trials dropped for each setting
for row in range(rejection_settings_df.shape[0]): 
    epoch_dict = {}
    for key, value in raw_dict.items():
        epoch_dict[key] = mne.Epochs(value, events=event_dict[key], 
                                     event_id=events_explained, 
                                     tmin=-3, tmax=4.5, 
                                     baseline=None,
                                     preload=True,
                                     picks=[ch for ch in ch_names if 
                                            ch not in ['AFz', 'F7', 'F8']],
                                     reject=rejection_settings_df.reject[row],
                                     flat=rejection_settings_df.flat[row],
                                     reject_tmin=1,
                                     reject_tmax=4.5,
                                     verbose=0)
    perc_trials_dropped_dict = {}
    for key, value in y_dict.items():
        dropped = value.shape[0] - epoch_dict[key].get_data().shape[0]
        drop_percentage = dropped / value.shape[0]
        rejection_settings_df.at[row, key] = drop_percentage

And then the code to show the results shown in the screenshots:

#view results, change subject to dig into data
subject = 'sub_C'
#Set exclude to ignore one of two filter types: 'reject' or 'flat'
exclude = 'reject'

(rejection_settings_df[['reject', 'flat', 
                        f'{subject}_sesh_1']].
 loc[(rejection_settings_df[f'{subject}_sesh_1'] < 0.2) & 
     (rejection_settings_df[f'{subject}_sesh_1'] > 0) &
 (rejection_settings_df[exclude].isna())].sort_values([f'{subject}_sesh_1']))

Really appreciate any insight, and I'm happy to help diagnose further - just let me know what would be helpful! I'm also going to take a look at the code that implements the flat and reject features and see if I can come up with any hypotheses.

Not related to the actual question, but: the rejection values you specify are extremely large amplitudes. MNE-Python expects Volts. If your signals actually exceed several Volts, something is already going wrong during data import/reading (we're talking about a 5 to 6 orders of magnitude deviation from what one would expect).
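For reference, with data actually in Volts, typical criteria would look something like this (illustrative numbers only, not a recommendation for this dataset):

#Typical peak-to-peak criteria when EEG data are in Volts (MNE's expected unit)
reject = dict(eeg=150e-6)  #drop epochs exceeding roughly 150 microvolts peak-to-peak
flat = dict(eeg=1e-6)      #drop epochs with less than roughly 1 microvolt peak-to-peak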

The code you shared cannot be run, making it difficult or impossible for us to reproduce the issue. Could you share a minimal code snippet that demonstrates the issue and can directly be run? Thanks.

For sure! The dataset uploaded by the original authors takes a fair amount of code to unpack (and was clearly saved after upscaling by several orders of magnitude, as you point out).

I did my best to simplify the code and downloads required to reproduce. Here we go:

I re-uploaded the .mat file for just a single subject to WeTransfer so it can be downloaded without you having to navigate the BNCI website. I picked subject C just because that was the example I'd already shown.

Here is that download link from WeTransfer. Save the .mat file into a subfolder called /data from wherever you'll be running your code or creating your notebook: WeTransfer - Send Large Files & Share Photos Online - Up to 2GB Free

Then I simplified everything down to three pieces of code. The first piece unpacks the .mat file and loads it into dictionaries for use going forward.

The second piece creates MNE raw objects and filters them.

The third piece creates epoch objects using a 'flat' rejection setting.

I suggest experimenting with running blocks 2 and 3 repeatedly, and then running just block 3 repeatedly - I believe the differences actually arise in the filtering process, not the epoch creation/rejection process (see the post below).

Code 1 to unpack the .mat file:

#Imports needed for these snippets
from scipy.io import loadmat
import numpy as np
import pandas as pd
import mne

#Create function to ingest subject data
#Returns session 1 data, session 1 trial times, session 1 task list per trial,
#Session 2 data, session 2 trial times, session 2 task list per trial
def data_ingester(filename):
    '''This function ingests all the data from one of the nine data files that came in this dataset.
    Each data file contains all the data for one of the nine subjects.
    For the selected data file, it returns six objects - the eeg data, the sample numbers at which each trial began,
    and what task was performed in each trial. It returns those three objects for session 1 followed by session 2.
    Note this function is run automatically by this script, and dictionaries of data, trial times, and trial types
    are created.'''
    #open file in scipy
    annots = loadmat(filename)
    #dig into data to get to data for each session
    [data] = annots['data']
    [sesh_1] = data[0]
    [sesh_2] = data[1]
    #Creating dataframes of each sessions data
    sesh_1 = pd.DataFrame(sesh_1)
    sesh_2 = pd.DataFrame(sesh_2)

    #save eeg recording data from session 1
    [a] = sesh_1['X'] 
    #save the sample numbers at which each trial began in session 1
    [b] = sesh_1['trial']
    #save the list of what task was performed in each trial in session 1
    [c] = sesh_1['y']

    #save the raw eeg recording data from session 2
    [d] = sesh_2['X']
    #save the sample numbers at which each trial began in session 2
    [e] = sesh_2['trial']
    #save the list of what task was performed in each trial in session 2
    [f] = sesh_2['y']
    
    #return all needed data
    return a, b, c, d, e, f

#Create dictionaries to populate with our data
data_dict = {}
trial_times_dict = {}
y_dict = {}

#Extract data for subject C
(data_dict['sub_C_sesh_1'], 
 trial_times_dict['sub_C_sesh_1'], 
 y_dict['sub_C_sesh_1'], 
 data_dict['sub_C_sesh_2'], 
 trial_times_dict['sub_C_sesh_2'], 
 y_dict['sub_C_sesh_2']) = data_ingester('data/C.mat')

#Create info file used to create MNE raw objects
#channels used from study documentation
ch_names = ['AFz', 'F7', 'F3', 'Fz', 'F4', 'F8', 'FC3', 'FCz', 'FC4', 'T3', 
            'C3', 'Cz', 'C4', 'T4', 'CP3', 'CPz', 'CP4', 'P7', 'P5', 'P3', 
            'P1', 'Pz', 'P2', 'P4', 'P6', 'P8', 'PO3', 'PO4', 'O1', 'O2']
#per the study documentation data was collected at 256 hertz
sfreq = 256
#all channel types are eeg sensors
info = mne.create_info(ch_names, sfreq, ch_types='eeg')

#Create event dictionary
events_explained = {'brainteaser/word': 1, 'brainteaser/subtraction': 2, 
               'brainteaser/navigation': 3, 'motor/hand': 4, 'motor/feet': 5}

#Create event dictionary for later use in creating epochs objects
#Each event array combines stimulus times (3 sec after the start of each trial),
#a middle column of zeros (the trigger value before each event, unused here),
#and a final column indicating which kind of stimulus occurred
event_dict = {}
for key, value in trial_times_dict.items():
    event_dict[key] = np.concatenate((np.reshape(value.flatten() + (256 * 3), 
                                                 (len(value), 1)),
                                      np.zeros((y_dict[key].shape[0], 1)),
                                      y_dict[key]), axis=1).astype(int)
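As a quick sanity check, MNE expects each events array to be an integer array of shape (n_events, 3) holding the sample index, the previous trigger value, and the event id - something like this (just an illustrative check):

#Each events array should be (n_trials, 3): sample, previous value (0), event id 1-5
print(event_dict['sub_C_sesh_1'].shape)
print(event_dict['sub_C_sesh_1'][:3])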

Now the second piece of code to generate raw objects and filter them:

#Load raw dict with mne raw objects
raw_dict = {}
for key, value in data_dict.items():
    raw_dict[key] = mne.io.RawArray(value.T, info, verbose=0)

#Filter raw dict
for key, value in raw_dict.items():
    value.filter(l_freq=None, 
                 h_freq=40, 
                 method='fir', phase='zero', verbose=0)

The third piece of code creates epoch objects from the raw objects and drops those that are too flat:

#Create epochs with a single 'flat' setting and compute the percentage of trials dropped
epoch_dict = {}
for key, value in raw_dict.items():
    epoch_dict[key] = mne.Epochs(value, events=event_dict[key], 
                                     event_id=events_explained, 
                                     tmin=-3, tmax=4.5, 
                                     baseline=None,
                                     preload=True,
                                     picks=[ch for ch in ch_names if 
                                            ch not in ['AFz', 'F7', 'F8']],
                                     reject=None,
                                     flat={'eeg':25},
                                     reject_tmin=1,
                                     reject_tmax=4.5,
                                     verbose=0)
    dropped = y_dict[key].shape[0] - epoch_dict[key].get_data().shape[0]
    drop_percentage = dropped / y_dict[key].shape[0]
    print(f'{drop_percentage} percent of trials are being dropped from {key}')

Thank you!

Why don't you build an even smaller example that doesn't work through this huge number of different rejection thresholds, but simply uses one threshold and performs the rejection procedure multiple times? Wouldn't that demonstrate the issue?

Ok, sure thing. I updated the code in the post above to make things smaller - and in doing so, I think I got us closer to figuring out the root cause! I have now broken the code into three sections to illustrate my findings.

From what I can tell, the differing results come from the filtering process rather than the rejection process.

Running only the 3rd block of code (just the epoch creation step) does not seem to result in a different number of epochs being dropped. However, if you re-run the 2nd and 3rd blocks (first filtering the raw objects, then creating epochs and dropping based on flat), you get different results quite frequently. Here are my results from 5 tests, each time running block 2 and then block 3. Running just block 3 forty times in a row produced the same outcome all forty times.

So the question becomes: should filtering result in slightly different data each time it's run, even when using the same settings?

0.16 percent of trials are being dropped from sub_C_sesh_1
0.025 percent of trials are being dropped from sub_C_sesh_2

0.165 percent of trials are being dropped from sub_C_sesh_1
0.025 percent of trials are being dropped from sub_C_sesh_2

0.17 percent of trials are being dropped from sub_C_sesh_1
0.025 percent of trials are being dropped from sub_C_sesh_2

0.17 percent of trials are being dropped from sub_C_sesh_1
0.025 percent of trials are being dropped from sub_C_sesh_2

0.17 percent of trials are being dropped from sub_C_sesh_1
0.03 percent of trials are being dropped from sub_C_sesh_2

Frequency filtering should be a deterministic operation, so my answer is "no". Instead of looking at dropped epochs, we should compare the data produced by filtering, i.e. do something like:

raw_filt_1 = raw_orig.filter(...)
raw_filt_2 = raw_orig.filter(...)

assert np.allclose(raw_filt_1.get_data(), raw_filt_2.get_data())

The download link unfortunately doesn't work (anymore?)

RE the download link - very strange, it seems to be working for me. That said, since we're now just looking at a single subject, the direct link from BNCI is simple enough: you can also download the data in question here: http://bnci-horizon-2020.eu/database/data-sets/004-2015/C.mat

Then, per your suggestion, I dug into the filtering a little more, and it does appear that each time the filters run they produce slightly different data.

I modified your suggested test slightly - if run exactly as proposed, the results are the same each time because filtering modifies the raw object in place (so the subsequent filter calls don't do anything). If we create a fresh raw object, filter it, and get the resulting data, we can see the data shifting each time the filter is run.

#Create a loop to iterate through filtering and epoch creation 10 times

filtered_data_dict = {}
dropped_trials_count_dict = {}

for n in range(10):
    #create key name for our dictionaries for this iteration
    key = f'iter_{n}'
    
    #Create raw object, filter data, and save resulting data
    raw_sesh_1 = mne.io.RawArray(data_dict['sub_C_sesh_1'].T, info, verbose=0)
    raw_sesh_1.filter(l_freq=None, 
                      h_freq=40, 
                      method='fir', phase='zero', verbose=0)
    filtered_data_dict[key] = raw_sesh_1.get_data()

    #Create epoch with a flat rejection setting and save number of epochs dropped
    epoch_obj = mne.Epochs(raw_sesh_1, events=event_dict['sub_C_sesh_1'], 
                           event_id=events_explained, 
                           tmin=-3, tmax=4.5, 
                           baseline=None, 
                           preload=True,
                           picks=[ch for ch in ch_names if 
                                  ch not in ['AFz', 'F7', 'F8']],
                           reject=None,
                           flat={'eeg':25},
                           reject_tmin=1,
                           reject_tmax=4.5,
                           verbose=0)
    dropped = y_dict['sub_C_sesh_1'].shape[0] - epoch_obj.get_data().shape[0]
    dropped_trials_count_dict[key] = dropped
    
#Compare some of our results
print(list(dropped_trials_count_dict.values()))
print(np.all(filtered_data_dict['iter_0'] == filtered_data_dict['iter_1']))
print(np.allclose(filtered_data_dict['iter_0'], filtered_data_dict['iter_1']))

Which results in a shifting number of dropped epochs, driven by filtered data that isn't identical:

[screenshot: dropped-epoch counts per iteration and the results of the equality checks]

Let me know if you can replicate! If so, I think I'm going to start digging into the filter implementation in the codebase a bit - it will be fun to see if I can figure out any potential causes.
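As a starting point, something like the following should isolate whether the FIR filter itself is deterministic, independent of this dataset (just a sketch using random data, so nothing needs to be downloaded):

#Determinism check on the filter alone, using synthetic data
import numpy as np
import mne

rng = np.random.default_rng(0)
data = rng.standard_normal((30, 256 * 60))  #30 channels, 60 seconds at 256 Hz

filt_1 = mne.filter.filter_data(data.copy(), sfreq=256, l_freq=None, h_freq=40,
                                method='fir', phase='zero', verbose=0)
filt_2 = mne.filter.filter_data(data.copy(), sfreq=256, l_freq=None, h_freq=40,
                                method='fir', phase='zero', verbose=0)
print(np.array_equal(filt_1, filt_2))  #expected True if the filter is deterministic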

I am sorry, I don't have much time to spend on this, and I couldn't find a single code snippet that I could just copy & paste into my editor and immediately run to import the raw data. Could you please provide such a minimal example, just for loading subject C's raw data? That's all I'd need for now. I have downloaded C.mat from the location you provided.

Thanks!

try passing data_dict['sub_C_sesh_1'].T.copy() to RawArray

otherwise it will filter the data in data_dict['sub_C_sesh_1'] many times.
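i.e. something like this (a sketch based on the loop above):

#Copy the array so Raw.filter() does not modify data_dict['sub_C_sesh_1'] in place
raw_sesh_1 = mne.io.RawArray(data_dict['sub_C_sesh_1'].T.copy(), info, verbose=0)
raw_sesh_1.filter(l_freq=None, h_freq=40, method='fir', phase='zero', verbose=0)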

Alex