Apologies for the long delay in responding to this thread - I had a very time-consuming project going, but now I have time to circle back here properly.
I'm using the dataset from this study to try to replicate and improve on their classification results: Individually Adapted Imagery Improves Brain-Computer Interface Performance in End-Users with Disability - PMC
The full dataset of EEG recordings collected in the study can be found on this page, listed under "13. Individual imagery (004-2015)": Data sets - BNCI Horizon 2020
I set up a grid search to find the percentage of trials that are dropped under different flat and reject settings - the full code is below. First, I'll demonstrate the issue.
When running the grid search, I noticed that the results change on some iterations even though nothing in the code changes (i.e., a different number of trials is dropped when the same code is re-run). For example, here are two screenshots - in both, the 'flat' rejection setting used on each row is shown in the middle column, along with the percentage of trials dropped as a result in the right column:
Here is the code that performs the grid search:
# Create list of epoch drop filters
reject_options_1 = [None]
reject_options_2 = [None,
                    {'eeg': 40}, {'eeg': 50}, {'eeg': 60},
                    {'eeg': 70}, {'eeg': 80}, {'eeg': 90},
                    {'eeg': 105}, {'eeg': 140}, {'eeg': 175},
                    {'eeg': 135}, {'eeg': 145}, {'eeg': 150},
                    {'eeg': 155}, {'eeg': 160}, {'eeg': 167},
                    {'eeg': 146}, {'eeg': 147},
                    {'eeg': 130}, {'eeg': 125}, {'eeg': 120},
                    {'eeg': 115}, {'eeg': 110},
                    {'eeg': 180}, {'eeg': 185}, {'eeg': 190},
                    {'eeg': 143}, {'eeg': 110}, {'eeg': 115},
                    {'eeg': 120}, {'eeg': 125}, {'eeg': 130},
                    {'eeg': 83}, {'eeg': 86}, {'eeg': 78},
                    {'eeg': 100}, {'eeg': 95},
                    {'eeg': 210}, {'eeg': 183}, {'eeg': 200},
                    {'eeg': 205}]
flat_options_1 = [None]
flat_options_2 = [None,
                  {'eeg': 3}, {'eeg': 6}, {'eeg': 9},
                  {'eeg': 11}, {'eeg': 13}, {'eeg': 15},
                  {'eeg': 17}, {'eeg': 19}, {'eeg': 21},
                  {'eeg': 25}, {'eeg': 30}, {'eeg': 35},
                  {'eeg': 36}, {'eeg': 31}, {'eeg': 33},
                  {'eeg': 34}, {'eeg': 30.5}, {'eeg': 31.5},
                  {'eeg': 16}, {'eeg': 18}, {'eeg': 20},
                  {'eeg': 17.3}, {'eeg': 17.6}, {'eeg': 18.5},
                  {'eeg': 18.75}, {'eeg': 19.5}, {'eeg': 20.5},
                  {'eeg': 22}, {'eeg': 23}, {'eeg': 24},
                  {'eeg': 19.75},
                  {'eeg': 24.5}, {'eeg': 26}, {'eeg': 27},
                  {'eeg': 28}, {'eeg': 29},
                  {'eeg': 25.5}, {'eeg': 29.5}, {'eeg': 26.5},
                  {'eeg': 37}, {'eeg': 38}, {'eeg': 39},
                  {'eeg': 40}, {'eeg': 41}, {'eeg': 42}]
# Create dataframe - tests flat & reject filters independently
rejection_settings_df = pd.concat((pd.DataFrame(itertools.product(reject_options_1,
                                                                  flat_options_2),
                                                columns=['reject', 'flat']),
                                   pd.DataFrame(itertools.product(reject_options_2,
                                                                  flat_options_1),
                                                columns=['reject', 'flat'])),
                                  ignore_index=True)

# Reindex to add one result column per recording
rejection_settings_df = rejection_settings_df.reindex(columns=['reject', 'flat'] +
                                                      list(y_dict.keys()))
# Load raw dict with mne Raw objects
raw_dict = {}
for key, value in data_dict.items():
    raw_dict[key] = mne.io.RawArray(value.T, info, verbose=0)

# Low-pass filter each recording at 40 Hz
for key, value in raw_dict.items():
    value.filter(l_freq=None,
                 h_freq=40,
                 method='fir', phase='zero', verbose=0)
# Compute percentage of trials dropped for each setting
for row in range(rejection_settings_df.shape[0]):
    epoch_dict = {}
    for key, value in raw_dict.items():
        epoch_dict[key] = mne.Epochs(value, events=event_dict[key],
                                     event_id=events_explained,
                                     tmin=-3, tmax=4.5,
                                     baseline=None,
                                     preload=True,
                                     picks=[ch for ch in ch_names if
                                            ch not in ['AFz', 'F7', 'F8']],
                                     reject=rejection_settings_df.reject[row],
                                     flat=rejection_settings_df.flat[row],
                                     reject_tmin=1,
                                     reject_tmax=4.5,
                                     verbose=0)
    perc_trials_dropped_dict = {}
    for key, value in y_dict.items():
        dropped = value.shape[0] - epoch_dict[key].get_data().shape[0]
        drop_percentage = dropped / value.shape[0]
        rejection_settings_df.at[row, key] = drop_percentage
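For context on what I expect these settings to do: my understanding is that reject and flat are simple peak-to-peak amplitude checks over the reject window (here 1 to 4.5 s), so the criterion itself should be deterministic for fixed data. Here is a minimal NumPy sketch of that mental model - the function name `ptp_drop_mask` and the thresholds are my own, not MNE's API:

```python
import numpy as np

def ptp_drop_mask(epochs_data, reject=None, flat=None):
    """Mark epochs whose peak-to-peak amplitude exceeds `reject`
    or falls below `flat` on any channel.

    epochs_data: array of shape (n_epochs, n_channels, n_times),
    already restricted to the reject_tmin..reject_tmax window.
    """
    # Peak-to-peak per epoch and channel: (n_epochs, n_channels)
    ptp = epochs_data.max(axis=2) - epochs_data.min(axis=2)
    drop = np.zeros(epochs_data.shape[0], dtype=bool)
    if reject is not None:
        drop |= (ptp > reject).any(axis=1)   # too noisy
    if flat is not None:
        drop |= (ptp < flat).any(axis=1)     # too flat
    return drop

rng = np.random.default_rng(0)
data = rng.normal(scale=10, size=(100, 27, 350))
mask = ptp_drop_mask(data, reject=80, flat=30)
print(mask.mean())  # fraction of epochs that would be dropped
```

Run on the same array twice, this gives the same mask every time - which is why the run-to-run variation in the real pipeline surprises me.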
And here is the code that displays the results shown in the screenshots:
# View results; change subject to dig into the data
subject = 'sub_C'
# Set exclude to ignore one of the two filter types: 'reject' or 'flat'
exclude = 'reject'
(rejection_settings_df[['reject', 'flat',
                        f'{subject}_sesh_1']]
 .loc[(rejection_settings_df[f'{subject}_sesh_1'] < 0.2) &
      (rejection_settings_df[f'{subject}_sesh_1'] > 0) &
      (rejection_settings_df[exclude].isna())]
 .sort_values([f'{subject}_sesh_1']))
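To rule out the pandas bookkeeping as the source of the variation, the grid construction and the filter above behave deterministically on a toy DataFrame. This is a self-contained sketch with made-up drop percentages standing in for the real results (the small option lists and the values in `sub_C_sesh_1` are invented for illustration):

```python
import itertools
import pandas as pd

reject_opts = [None, {'eeg': 100}, {'eeg': 150}]
flat_opts = [None, {'eeg': 10}, {'eeg': 20}]

# Same construction as above: vary one filter type while the other stays None
grid = pd.concat((pd.DataFrame(itertools.product([None], flat_opts),
                               columns=['reject', 'flat']),
                  pd.DataFrame(itertools.product(reject_opts, [None]),
                               columns=['reject', 'flat'])),
                 ignore_index=True)

# Made-up drop percentages for the six settings
grid['sub_C_sesh_1'] = [0.0, 0.05, 0.25, 0.0, 0.1, 0.3]

# Same selection as above: 0 < drop < 0.2 with the 'reject' filter unused
view = (grid[['reject', 'flat', 'sub_C_sesh_1']]
        .loc[(grid['sub_C_sesh_1'] < 0.2) &
             (grid['sub_C_sesh_1'] > 0) &
             (grid['reject'].isna())]
        .sort_values('sub_C_sesh_1'))
print(view)
```

Only the row with `flat={'eeg': 10}` survives the filter here, and re-running produces the identical frame - so the variation I'm seeing must come from the epoching step itself, not from this display code.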
Really appreciate any insight, and I'm happy to help diagnose further - just let me know what would be helpful! I'm also going to take a look at the code that implements the flat and reject features and see if I can come up with any hypotheses.