Some problems with sliding time window when decoding

  • MNE version: 0.24.0
  • operating system: Linux

Hey guys,

I’m trying to decode with a sliding time window again, but I’ve run into a problem: the decoding performance is exactly the same across all time windows, and I really cannot figure out why. Here is my code snippet:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit
from mne.decoding import Vectorizer

epochs_data = epoch.get_data()  # shape: (n_trials, n_channels, n_timepoints), i.e., (334, 306, 700)

sfreq = epoch.info['sfreq']
# sampling rate: 200 Hz (resampled from the original 1000 Hz)

w_length = int(sfreq * 0.1)
# running classifier: window length, 20 time points --> 100 ms

w_step = int(sfreq * 0.1)
# running classifier: window step size, 20 time points

w_start = np.arange(0, epochs_data.shape[2] - w_length, w_step)

labels_fea_col = epoch.metadata['feature_color'].to_numpy(copy=True)

cv = ShuffleSplit(10, test_size=0.1, random_state=6)
cv_split = cv.split(epochs_data)

n_jobs = 20

vector = Vectorizer()
logisreg = LogisticRegression(random_state=6, n_jobs=n_jobs, max_iter=500)

for train_idx, test_idx in cv_split:  # trial indices for the training and test sets
    y_train = labels_fea_col[train_idx]
    y_test = labels_fea_col[test_idx]

    # running classifier: fit and test classifier on each sliding window
    score_whole_window = []
    for n in w_start:
        print(n)
        # a fresh logistic regression for each time window
        globals()[f'logireg_{n}'] = LogisticRegression(random_state=6, n_jobs=n_jobs, max_iter=500)

        X_train = epochs_data[train_idx][:, :, n:(n + w_length)]
        X_train_vec = vector.fit_transform(X_train)

        globals()[f'logireg_{n}'].fit(X_train_vec, y_train)

        X_test = epochs_data[test_idx][:, :, n:(n + w_length)]
        X_test_vec = vector.fit_transform(X_test)

        score_this_window = globals()[f'logireg_{n}'].score(X_test_vec, y_test)

        score_whole_window.append(score_this_window)
        # append the score of each time window
        # at the end, len(score_whole_window) == len(w_start)

The problem is that score_this_window is always exactly the same value for every time window, which cannot be right. I even created a new logistic regression for each time window to prevent all time windows from sharing the same regressor, but it still didn’t work.

Do you have any ideas on how to fix this?
Thank you

Best

You should not call vector.fit_transform(X_test) but vector.transform(X_test).

Also, you should scale the features before the LogisticRegression:

see https://mne.tools/dev/generated/mne.decoding.Scaler.html
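
For illustration, a minimal sketch of the corrected inner loop, reusing the variable names from the original post (the choice of Scaler(scalings='mean') here is just one reasonable option, not the only one):

from mne.decoding import Scaler, Vectorizer
from sklearn.linear_model import LogisticRegression

for n in w_start:
    # fit the preprocessing on the training data only ...
    scaler = Scaler(scalings='mean')  # standardizes each channel
    vector = Vectorizer()
    X_train = scaler.fit_transform(epochs_data[train_idx][:, :, n:(n + w_length)])
    X_train_vec = vector.fit_transform(X_train)

    clf = LogisticRegression(random_state=6, max_iter=500)
    clf.fit(X_train_vec, y_train)

    # ... and only transform (never re-fit) the test data
    X_test = scaler.transform(epochs_data[test_idx][:, :, n:(n + w_length)])
    X_test_vec = vector.transform(X_test)
    score_whole_window.append(clf.score(X_test_vec, y_test))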

HTH
Alex

Hi Alex,
Thank you so much! 💖💖💖
It’s amazing! 🤩 I modified my code according to your suggestions and it worked! 👍👍👍
So why does vector.transform(X_test) make such a huge difference compared with vector.fit_transform(X_test)? 🧐

great

see https://scikit-learn.org/stable/common_pitfalls.html#data-leakage-during-pre-processing

maybe it can help you understand
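
In short: calling fit_transform on the test set lets the test data influence the preprocessing, which is exactly the leakage described on that page. One way to avoid this whole class of bug is to put the preprocessing and the classifier into a single scikit-learn Pipeline and let cross-validation do the fitting; a hedged sketch, reusing the variables from the original post (Scaler(scalings='mean') is again just one option):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from mne.decoding import Scaler, Vectorizer

score_whole_window = []
for n in w_start:
    clf = make_pipeline(
        Scaler(scalings='mean'),  # fit on the training folds only, automatically
        Vectorizer(),
        LogisticRegression(random_state=6, max_iter=500),
    )
    X_win = epochs_data[:, :, n:(n + w_length)]
    scores = cross_val_score(clf, X_win, labels_fea_col, cv=cv)
    score_whole_window.append(scores.mean())

With the pipeline, every transformer is fit on the training folds and only applied to the test fold, so the fit_transform/transform distinction is handled for you.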

Alex
