Training and testing classifiers using different datasets

  • MNE version: 0.24.0
  • operating system: Linux

Hey guys,
Recently I've been using the temporal generalization method for some decoding work. Specifically, I need to train classifiers at each time point on data from stage A, then test those classifiers on data from stage B. Here are some code snippets:

n_cvsplits = 8
cv = StratifiedKFold(n_splits=n_cvsplits, shuffle=True, random_state=42)
n_jobs = -2

clf = make_pipeline(
    StandardScaler(), # Z-score data, because gradiometers and magnetometers have different scales
    LogisticRegression(random_state=42, n_jobs=n_jobs, max_iter=1000))

globals()[f'slid_task_{t}_{r}_fea_{f}_{o}'] = GeneralizingEstimator(clf, n_jobs=n_jobs, scoring='roc_auc', verbose=True)

globals()[f'slid_task_{t}_{r}_fea_{f}_{o}'].fit(globals()[f'arr_epo_data_reorder_avg_windows_trans_{r}_{o}'], globals()[f'labels_learn_fea_{f}_obj_{o}'])

globals()[f'dvalue_per_task_{t}_{r}_fea_{f}_{o}'] = globals()[f'slid_task_{t}_{r}_fea_{f}_{o}'].decision_function(globals()[f'arr_epo_data_reorder_avg_windows_trans_per_{r}_{o}'])

The problem I ran into was that the process of transforming the estimators never seemed to finish, yet no error messages were printed and the result's shape was correct. Here is what the estimation process looks like:

Is there anything wrong with it? If there is, what can I do to fix it? :face_with_peeking_eye:
Thank you very much! :sparkling_heart: :sparkling_heart: :sparkling_heart:
Best

Hi, @YuZhou,
do you mean that the progress bar does not seem to complete but the outputs look correct?
Could you create a simple reproducible example of the problem you experience?

BTW, I think it's better to use dictionaries rather than assigning to globals(); your code will be easier to read (for example, slid_task[f'{t}_{r}_fea_{f}_{o}'] = ...). :slight_smile:
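A minimal sketch of that dictionary pattern (the loop values and the placeholder contents here are hypothetical, just mirroring the naming scheme in the original snippet; in real code the values would be the fitted GeneralizingEstimator objects):

```python
# One dict per kind of object, keyed by the loop variables,
# instead of globals()[f'slid_task_{t}_{r}_fea_{f}_{o}'] = ...
slid_task = {}

for t in ('learn', 'test'):          # hypothetical loop values
    for r in ('run1', 'run2'):
        for f in (0, 1):
            for o in (0, 1):
                key = f'{t}_{r}_fea_{f}_{o}'
                # In the real pipeline this would be a fitted estimator:
                # slid_task[key] = GeneralizingEstimator(clf, ...).fit(X, y)
                slid_task[key] = f'estimator for {key}'  # placeholder

# Lookup is now explicit, and the collection is easy to iterate over:
print(slid_task['learn_run1_fea_0_0'])
print(len(slid_task))  # 16 combinations
```

This also makes it trivial to loop over all fitted estimators later (`for key, est in slid_task.items(): ...`), which is awkward with globals().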

Hi, Mikolaj,
Thanks a lot for your reply and suggestions! Your description of the problem is exactly what I meant. I'm not sure how I can upload a data example for you.
After I posted this problem and before you replied, I found an alternative way to get the roc_auc. Instead of computing roc_auc from the values returned by decision_function(), I set scoring='roc_auc' on the estimator and then used the .score method to get the roc_auc values.

clf = make_pipeline(
            StandardScaler(), # Z-score data, because gradiometers and magnetometers have different scales
            LogisticRegression(random_state=42, n_jobs=n_jobs, max_iter=1000))
slid = GeneralizingEstimator(clf, n_jobs=n_jobs, scoring='roc_auc', verbose=True)
slid.fit(training_data, training_labels)
score = slid.score(test_data, test_labels)
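For what it's worth, the two routes should agree: the 'roc_auc' scorer ranks the classifier's decision values, so computing AUC from decision_function output by hand gives the same number as .score. A tiny pure-Python sketch of that ranking definition, with hypothetical toy decision values (no MNE or sklearn needed):

```python
def roc_auc(labels, scores):
    """AUC = probability that a randomly chosen positive trial gets a
    higher decision value than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy decision-function values for 6 test trials (hypothetical):
labels = [1, 1, 1, 0, 0, 0]
scores = [2.1, 0.4, -0.3, 0.5, -1.0, -2.2]
print(roc_auc(labels, scores))  # 7 of 9 pairs ranked correctly -> 0.777...
```

So if .score gives you what you need, there is no reason to go through decision_function yourself.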

As for the dictionary suggestion, I will start using dictionaries instead of globals(). Thank you so much! :grinning: :grinning: :grinning: :heart: :heart: :heart:

Best
Yu