Here’s what my decoding approach looks like so far:
# shapes of X and y:
X.shape  # (4797, 64, 226)
y.shape  # (4797,)
rlr = RidgeClassifier(max_iter=1500)
grid = {'alpha': [1e-3, 1e-2, 0.1, 1, 5]}
cv = RepeatedKFold(n_splits=4, n_repeats=2, random_state=1)
rlr = GridSearchCV(rlr, grid, scoring='roc_auc', cv=cv, n_jobs=-1)
clf = make_pipeline(StandardScaler(), rlr)
time_decod = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
scores = cross_val_multiscore(time_decod, X, y, cv=4, n_jobs=1)
The performance of the modeling procedure above comes from cross_val_multiscore with 4-fold CV: each fold trains on 75% of the data, tests on the remaining 25%, and the call returns the mean AUC for each fold at each time point. What I would like instead is the predicted probability for every epoch at every time point, gathered across the folds, i.e. an array of shape (X.shape[0], X.shape[2]). Presumably this could be done by ditching the 4 folds in favour of a leave-one-out approach, which would yield a prediction for every epoch, but that would be computationally costly. I would like to keep the model building/testing to 4 rounds rather than X.shape[0].
I could ditch the sliding estimator and, for each time point, divide the data into 4 groups, iterate through these, assigning 75% to train and 25% to test, and run:
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
and then append the predictions for each test set. But that means I wouldn't take advantage of the awesome SlidingEstimator, which (assuming it can be used for this) would save a lot of code. Is there a way? Thanks!
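For reference, the manual per-time-point fallback I describe above would look roughly like this (again toy shapes, and decision_function standing in for probabilities since RidgeClassifier has none):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeClassifier

# toy (n_epochs, n_channels, n_times) stand-ins
rng = np.random.RandomState(0)
X = rng.randn(40, 8, 10)
y = rng.randint(0, 2, 40)

preds = np.zeros((X.shape[0], X.shape[2]))
cv = KFold(n_splits=4, shuffle=True, random_state=1)
for train, test in cv.split(X):
    for t in range(X.shape[2]):  # refit a fresh model at every time point
        clf = make_pipeline(StandardScaler(), RidgeClassifier())
        clf.fit(X[train, :, t], y[train])
        preds[test, t] = clf.decision_function(X[test, :, t])

print(preds.shape)  # (40, 10)
```

This works, but it is exactly the boilerplate I was hoping SlidingEstimator would spare me.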