I’m running some decoding models which take about 2 hours per participant. Naturally, I would like this to be faster. I’ve seen mixed reports about whether you can run scikit-learn on a GPU, and in turn this would affect whether or not you can run the SlidingEstimator on a GPU… Do you know if it is possible, and if so, can you recommend a platform? Thanks!
Are you already using n_jobs = -1?
What is your classifier?
Alex
Yes, I’m using n_jobs = -1. I’m using Ridge; this is my approach:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, cross_val_predict, RepeatedStratifiedKFold
from mne.decoding import SlidingEstimator

# Ridge classifier with the regularisation strength (alpha) tuned by grid search
rlr = RidgeClassifier(max_iter=1000)
grid = {'alpha': [1e-3, 1e-2, 1e-1, 1, 5]}
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=1)
rlr = GridSearchCV(rlr, grid, scoring='roc_auc', cv=cv, n_jobs=-1)

# Standardise the features, then fit the tuned classifier at every time point
clf = make_pipeline(StandardScaler(), rlr)
time_decode = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
predictions = cross_val_predict(time_decode, X, y, cv=4, method='decision_function')
You should use https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifierCV.html
and not a grid search for Ridge: it tunes alpha internally and is typically much faster than wrapping Ridge in a GridSearchCV.
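For instance, a minimal sketch of the swap, reusing the alphas and pipeline from your snippet (with the default cv, RidgeClassifierCV uses an efficient leave-one-out scheme):

from sklearn.linear_model import RidgeClassifierCV

# Drop-in replacement for the GridSearchCV-wrapped RidgeClassifier:
# the alphas are searched internally rather than via an outer grid search
rlr = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1, 5])
clf = make_pipeline(StandardScaler(), rlr)
time_decode = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)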
Alex
Thanks! That appears to have sped things up a bit! I also found that wrapping the call in joblib.parallel_backend improved the speed further, like this:
import time

import joblib
from sklearn.linear_model import RidgeClassifierCV
# (the remaining imports are the same as in my snippet above)


def ridgeReturnProbabilitiesCV(X, y):
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=1)
    rlr = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1, 5],
                            scoring='roc_auc',
                            cv=cv)
    clf = make_pipeline(StandardScaler(), rlr)
    time_decode = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
    # note: decision_function returns decision scores, not calibrated probabilities
    probabilities = cross_val_predict(time_decode, X, y, cv=4, method='decision_function')
    return probabilities


tic = time.time()
with joblib.parallel_backend(backend='loky', n_jobs=8):
    probabilities = ridgeReturnProbabilitiesCV(X, y)
toc = time.time() - tic
print(participant, 'took', toc / 60, 'minutes.')
I believe you could achieve a similar result, yet simplify your code, by passing n_jobs to cross_val_predict().
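Something like this minimal sketch (reusing the pipeline, X, and y from your function above; n_jobs=8 is just an illustrative value):

time_decode = SlidingEstimator(clf, n_jobs=1, scoring='roc_auc', verbose=True)
probabilities = cross_val_predict(time_decode, X, y, cv=4,
                                  method='decision_function', n_jobs=8)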
Best wishes,
Richard
Nice one, thanks!