GeneralizingEstimator with incremental learning / .partial_fit

Hi!

I need to try decoding with incremental learning (EEG data).
I was planning to use logistic regression by means of the SGDClassifier
<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html>
and to call .partial_fit so that the estimator learns from each of my
training sets in turn.
However:

'GeneralizingEstimator' object has no attribute 'partial_fit'

Same issue for SlidingEstimator.
Is there a way to work around this limitation?
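
For context, this is roughly the per-time-point incremental update I have in
mind, written with plain scikit-learn and random stand-in data (shapes and
variable names are just for illustration):

import numpy as np
from sklearn.linear_model import SGDClassifier

# stand-in for one training batch: 50 epochs, 10 channels, 3 time points
X_batch = np.random.randn(50, 10, 3)
y_batch = np.random.randint(0, 2, 50)

# one incremental classifier per time point
# (default hinge loss here; logistic regression would be loss='log',
#  spelled 'log_loss' in newer scikit-learn versions)
classes = np.unique(y_batch)
models = [SGDClassifier() for _ in range(X_batch.shape[-1])]
for t, model in enumerate(models):
    # the first call to partial_fit must receive the full set of classes
    model.partial_fit(X_batch[:, :, t], y_batch, classes=classes)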

Thank you so so much in advance!

Giulia Gennari

Hi Giulia,

I think you should be able to change the method:

import mne.decoding
import sklearn.linear_model

model = sklearn.linear_model.SGDClassifier()
model.fit = model.partial_fit  # make every call to fit an incremental update
slider = mne.decoding.SlidingEstimator(model)
for X, y in train_batches:  # train_batches: your iterable of (X, y) batches
    slider.fit(X, y)

Best

JR

Dear Jean-Rémi,

Thank you for the nice suggestion!

Just to make sure that this is working (I apologize for my ignorance):

When I run:
model = SGDClassifier(loss='log', class_weight='balanced')
model.fit = model.partial_fit
slider1 = SlidingEstimator(model, scoring='roc_auc')
slider1.fit(X_train, y_train)

or

clf = make_pipeline(Vectorizer(), StandardScaler(), model)
slider2 = SlidingEstimator(clf, scoring='roc_auc')
slider2.fit(X_train, y_train)

I do not get any error, while I would expect:

ValueError: class_weight 'balanced' is not supported for partial_fit.
In order to use 'balanced' weights, use
compute_class_weight('balanced', classes, y). Pass the resulting
weights as the class_weight parameter.

Since this is what I get with:
model.fit(X_train[:,:,single_time_point], y_train)

Is there a good reason for that? E.g., are the class weights computed
internally beforehand by SlidingEstimator?
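
For reference, this is how I read the workaround that the error message
suggests, i.e. computing the 'balanced' weights once and passing them
explicitly (a sketch only, with random stand-in data in place of my
X_train / y_train):

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.utils.class_weight import compute_class_weight

# stand-in for X_train / y_train
X_train = np.random.randn(100, 10, 3)
y_train = np.random.randint(0, 2, 100)

classes = np.unique(y_train)
weights = compute_class_weight('balanced', classes=classes, y=y_train)
# explicit per-class weights instead of class_weight='balanced'
model = SGDClassifier(class_weight=dict(zip(classes, weights)))
model.partial_fit(X_train[:, :, 0], y_train, classes=classes)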

Thank you again!

Giulia

Hi Giulia,

Good catch, I had forgotten that we're cloning the estimator for each time
sample; you'll thus need to do this:

class MyModel(SGDClassifier):
    def fit(self, X, y):
        # redirect fit() to partial_fit() so the override survives cloning
        super().partial_fit(X, y)
        return self

model = MyModel(loss='log', class_weight='balanced')
slider = SlidingEstimator(model, scoring='roc_auc')
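
To see why your earlier snippet did not raise the ValueError: sklearn's
clone() rebuilds the estimator from its constructor parameters only, so an
instance-level override of fit is dropped and the clone falls back to the
regular fit, which does accept class_weight='balanced'. A minimal
illustration (nothing MNE-specific):

from sklearn.base import clone
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
model.fit = model.partial_fit            # instance-level override
cloned = clone(model)                    # rebuilt from get_params() only
print(model.fit == model.partial_fit)    # True: the override is in place
print(cloned.fit == cloned.partial_fit)  # False: the clone has the plain fit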

Hope that helps

JR

Dear Jean-Rémi and dear Alex,

*Thank you!*

A solution based on this:
class MyModel(SGDClassifier):
    def fit(self, X, y):
        super().partial_fit(X, y)
        return self

...works fine!
Except for the crucial fact that parallel processing (n_jobs > 1) does not
seem feasible.
This is what I get when I try to score the slider (apologies for the
ugliness, I copy-paste everything since it might help catch what is wrong):

Please create an issue on GitHub and include a script that replicates the
crash by just copy-pasting. I'll then have a look.

Alex

Hi Giulia,

In the long run, for batch optimization of parallel tasks (here, each time
sample of the sliding estimator), I would encourage you to have a look at
PyTorch; sklearn is not really optimal for this because it cannot make use
of the GPU.

In the meantime, here is a solution to your problem: simply put your new
class in a separate script, e.g.:

# in mymodel.py
import numpy as np
from sklearn.linear_model import SGDClassifier

class MyModel(SGDClassifier):
    def fit(self, X, y):
        # the first call to partial_fit needs the full set of classes
        if not hasattr(self, 'classes_'):
            self.classes_ = np.unique(y)
        super().partial_fit(X, y, self.classes_)
        return self

# main script
import numpy as np
from mne.decoding import SlidingEstimator
from mymodel import MyModel

model = MyModel()
slider = SlidingEstimator(model, scoring='roc_auc', n_jobs=2)

# toy data: 100 epochs, 10 channels, 3 time points
X = np.random.randn(100, 10, 3)
y = np.random.randint(0, 2, 100)
slider.fit(X, y)
slider.score(X, y)

Hope that helps

JR

Dear Jean-Rémi,

Thank you for the suggestion and, above all, thank you so much for your
help and assistance!
My scripts have been working just fine, and I would never have been able
to implement my current analysis without your pointers.

All the best,

Giulia