Using a callable() scoring method in mne.decoding?

Hello all,

I am writing to ask whether someone has an example of how to set the “predict” method when passing a callable() scoring function in mne.decoding.

I am considering using the F1 measure (sklearn.metrics.f1_score) when decoding between two classes with mne.decoding.GeneralizingEstimator.

The mne.decoding.GeneralizingEstimator API mentions:
" scoring callable() | str | None
Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
Note that the “predict” method is automatically identified if scoring is a string (e.g. scoring='roc_auc' calls predict_proba), but is not automatically set if scoring is a callable (e.g. scoring=sklearn.metrics.roc_auc_score)."

Is there an example of how to correctly use callable() scoring methods in MNE?

Many thanks!

Ana P

Hello, there is actually an example hidden in the documentation you posted: the docstring itself uses scoring=sklearn.metrics.roc_auc_score as an example of a callable scorer.

Best wishes,
Richard

Thanks @richard! I get an error when applying the example:
scoring=sklearn.metrics.f1_score

ValueError: scoring value <function f1_score at 0x7fa09a8e1550> looks like it is a metric function rather than a scorer. A scorer should require an estimator as its first parameter. Please use `make_scorer` to convert a metric to a scorer.

But it is corrected if using:
scoring=sklearn.metrics.make_scorer(sklearn.metrics.f1_score)
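
For reference, here is roughly how I am now passing it to GeneralizingEstimator (a minimal sketch; the classifier pipeline and the X/y names are just placeholders):

import sklearn.metrics
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import GeneralizingEstimator, cross_val_multiscore

# make_scorer() wraps the metric into a scorer with the
# scorer(estimator, X, y) signature that scikit-learn expects
f1_scorer = sklearn.metrics.make_scorer(sklearn.metrics.f1_score)

clf = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))
time_gen = GeneralizingEstimator(clf, scoring=f1_scorer, n_jobs=1)

# X: array of shape (n_epochs, n_channels, n_times), y: binary class labels
# scores = cross_val_multiscore(time_gen, X, y, cv=5)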

You are right, the documentation is not good here.

To use any other scorer that ships with scikit-learn, you can simply pass its name as a string, e.g. in your case:

scoring='f1'
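
Applied to GeneralizingEstimator, that would look something like this (a sketch; the base estimator is just an example):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import GeneralizingEstimator

clf = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))

# with a string, the appropriate "predict" method and the metric's
# default parameters are selected automatically
time_gen = GeneralizingEstimator(clf, scoring='f1', n_jobs=1)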

To get a list of all scorers that can be used that way, use the sklearn.metrics.get_scorer_names() function, which will produce output like this:

['accuracy',
 'adjusted_mutual_info_score',
 'adjusted_rand_score',
 'average_precision',
 'balanced_accuracy',
 'completeness_score',
 'explained_variance',
 'f1',
 'f1_macro',
 'f1_micro',
 'f1_samples',
 'f1_weighted',
 'fowlkes_mallows_score',
 'homogeneity_score',
 'jaccard',
 'jaccard_macro',
 'jaccard_micro',
 'jaccard_samples',
 'jaccard_weighted',
 'matthews_corrcoef',
 'max_error',
 'mutual_info_score',
 'neg_brier_score',
 'neg_log_loss',
 'neg_mean_absolute_error',
 'neg_mean_absolute_percentage_error',
 'neg_mean_gamma_deviance',
 'neg_mean_poisson_deviance',
 'neg_mean_squared_error',
 'neg_mean_squared_log_error',
 'neg_median_absolute_error',
 'neg_root_mean_squared_error',
 'normalized_mutual_info_score',
 'precision',
 'precision_macro',
 'precision_micro',
 'precision_samples',
 'precision_weighted',
 'r2',
 'rand_score',
 'recall',
 'recall_macro',
 'recall_micro',
 'recall_samples',
 'recall_weighted',
 'roc_auc',
 'roc_auc_ovo',
 'roc_auc_ovo_weighted',
 'roc_auc_ovr',
 'roc_auc_ovr_weighted',
 'top_k_accuracy',
 'v_measure_score']

When you pass these strings to sklearn.metrics.check_scoring(), you can see that, internally, make_scorer() is called:

# %%
from sklearn.linear_model import LogisticRegression
import sklearn.metrics as skm

# check_scoring() resolves the scoring string into an actual scorer object
estimator = LogisticRegression(solver='liblinear')
skm.check_scoring(
    estimator,
    'f1'
)

returns:

make_scorer(f1_score, average=binary)

Passing the scorer name as a string should, I believe, always be preferred over manually wrapping a metric with make_scorer(), as it ensures that reasonable defaults are set.
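
If you want to inspect the scorer object behind a given string without building an estimator first, you can also use sklearn.metrics.get_scorer() (a small sketch; the exact repr may vary across scikit-learn versions):

import sklearn.metrics as skm

# the scorer resolved from the string already carries the metric's
# defaults, e.g. average='binary' for 'f1'
scorer = skm.get_scorer('f1')
print(scorer)  # e.g. make_scorer(f1_score, average=binary)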

I hope this clarifies things a bit.

It would be great if you’d be willing to improve our documentation. Let us know if you’re interested, and we can guide you.

Best wishes,
Richard
