Source Space Decoding Classification Timecourse

Hi,

I've been trying to modify the following example:

http://martinos.org/mne/dev/auto_examples/decoding/plot_decoding_spatio_temporal_source.html

to yield a time-resolved classification accuracy. I'm new to decoding, so I've done it in a fairly brute-force way (just iterating the script over every time point), which yields a fairly convincing classification accuracy timecourse. However, I'm a bit concerned at how high the accuracy is during the baseline, pre-stimulus period. See attached for the modified script using the sample data and an example of the output.

The best explanation I've been able to find for abnormally high pre-stimulus accuracy is a failure to cross-validate, but that shouldn't be the case here, as cross-validation is being performed (though perhaps I'm doing it wrong). Is there something improper about my strategy? Thanks for any input.

Cheers,
Cody

Hi Cody,

Do you have the same number of trials in each condition after any trial rejection you do? If not, the issue might be that 50% is not the correct chance level to compare against: the correct chance level is the proportion of trials in your more frequent condition (eyeballing your plot, maybe around 55%?). There are unbiased classifiers you could use, but I am not sure whether they are built into MNE-Python...
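
For example, a quick back-of-the-envelope check (the trial counts here are hypothetical):

import numpy as np

# hypothetical labels: 60 trials of one condition, 50 of the other
y = np.array([0] * 60 + [1] * 50)
chance = np.bincount(y).max() / float(len(y))  # ~0.55 here, not 0.50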

Best wishes,
Avniel

Hi Cody,

Overall, your baseline doesn't look too bad - you would need to run a
statistical test to check whether it reflects noise variation or genuinely
above-chance decoding.

Still, there could be multiple reasons for a significant accuracy before
t0 here:
- accuracy is biased for imbalanced datasets. You can either use
epochs.equalize_event_counts before your cross-validation or, better, use
the 'roc_auc' scoring metric (see the sketch after this list).
- filtering the data can spread information over time. Try changing your
filtering parameters.
- IIRC, the 'sample' protocol is actually not randomized, so it may be
possible to predict the stimulus category in advance.
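
For the equalization route, a minimal sketch (the event keys are
assumptions; substitute your own condition names):

# drop excess trials so both conditions contribute equally;
# modifies the Epochs object in place
epochs.equalize_event_counts(['AudL', 'VisL'])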

If you're using the MNE master branch, then I would recommend simply using
this instead of your big loop (see
https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding
for more details):

import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from mne.decoding import SlidingEstimator, cross_val_multiscore

clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=500),
                    SVC(kernel='linear'))
time_decod = SlidingEstimator(clf, scoring='roc_auc')
# note: pass the sliding estimator, not the bare pipeline
scores = cross_val_multiscore(time_decod, X, y, cv=5)
plt.plot(times, scores.mean(0))

(Note that I would personally recommend clf =
make_pipeline(StandardScaler(), LogisticRegression(C=1)), which should
work better.)
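
That is, a sketch of the same sliding analysis with that pipeline swapped
in (reusing the imports above):

from sklearn.linear_model import LogisticRegression

clf = make_pipeline(StandardScaler(), LogisticRegression(C=1))
time_decod = SlidingEstimator(clf, scoring='roc_auc')
scores = cross_val_multiscore(time_decod, X, y, cv=5)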

Otherwise, I believe we will be releasing the next version of MNE this
month, so you'll just have to update MNE.

Hope that helps,

Jean-Rémi

Hi Cody,

Depending on your number of trials, the number of features, and the
cross-validation procedure, you can get fairly high decoding results just
by chance, so you should never interpret a result without running a
statistical test. One good way to get the chance level of your
classification pipeline is to run a permutation test:
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.permutation_test_score.html
The idea is to shuffle the labels and retrain the model to see what score
you get 'by chance'. It is sometimes surprising how high that can be.
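
As a concrete illustration, a minimal sketch (the feature matrix X_t,
holding one time point's features, and the labels y are assumptions; plug
in your own data):

from sklearn.model_selection import permutation_test_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# shuffle the labels 100 times and rescore to estimate the chance level
clf = make_pipeline(StandardScaler(), LogisticRegression())
score, perm_scores, pvalue = permutation_test_score(
    clf, X_t, y, cv=5, n_permutations=100, scoring='roc_auc')
print('observed: %0.2f, chance: %0.2f (p=%0.3f)'
      % (score, perm_scores.mean(), pvalue))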

If you have an unbalanced number of trials per class, I would also suggest
using the AUC as a metric instead of accuracy.

Alex

Hi all,

Thanks for all the helpful suggestions. Everyone brought up imbalanced datasets as a possible source of the problem, but trial counts are equalized both in the example and in my personal dataset, where the problem is a bit worse (baseline accuracies average around 65%, even with the code changes JR suggested and with more samples and fewer features). I also know I shouldn't jump to any conclusions without doing actual stats, and indeed I don't really expect these baseline periods to show up as significantly above chance. However, I figured a reviewer would nail me if I tried to report classification timecourses with that high a baseline accuracy, even if it was statistically meaningless.

JR, thanks for those bits of code; that definitely cleans things up a lot. At your and Alex's suggestion, I'm trying the 'roc_auc' scoring method to see if it calms the baseline down a bit, but I'm getting some inconsistent behavior out of cross_val_multiscore with that metric. Attached is the same source space decoding tutorial from my original message, now modified to use the functions from the master branch that JR suggested. The sensors decoding tutorial you linked runs just fine for me, but when I run the modified source space tutorial (attached), I get the following error:

ValueError: roc_auc scoring can only be computed for two-class problems

It doesn't seem to like the label variable y. Strangely enough, if I define y as

y = epochs.events[:, 2]  # as it is defined in the sensor space tutorial

cross_val_multiscore does not return the error (the scores are obviously bad, since the labeling is wrong). In both cases y is just a simple numpy array with identical shape (112,) and the same number of unique values, just in different orders. So I'm not really sure what's happening there, but hopefully others can replicate the problem.

Cheers,
Cody

Hi Cody,

Scikit-learn's 'roc_auc' metric requires the y values to be in [0, 1];
that's probably the issue:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
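
For example, a minimal sketch of remapping two arbitrary event codes to
0/1 (assuming y contains exactly two distinct codes):

import numpy as np

y = epochs.events[:, 2]           # e.g. codes 1 and 3
y = (y == np.max(y)).astype(int)  # larger code -> 1, the other -> 0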

I updated and simplified the source decoding example: the PR is still under
review here:
https://github.com/mne-tools/mne-python/pull/4465

Comments and complaints are more than welcome!

HTH
JR

Hey JR,

That was what I was thinking at first, but I actually first got the error using y values in [0, 1]. Plus, the sensors tutorial (https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding), which runs fine, has the y array filled with 1's and 3's. Also, the updated tutorial you linked (thanks for putting that together, by the way) defines y as

y = epochs.events[:, 2]

before using cross_val_multiscore with 'roc_auc' scoring, so those values aren't in [0, 1] either.

Even if I change the definition of y in the example with the sample data I attached to my last message to

y = np.repeat([0, 1], len(X) // 2)  # first half class 0, second half class 1

I still get the error. But that's all presuming others are able to replicate the error and it's not just my system being weird.

Cheers,
Cody

Hi Cody,

Is your problem solved now? If not, could you open an issue on GitHub so
we can replicate the error?

Thanks

JR