to yield a time-resolved classification accuracy. I'm new to decoding, so I've done it in a fairly brute-force way (just iterating the script over every time point), which yields a fairly convincing classification-accuracy timecourse. However, I'm a bit concerned by how high the accuracy is during the baseline, pre-stimulus period. See attached for the modified script using the sample data and an example of the output. The best explanation I've been able to find for abnormally high pre-stimulus accuracy is a failure to cross-validate, but that shouldn't be the case here, as cross-validation is being performed (though perhaps I'm doing it wrong). Is there something improper about my strategy? Thanks for any input.
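In case it clarifies what I mean by brute force, the loop is essentially doing something like this (a stripped-down sketch rather than the attached script; the estimator and fold count are placeholders, and X / y stand for the epochs data array and the labels):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: (n_trials, n_features, n_times) data, y: (n_trials,) labels
clf = make_pipeline(StandardScaler(), LogisticRegression())
scores = np.empty(X.shape[-1])
for t in range(X.shape[-1]):
    # cross-validated accuracy at this single time point
    scores[t] = cross_val_score(clf, X[..., t], y, cv=5).mean()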
Do you have the same number of trials in each condition after any trial rejection you do? If not, then the issue might be that 50% is not the correct chance level to think about; rather, the correct chance level is the proportion of trials in your more frequent condition (eyeballing it, maybe around 55%?). There are unbiased classifiers you can use, but I am not sure whether they are built into MNE-Python...
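As a rough illustration (just a sketch, with y standing for your condition labels after rejection), that chance level is simply the accuracy you would get by always guessing the most frequent condition:

import numpy as np

_, counts = np.unique(y, return_counts=True)
chance = counts.max() / counts.sum()  # proportion of the majority condition
print('empirical chance level: %.2f' % chance)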
Overall, your baseline doesn't look too bad - you would need to do a statistical test to check whether it is just noise variation or above-chance decoding scores. Still, there could be multiple reasons behind a significant accuracy before t0 here:
- accuracy is biased for imbalanced datasets. You can either use epochs.equalize_event_counts before your cross validation, or better, use a 'roc_auc' scoring metric (see the sketch right after this list)
- filtering the data can spread information over time. Try changing your filtering parameters
- IIRC, the 'sample' protocol is actually not randomized, and it is possible to predict the stimulus category in advance.
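For the first point, something along these lines (a sketch only - adapt the estimator, cv and event selection to your data):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import SlidingEstimator, cross_val_multiscore

# equalize trial counts across conditions before cross validation
epochs.equalize_event_counts(list(epochs.event_id))

X = epochs.get_data()     # (n_trials, n_channels, n_times)
y = epochs.events[:, 2]   # assumes two event codes / classes

clf = make_pipeline(StandardScaler(), LogisticRegression())
time_decod = SlidingEstimator(clf, scoring='roc_auc')
scores = cross_val_multiscore(time_decod, X, y, cv=5).mean(axis=0)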
Depending on your number of trials, the number of features, and the cross-validation procedure, you can get fairly high decoding results just by chance.
You should never interpret a result without running a statistical test. One good way to get the chance level of your classification pipeline is to run a permutation test: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.permutation_test_score.html
The idea is to shuffle the labels and retrain the model to see what score you get 'by chance'; it is sometimes surprising how high that can be.
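Roughly (a sketch - clf, X_t (the data at one time point) and y are placeholders, and n_permutations is just an example value):

from sklearn.model_selection import permutation_test_score

# refit on shuffled labels many times to estimate the chance distribution
score, perm_scores, pvalue = permutation_test_score(
    clf, X_t, y, cv=5, n_permutations=1000, scoring='roc_auc')
print('score: %.2f, p = %.4f' % (score, pvalue))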
If you have an unbalanced number of trials per class, I would also suggest using the AUC as a metric instead of accuracy.
Thanks for all the helpful suggestions. Everyone brought up imbalanced datasets as a possible source of the problem, but trial counts are equalized both in the example and in my personal dataset, where the problem is a bit worse (baseline accuracies are averaging around 65%, even with the code changes JR suggested and with more samples and fewer features). I also know I shouldn't jump to any conclusions without doing actual stats, and indeed I don't really expect these baseline periods to show up as significantly above chance. However, I figured a reviewer would nail me if I tried to report classification timecourses with that high of a baseline accuracy, even if it was statistically meaningless.
JR, thanks for those bits of code; that definitely cleans it up a lot. At your and Alex's suggestion, I'm trying the 'roc_auc' scoring method to see if it calms the baseline down a bit, but I'm getting some inconsistent behavior out of cross_val_multiscore when using that metric. Attached is the same source space decoding tutorial from my original message, now modified to run using the master-branch functions JR suggested. The sensor decoding tutorial you linked runs just fine for me, but when I try to run the modified source space tutorial (attached), I get the following error:
ValueError: roc_auc scoring can only be computed for two-class problems
It doesn't seem to like the label array y. Strangely enough, if I define y as it is defined in the sensor space tutorial, i.e.
y = epochs.events[:, 2]
the cross_val_multiscore function does not return the error (the scores are obviously bad, since the labeling is wrong). In both cases y is just a simple numpy array with identical shape (112,) and the same number of unique values, just in a different order. So I'm not really sure what's happening there, but hopefully others can replicate the problem.
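For anyone trying to reproduce this, the check I'm running on the two versions of y is essentially just (sketch):

import numpy as np

print(y.shape, y.dtype)                   # (112,) in both cases
print(np.unique(y, return_counts=True))   # two unique values in both cases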
That was what I was thinking at first, but I actually first got the error using y values in [0, 1]. Plus, the sensor tutorial (https://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html#temporal-decoding), which runs fine, has the y array filled with 1's and 3's. Also, the updated tutorial you linked (thanks for doing that) also defines y as
y = epochs.events[:, 2]
before using the cross_val_multiscore function with 'roc_auc' scoring, so those values aren't in the range [0, 1] either.
Even if I change the definition of y in the example with the sample data I attached to my last message to:
y = np.repeat([0, 1], len(X) // 2)  # first half labeled 0, second half labeled 1
I still get the error. But that's all presuming others are able to replicate the error and it's not just my system being weird.