When to split data into training and test sets (classifier is returning below-chance accuracies)

Hello to everyone,

  • MNE-Python version: 0.23.0
  • MNE_bids version: 0.8
  • operating system: Windows 10
  • IDE: Spyder

I am working with the MEG dataset provided by Rathee et al. (2021). In short, it contains roughly 60 minutes of MEG recordings (along with other channels such as EOG and ECG) for 17 subjects who were asked to perform MI (hands / feet) and CI (word generation / subtraction) tasks.
Data: Link
Publication describing the data: Link

My goal is to train a binary classifier for each subject and each combination of stimuli which can predict the stimulus’ category (e.g., in epoch X subject 1 was imagining moving their hand), just like the authors did in their publication describing the data (see link above).

Currently, my classifier is returning accuracies far below chance (e.g., 0.2 when chance level is 0.5). My guess is that I am splitting the data into training and test sets at the wrong point in my analysis. Previously, I split it only right before training the SVC model, which resulted in very high accuracies (up to 100%). However, it doesn’t feel right to preprocess the test data together with the training data, since this would not be possible in an online BCI setting either.
I would greatly appreciate any advice/thoughts/input on this. Thanks!

This is my current workflow for training a classifier for a single subject and a single condition:

  1. Read in raw data for subject X
  2. For each frequency band combination:
    2.1 Filter the raw data in the given frequency range
    2.2 Epoch the raw data
    2.3 Subselect the epochs of interest (i.e., epochs where either event 1 or event 2 was the stimulus presented) + subselect the time frame 0.5-3.5 seconds from the epochs
    2.4 Split the filtered and epoched data into training and testing data sets
    2.5 For training and test data, perform individually:
      • Scale data
      • Perform PCA
      • Perform CSP
      • Add CSP features to collection of all extracted CSP features from different frequency bands
  3. Fit SVC with training data (i.e., the FBCSP features)
  4. Predict ylabels for test data with SVC
  5. Compare for accuracy

The actual code:

from sklearn.model_selection import train_test_split
from mne.decoding import Scaler, CSP, UnsupervisedSpatialFilter
from sklearn.svm import SVC
from sklearn.decomposition import PCA
import numpy as np
import mne_helperfunctions as hf
from itertools import combinations
from sklearn.metrics import accuracy_score

# read in data
subject = "20"
session = "1"
ChanSel = "grad"
raw = hf.read_in_data(subject, session).pick_types(ChanSel).drop_channels(['MEG1733', 'MEG2333']).load_data()

event_names = ["Both Hand Imagery",
       "Both Feet Imagery",
       "Word Generation Imagery",
       "Subtraction Imagery"]
nComb = list(combinations((event_names), 2))
event = nComb[0]

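# lower and upper band edges in Hz; band i spans
# lowpass_freq[i] to highpass_freq[i] (here 8-12 Hz and 14-30 Hz)
lowpass_freq = [8, 14]
highpass_freq = [12, 30]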

features_train = []
features_test = []

# extract spatio-temporal features for each frequency band
for iBand in range(len(highpass_freq)):

    # filter data
    lfreq = lowpass_freq[iBand]
    hfreq = highpass_freq[iBand]
    raw_filt = raw.copy().filter(lfreq, hfreq, method="iir")

    # epoch data
    data_events, data_event_id = hf.return_fixed_events(raw_filt)
    epochs = hf.epoch_data(raw_filt, data_events, data_event_id)

    # subselect data
    epochs = epochs[event]
    epochs = epochs.crop(tmin=0.5, tmax=3.5)

    # split data
    X_train, X_test, y_train, y_test = train_test_split(epochs.get_data(), epochs.events[:, 2], test_size=0.2, random_state=43)
    # scale data
    scaler = Scaler(epochs.info)
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.fit_transform(X_test)
    # PCA
    pca = UnsupervisedSpatialFilter(PCA(.99))
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.fit_transform(X_test_scaled)

    # extract csp features
    csp = CSP(n_components=6, reg=0.1, log=True)
    X_train_csp = csp.fit_transform(X_train_pca, y_train)
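    X_test_csp = csp.fit_transform(X_test_pca, y_test)

    # collect the CSP features of this frequency band
    features_train.append(X_train_csp)
    features_test.append(X_test_csp)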


# concatenate features from different frequency bands
X_train_feat = np.concatenate(features_train, axis=1)
X_test_feat = np.concatenate(features_test, axis=1)

svc = SVC(kernel="rbf")
svc.fit(X_train_feat, y_train)
y_pred = svc.predict(X_test_feat)
print("Accuracy:", accuracy_score(y_test, y_pred))

You are right that the training and test data should not be processed together. However, it is also a problem to estimate separate processing parameters (e.g., scaling mean and SD, PCA solution, CSP filters) for each split: the parameters estimated from the test data can differ substantially from those of the training data, which interferes with classification. Instead, learn the parameters on the training split and apply them unchanged to the test split.
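
To make the idea concrete, here is a minimal sketch with plain NumPy on synthetic data (the array names and sizes are made up for illustration, this is not the MEG data): the scaling parameters are estimated from the training split only and then reused unchanged on the test split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for extracted features: 100 trials x 4 features
# (hypothetical data, not the MEG recordings from the question).
X = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
X_train, X_test = X[:80], X[80:]

# "fit": estimate the parameters on the training split only
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

# "transform": apply the same parameters to both splits
X_train_scaled = (X_train - train_mean) / train_std
X_test_scaled = (X_test - train_mean) / train_std

# What calling fit_transform on the test split amounts to instead:
# the parameters are re-estimated from the test data, so the two
# splits are transformed by different functions.
X_test_refit = (X_test - X_test.mean(axis=0)) / X_test.std(axis=0)
```

The same pattern applies to PCA and CSP: call `fit` (or `fit_transform`) on the training data and only `transform` on the test data.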

You can do this easily by wrapping all your steps in a scikit-learn pipeline, something like:

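from sklearn.pipeline import Pipeline

clf = Pipeline([
    ('scaler', Scaler(epochs.info)),
    ('pca', UnsupervisedSpatialFilter(PCA(.99))),
    ('csp', CSP(n_components=6, reg=0.1, log=True)),
])

# learn all parameters on the training split (CSP needs the labels) ...
X_train = clf.fit_transform(X_train, y_train)
# ... and only apply them to the test split
X_test = clf.transform(X_test)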
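
Since the workflow in the question uses a filter bank, the same idea has to be applied once per frequency band. Below is a rough sketch using only scikit-learn and synthetic 2D data. `StandardScaler` and `PCA` are stand-ins (my assumption) for the MNE scaler/PCA/CSP chain, and the arrays, band list and sizes are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials = 100
y = np.repeat([0, 1], n_trials // 2)

# Hypothetical stand-in for band-filtered data: one 2D array per band,
# with a class difference shared across all features.
bands = [(8, 12), (14, 30)]
X_bands = []
for _ in bands:
    X_band = rng.normal(size=(n_trials, 10))
    X_band[y == 1] += 1.0
    X_bands.append(X_band)

# Split the trial indices once so every band uses the same trials.
idx_train, idx_test = train_test_split(
    np.arange(n_trials), test_size=0.2, stratify=y, random_state=43)
y_train, y_test = y[idx_train], y[idx_test]

features_train, features_test = [], []
for X_band in X_bands:
    # A fresh pipeline per band, so each band keeps its own parameters.
    pipe = Pipeline([("scaler", StandardScaler()),
                     ("pca", PCA(n_components=3))])
    # Parameters are learned on the training trials only ...
    features_train.append(pipe.fit_transform(X_band[idx_train], y_train))
    # ... and applied to the test trials without refitting.
    features_test.append(pipe.transform(X_band[idx_test]))

# Concatenate the per-band features and train the classifier.
X_train_feat = np.concatenate(features_train, axis=1)
X_test_feat = np.concatenate(features_test, axis=1)

svc = SVC(kernel="rbf").fit(X_train_feat, y_train)
accuracy = accuracy_score(y_test, svc.predict(X_test_feat))
print("Accuracy:", accuracy)
```

With the real data, the pipeline inside the loop would be the scaler/PCA/CSP pipeline from above, built fresh for each band and fed that band's epochs.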