Hello to everyone,
- MNE-Python version: 0.23.0
- MNE-BIDS version: 0.8
- operating system: Windows 10
- IDE: Spyder
I am working with the MEG dataset provided by Rathee et al. (2021). In short, it contains roughly 60 minutes of MEG recordings (along with other channels such as EOG and ECG) for 17 subjects who were asked to perform motor imagery (MI; hands / feet) and cognitive imagery (CI; word generation / subtraction) tasks.
Publication describing the data: Link
My goal is to train a binary classifier for each subject and each pair of stimuli that predicts the stimulus category (e.g., in epoch X, subject 1 was imagining moving their hands), just like the authors did in their publication describing the data (see link above).
Currently, my classifier is returning accuracies far below chance (e.g., 0.2 when the chance level is 0.5). My guess is that I am splitting the data into training and test sets at the wrong point in my analysis. Previously, I only split the data right before training the SVC model, which resulted in very high accuracies (up to 100%). However, it doesn't feel right to preprocess the test data together with the training data, since this wouldn't happen in an online BCI setting either.
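To make the online-BCI point concrete, here is a minimal NumPy sketch on synthetic data (an illustration only, not the code from the analysis). The scaling parameters are estimated once from the calibration (training) epochs and then applied unchanged to each new epoch as it arrives. The function names `fit_channel_stats` and `apply_channel_stats` and the array shapes are hypothetical.

```python
import numpy as np


def fit_channel_stats(train_epochs):
    """Estimate per-channel mean and std from the training epochs only.

    train_epochs has shape (n_epochs, n_channels, n_times).
    """
    mean = train_epochs.mean(axis=(0, 2), keepdims=True)
    std = train_epochs.std(axis=(0, 2), keepdims=True)
    return mean, std


def apply_channel_stats(epochs, mean, std):
    """Scale epochs with statistics that were fixed beforehand."""
    return (epochs - mean) / std


rng = np.random.default_rng(0)
# Synthetic stand-ins: 40 calibration epochs and 10 later epochs,
# each with 4 channels and 50 time samples (all values are made up).
train_epochs = rng.normal(loc=5.0, scale=2.0, size=(40, 4, 50))
new_epochs = rng.normal(loc=5.0, scale=2.0, size=(10, 4, 50))

# Parameters come from the training data only.
mean, std = fit_channel_stats(train_epochs)
train_scaled = apply_channel_stats(train_epochs, mean, std)

# New epochs are processed one at a time, as they would arrive online,
# without re-estimating anything from them.
new_scaled = np.stack(
    [apply_channel_stats(ep[np.newaxis], mean, std)[0] for ep in new_epochs]
)
```

The same idea applies to any step that learns something from the data: whatever is estimated during calibration is frozen before the first test epoch is seen.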
I would greatly appreciate any advice/thoughts/input on this. Thanks!
This is my current workflow for training a classifier for a single subject and a single condition:
1. Read in raw data for subject X
2. For each frequency band combination:
2.1 Filter the raw data in the given frequency range
2.2 Epoch the raw data
2.3 Subselect the epochs of interest (i.e., epochs where either event 1 or event 2 was the stimulus presented) and crop them to the 0.5-3.5 s time window
2.4 Split the filtered and epoched data into training and test sets
2.5 For the training and test data, perform individually:
- Scale data
- Perform PCA
- Perform CSP
- Add CSP features to the collection of all CSP features extracted from the different frequency bands
3. Fit the SVC with the training data (i.e., the FBCSP features)
4. Predict the labels for the test data with the SVC
5. Compare predicted and true labels to obtain the accuracy
The actual code:

    from sklearn.model_selection import train_test_split
    from mne.decoding import Scaler, CSP, UnsupervisedSpatialFilter
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    import numpy as np
    import mne_helperfunctions as hf
    from itertools import combinations
    from sklearn.metrics import accuracy_score

    # read in data
    subject = "20"
    session = "1"
    ChanSel = "grad"
    raw = hf.read_in_data(subject, session).pick_types(ChanSel).drop_channels(['MEG1733', 'MEG2333']).load_data()

    event_names = ["Both Hand Imagery", "Both Feet Imagery", "Word Generation Imagery", "Subtraction Imagery"]
    nComb = list(combinations((event_names), 2))
    event = nComb

    lowpass_freq = [8, 14]
    highpass_freq = [12, 30]

    # FBCSP
    features_train = []
    features_test = []

    # extract spatio-temporal features for each frequency band
    for iBand in range(len(highpass_freq)):
        # filter data
        lfreq = lowpass_freq[iBand]
        hfreq = highpass_freq[iBand]
        raw_filt = raw.copy().filter(lfreq, hfreq, method="iir")

        # epoch data
        data_events, data_event_id = hf.return_fixed_events(raw_filt)
        epochs = hf.epoch_data(raw_filt, data_events, data_event_id)

        # subselect data
        epochs = epochs[event]
        epochs = epochs.crop(tmin=0.5, tmax=3.5)

        # split data
        X_train, X_test, y_train, y_test = train_test_split(epochs.get_data(), epochs.events[:, 2], test_size=0.2, random_state=43)

        # scale data
        scaler = Scaler(epochs.info)
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.fit_transform(X_test)

        # PCA
        pca = UnsupervisedSpatialFilter(PCA(.99))
        X_train_pca = pca.fit_transform(X_train_scaled)
        X_test_pca = pca.fit_transform(X_test_scaled)

        # extract csp features
        csp = CSP(n_components=6, reg=0.1, log=True)
        X_train_csp = csp.fit_transform(X_train_pca, y_train)
        X_test_csp = csp.fit_transform(X_test_pca, y_test)
        features_train.append(X_train_csp)
        features_test.append(X_test_csp)

    # concatenate features from different frequency bands
    X_train_feat = np.concatenate(features_train, axis=1)
    X_test_feat = np.concatenate(features_test, axis=1)

    # SVC
    svc = SVC(kernel="rbf")
    svc.fit(X_train_feat, y_train)
    y_pred = svc.predict(X_test_feat)
    print("Accuracy:", accuracy_score(y_test, y_pred))