Hello everyone,
- MNE-Python version: 0.23.0
- MNE-BIDS version: 0.8
- operating system: Windows 10
- IDE: Spyder
I am working with the MEG dataset provided by Rathee et al. (2021). In short, it contains roughly 60 minutes of MEG recordings (along with other channels such as EOG and ECG) for 17 subjects who were asked to perform MI (hands / feet) and CI (word generation / subtraction) tasks.
Data: Link
Publication describing the data: Link
My goal is to train a binary classifier for each subject and each pair of stimuli that predicts the stimulus category (e.g., in epoch X subject 1 was imagining moving their hand), just like the authors did in their publication describing the data (see link above).
Currently, my classifier returns accuracies far below chance (e.g., 0.2 when the chance level is 0.5). My guess is that I am splitting the data into training and test sets at the wrong point in my analysis. Previously, I split the data only right before training the SVC model, which resulted in very high accuracies (up to 100%). However, it doesn’t feel right to preprocess the test data together with the training data, since this wouldn’t happen in an online BCI setting either.
I would greatly appreciate any advice/thoughts/input on this. Thanks!
This is my current workflow for training a classifier for a single subject and a single condition:
1. Read in raw data for subject X
2. For each frequency band:
   2.1 Filter the raw data in the given frequency range
   2.2 Epoch the raw data
   2.3 Select the epochs of interest (i.e., epochs where either event 1 or event 2 was the stimulus presented) and crop them to the time window 0.5-3.5 seconds
   2.4 Split the filtered and epoched data into training and test sets
   2.5 For the training and test data, perform separately:
       - Scale data
       - Perform PCA
       - Perform CSP
       - Add the CSP features to the collection of CSP features extracted from the different frequency bands
3. Fit the SVC with the training data (i.e., the FBCSP features)
4. Predict the labels for the test data with the SVC
5. Compare the predicted labels with the true labels to compute the accuracy
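To make the question about step 2.4 concrete, here is a minimal sketch of the pattern I believe is usually recommended: split first, fit every preprocessing step on the training data only, and then only transform the test data. It uses synthetic stand-in data and scikit-learn's `StandardScaler` in place of MNE's `Scaler`, and it leaves out CSP entirely, so it is an illustration under those assumptions and not my actual analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the MEG recordings): 200 samples, 20 features,
# two classes whose means differ, so a sensible pipeline should score well.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 20)) + 2.0 * y[:, np.newaxis]

# Split BEFORE any step that learns parameters from the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=43, stratify=y
)

# The pipeline fits scaler, PCA and SVC on the training data only.
clf = make_pipeline(StandardScaler(), PCA(n_components=0.99), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# score() only transforms the test data with the already-fitted steps.
accuracy = clf.score(X_test, y_test)
print("Accuracy:", accuracy)
```

As far as I understand, MNE's `Scaler`, `UnsupervisedSpatialFilter` and `CSP` follow the same fit/transform convention, so they could take the place of the stand-in steps here.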
The actual code:
from sklearn.model_selection import train_test_split
from mne.decoding import Scaler, CSP, UnsupervisedSpatialFilter
from sklearn.svm import SVC
from sklearn.decomposition import PCA
import numpy as np
import mne_helperfunctions as hf
from itertools import combinations
from sklearn.metrics import accuracy_score
# read in data
subject = "20"
session = "1"
ChanSel = "grad"
raw = hf.read_in_data(subject, session).pick_types(ChanSel).drop_channels(['MEG1733', 'MEG2333']).load_data()
event_names = ["Both Hand Imagery",
               "Both Feet Imagery",
               "Word Generation Imagery",
               "Subtraction Imagery"]
nComb = list(combinations(event_names, 2))
event = nComb[0]
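# band edges in Hz: lowpass_freq holds the lower edges and highpass_freq the
# upper edges, i.e. the bands are 8-12 Hz and 14-30 Hz
lowpass_freq = [8, 14]
highpass_freq = [12, 30]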
# FBCSP
features_train = []
features_test = []
# extract spatio-temporal features for each frequency band
for iBand in range(len(highpass_freq)):
    # filter data
    lfreq = lowpass_freq[iBand]
    hfreq = highpass_freq[iBand]
    raw_filt = raw.copy().filter(lfreq, hfreq, method="iir")
    # epoch data
    data_events, data_event_id = hf.return_fixed_events(raw_filt)
    epochs = hf.epoch_data(raw_filt, data_events, data_event_id)
    # subselect data
    epochs = epochs[event]
    epochs = epochs.crop(tmin=0.5, tmax=3.5)
    # split data
    X_train, X_test, y_train, y_test = train_test_split(
        epochs.get_data(), epochs.events[:, 2], test_size=0.2, random_state=43)
    # scale data
    scaler = Scaler(epochs.info)
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.fit_transform(X_test)
    # PCA
    pca = UnsupervisedSpatialFilter(PCA(.99))
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.fit_transform(X_test_scaled)
    # extract csp features
    csp = CSP(n_components=6, reg=0.1, log=True)
    X_train_csp = csp.fit_transform(X_train_pca, y_train)
    X_test_csp = csp.fit_transform(X_test_pca, y_test)
    features_train.append(X_train_csp)
    features_test.append(X_test_csp)
# concatenate features from different frequency bands
X_train_feat = np.concatenate(features_train, axis=1)
X_test_feat = np.concatenate(features_test, axis=1)
# SVC
svc = SVC(kernel="rbf")
svc.fit(X_train_feat, y_train)
y_pred = svc.predict(X_test_feat)
print("Accuracy:", accuracy_score(y_test, y_pred))
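One change I am considering is fitting each step on the training data and then calling `transform` (not `fit_transform`) on the test data. Below is a minimal sketch of that idea. The helper `fit_on_train_apply_to_test` and the `MeanCenter` class are hypothetical names I made up for illustration. The sketch only assumes that every step offers `fit_transform(X, y)` and `transform(X)`, which as far as I know holds for `Scaler`, `UnsupervisedSpatialFilter` and `CSP`.

```python
import numpy as np


def fit_on_train_apply_to_test(steps, X_train, y_train, X_test):
    """Fit each step on the training data only, then reuse it on the test data."""
    for step in steps:
        X_train = step.fit_transform(X_train, y_train)  # parameters learned here
        X_test = step.transform(X_test)  # no refitting, test labels never used
    return X_train, X_test


class MeanCenter:
    """Tiny stand-in transformer (hypothetical): subtracts the training mean."""

    def fit_transform(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return X - self.mean_

    def transform(self, X):
        return X - self.mean_


# Synthetic stand-in data (NOT the MEG recordings).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, size=(80, 3))
X_test = rng.normal(loc=5.0, size=(20, 3))
y_train = np.repeat([0, 1], 40)

X_train_out, X_test_out = fit_on_train_apply_to_test(
    [MeanCenter()], X_train, y_train, X_test
)
print(X_train_out.shape, X_test_out.shape)
```

In the per-band loop, the list of steps would presumably be `[Scaler(epochs.info), UnsupervisedSpatialFilter(PCA(.99)), CSP(n_components=6, reg=0.1, log=True)]`, created fresh for each frequency band.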