Canonical Correlation Analysis (CCA) on EEG data

Hi,

I am working with EEG and SSVEP. Does anyone have experience with implementing Canonical Correlation Analysis (CCA) on epoched data?

Thanks!
Lsh

It is implemented in scikit-learn (sklearn.cross_decomposition.CCA — scikit-learn 0.24.2 documentation) but I don’t think we have anything for CCA in MNE-Python (if we did, it would likely be a wrapper for the scikit-learn code anyway).
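
For reference, here is a minimal sketch of the scikit-learn API on random data (the shapes and names are purely illustrative):

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))   # observations x variables (e.g. time samples x channels)
Y = rng.standard_normal((200, 4))   # observations x reference signals

cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(X, Y)  # canonical variates, each of shape (200, 1)
r = np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1]  # first canonical correlation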

@agramfort do you know anyone who has done this?


@Luna what are you aiming to do? Do you want to classify SSVEP conditions across subjects?

Alex


Hi,
Thanks for the fast response, and sorry for my slow reply!

Yes, I know about the sklearn implementation and I am currently playing around with it.
My aim is to do a CCA-weighted power analysis, i.e. I want to weight each channel according to its correlation with a sinusoidal reference signal. I struggle both with understanding the sklearn API (input/output and dimensions) and with combining it with my epochs. My current approach is getting the raw data array using epochs['Some events']._data and then looping through the events and applying CCA to each one.
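
Roughly, the per-epoch loop I have in mind looks something like the sketch below, where stim_freq and n_harmonics are made-up placeholders for my stimulation frequency and number of harmonics, epochs is my Epochs object, and I am not at all sure that x_rotations_ is the right attribute for the spatial weights:

import numpy as np
from sklearn.cross_decomposition import CCA

stim_freq = 12.0    # hypothetical stimulation frequency in Hz
n_harmonics = 2     # number of harmonics to include in the reference

data = epochs['Some events'].get_data()   # n_epochs x n_channels x n_times
sfreq = epochs.info['sfreq']
times = np.arange(data.shape[-1]) / sfreq

# sine/cosine reference signals at the stimulation frequency and its harmonics
refs = np.column_stack([f(2 * np.pi * h * stim_freq * times)
                        for h in range(1, n_harmonics + 1)
                        for f in (np.sin, np.cos)])   # n_times x (2 * n_harmonics)

cca = CCA(n_components=1)
corrs = np.zeros(len(data))
weights = np.zeros((len(data), data.shape[1]))
for idx, epoch in enumerate(data):
    # rows are time samples, columns are channels / reference signals
    X_c, Y_c = cca.fit_transform(epoch.T, refs)
    corrs[idx] = np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1]
    # spatial weights mapping channels to the first canonical variate
    weights[idx] = cca.x_rotations_[:, 0]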

Luna

Did you consider using something like this?

https://mne.tools/stable/auto_examples/decoding/decoding_spoc_CMC.html

Alex


Looks interesting! I will definitely have a look at that.

Btw, I am basically trying to implement this paper, obtain what they call spatially weighted multi-channel signals, and calculate the SNR: https://iopscience.iop.org/article/10.1088/1741-2552/aa550d/meta
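
My rough idea for the SNR part is to combine the channels into one spatially weighted signal using the per-epoch CCA weights and then take the ratio of power at the stimulation frequency to the power in the neighbouring frequency bins. A sketch is below; this is just a generic SSVEP-style SNR, not necessarily the exact definition from the paper, and w, epoch, sfreq and stim_freq refer to the loop sketched earlier:

import numpy as np

def snr_at_frequency(signal, sfreq, target_freq, n_neighbors=10):
    # ratio of power at the target frequency bin to the mean power of the
    # neighbouring bins (target bin excluded)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal)) ** 2
    target_bin = int(np.argmin(np.abs(freqs - target_freq)))
    lo = max(target_bin - n_neighbors, 0)
    hi = min(target_bin + n_neighbors + 1, len(power))
    neighbors = np.r_[power[lo:target_bin], power[target_bin + 1:hi]]
    return power[target_bin] / neighbors.mean()

# e.g. for one epoch, using that epoch's CCA weights w (shape: n_channels,):
#     weighted = w @ epoch                      # epoch: n_channels x n_times
#     snr = snr_at_frequency(weighted, sfreq, stim_freq)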

Thanks for all your feedback! I’ll continue playing around with sklearn :slight_smile:

Luna

Hi Luna,
I have some code for CCA on epochs. It’s not pretty, but it works. I can post an example function if you want; I’d just have to dig it out.


Hi,

That would be awesome!! Thanks :slight_smile:

This is the function that I found:

import numpy as np
from scipy.stats import spearmanr

from sklearn.base import clone
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline


def clf_across_time_n_fold(X=None, y=None, epochs=None, clf=None, n_folds=6,
                           return_clf=False, times_idx=None, stratify=None):
    '''Perform cross-decomposition for multiple time steps.

    Parameters
    ----------
    X : 3d array | None
        epochs x channels x samples matrix of data.
    y : 1d array | None
        Array of length n_epochs with the target values. Used instead of
        epochs metadata.
    epochs : mne.Epochs | None
        Epochs data, use instead of X.
    clf : sklearn classifier | None
        Classifier to use, Canonical Correlation by default.
    n_folds : int
        The number of folds in StratifiedKFold cross-validation.
    return_clf : bool
        Whether to aggregate and return classifiers in a numpy object array of
        folds x samples.
    times_idx : list-like of int | None
        Time indices to use. If None all time points are used.
    stratify : numpy array | None
        Variable to stratify the folds by, if stratification should be
        performed with respect to a variable other than ``y``.

    Returns
    -------
    all_r : numpy array
        Folds x samples array of Spearman correlation coefficients on the test
        data.
    clf : numpy array
        If ``return_clf=True``: folds x samples numpy object array with
        classifiers.
    '''

    # prepare pipeline and k-fold
    if clf is None:
        clf = CCA(n_components=1)

    # get data from epochs or arrays
    if epochs is not None:
        all_X = epochs.get_data()  # public API instead of the private _data
        n_times = all_X.shape[-1]

        Y = epochs.metadata.step.values
    else:
        # make sure X and y are present and correct shape
        assert X is not None
        assert y is not None
        assert X.ndim == 3

        all_X = X
        Y = y
        n_times = all_X.shape[-1]

    if times_idx is None:
        times_idx = range(n_times)
    else:
        n_times = len(times_idx)

    stratify = Y if stratify is None else stratify
    pipeline = make_pipeline(StandardScaler(), clf)
    # shuffle=True is required for random_state to have an effect
    kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=23)

    X = all_X[:, :, 0]
    all_r = np.zeros((n_folds, n_times))

    if return_clf:
        clfs = np.empty((n_folds, n_times), dtype='object')

    # loop through folds and timepoints
    for fold_idx, (train_idx, test_idx) in enumerate(kfold.split(X, stratify)):
        for idx, t_idx in enumerate(times_idx):
            X = all_X[:, :, t_idx]

            # fit
            this_pipeline = clone(pipeline)
            this_pipeline.fit(X[train_idx], Y[train_idx])

            # test
            X_test_r = this_pipeline.transform(X[test_idx])
            r_tst, p_tst = spearmanr(Y[test_idx], X_test_r[:, 0])

            # aggregate
            all_r[fold_idx, idx] = r_tst
            if return_clf:
                clfs[fold_idx, idx] = this_pipeline

        print('*', end='')

    # return depending on return_clf
    if return_clf:
        return all_r, clfs
    else:
        return all_r

Beware that when using epochs the function assumes you want to correlate the signal with epochs.metadata.step, so you may prefer to just pass X and y as input instead. You can extract the relevant data from the epochs via epochs.get_data().
The function could also be simplified by using MNE’s SlidingEstimator - I just chose the lazy approach at the time of writing.
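
If you want to go that route, here is an untested sketch of what the SlidingEstimator version could look like (the Spearman scorer and the KFold setup are just one way to do it, and X / y are the same arrays as described above):

import numpy as np
from scipy.stats import spearmanr

from mne.decoding import SlidingEstimator, cross_val_multiscore
from sklearn.cross_decomposition import CCA
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def spearman_score(y_true, y_pred):
    # correlate the target with the (single) CCA component prediction
    rho, _ = spearmanr(y_true, np.ravel(y_pred))
    return rho


# X: n_epochs x n_channels x n_times array, y: array of length n_epochs
# (e.g. X = epochs.get_data() and y = epochs.metadata.step.values)
slider = SlidingEstimator(make_pipeline(StandardScaler(), CCA(n_components=1)),
                          scoring=make_scorer(spearman_score))
scores = cross_val_multiscore(slider, X, y,
                              cv=KFold(n_splits=6, shuffle=True, random_state=23))
# scores has shape (n_folds, n_times), like all_r in the function above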
