# Seeking Guidance on Statistical Tests for SEEG Analysis with MNE

Hi everyone,

I’ve recently been working with SEEG data in MNE, focusing primarily on (generalized linear model) GLM analysis. However, I have some uncertainties regarding the statistical tests applicable to my analysis, and I would appreciate any insights or suggestions you could provide.

## Background

### Preprocessing

Assume I am analyzing data from a single subject, specifically focusing on contacts located in the orbitofrontal cortex (OFC).

The general workflow involves preprocessing the SEEG data, epoching (`epochs`), and then computing time-frequency representation (`tfr`) for channels (contacts) within OFC.

``````# pseudocode
epochs = seeg_preprocess(sub)
tfr = epochs.compute_tfr(picks=["OFC"])
# tfr.data.shape: (n_trials, n_chs, n_freqs, n_timepoints)
``````

Next, I extract the high-gamma (80-150Hz) activity for GLM analysis. Specifically, I extract data from this frequency band and compute the mean across these frequencies, resulting in `hga` with the shape of `n_trials, n_chs, and n_timepoints`, which represent high-gamma activity.

``````# pseudocode
hga = tfr.data[:, :, 80:150, :].mean(axis=2)
# shape: (n_trials, n_chs, n_timepoints)
``````

### GLM

Each channel’s data in OFC (with shape `n_trials,n_timepoints` for one channel) is then analyzed using a GLM, with a behavioural metric `X` (assuming a single regressor, shape: `n_trials,1`) predicting the HGA activity (`hga`) on each time point. So the output (`glm_betas_ts`) is a beta time course for all channels, reflecting the association between the behavioral metric and HGA over time.

``````# pseudocode
glm_betas_ts = np.array([glm(y=hga[:, ch], x=X).fit().coef for ch in range(hga.shape[1])]).squeeze()
# shape: (n_chs, n_timepoints)
``````

## Statistics

Currently, I am using `permutation_cluster_1samp_test` for statistical testing, treating multiple contacts as multiple samples.

``````# pseudocode
# glm_betas_ts.shape: (n_chs, n_timepoints)
_, c, p, _ = permutation_cluster_1samp_test(glm_betas_ts, tail=0, ...)
``````

final result example (two lines correspond to two behavior metrics, horizontal segments represent significantly different from 0):

Additionally, I’ve considered not averaging over the frequency bands and directly performing GLM analysis on the entire TFR dataset for each frequency. I used a similar statistical method like the beta time course:

``````# pseudocode
glm_betas_tfr = np.array([glm(y=tfr.data[:, ch, :, :], x=X).fit().coef for ch in range(tfr.data.shape[1])]).squeeze()
# glm_betas_tfr.shape: (n_chs, n_freqs, n_timepoints)
_, c, p, _ = permutation_cluster_1samp_test(glm_betas_tfr, tail=0, ...)
``````

final result example:

## Question

1. Is the logic of statistical testing in these scenarios appropriate?
2. Are there other considerations or methods I should consider to ensure the robustness and accuracy of my findings?
3. I tried different setups for the `permutation_cluster_1samp_test`, including default parameters (`tail=0`) or TFCE (with `threshold=dict(xxxx)`), but got quite different results. I want to know if it’s normal (I will put data and results later if needed).

Any guidance or references to similar analyses or resources would be greatly appreciated. Thank you!

Best Regards,
Kun

Those are pretty standard in the field. You could try a classification-based approach which might be more stable with cross-validation. You might need more trials for a more stable cluster permutation. Here’s a paper I wrote that might help Stereo-EEG recordings extend known distributions of canonical movement-related oscillations - PubMed

Thank you! I will check the paper for more details.
Can you briefly explain more about why the classification-based method might be helpful in this case?

Depending on the method, you could use a non-linear classifier which would be different in that way, but just in general, in my opinion and experience using cross-validation leads to more stable results.