Hi everyone,
I’ve recently been working with SEEG data in MNE, focusing primarily on generalized linear model (GLM) analysis. However, I have some uncertainties about which statistical tests are appropriate for my analysis, and I would appreciate any insights or suggestions you could provide.
## Background

### Preprocessing
Assume I am analyzing data from a single subject, specifically focusing on contacts located in the orbitofrontal cortex (OFC).
The general workflow involves preprocessing the SEEG data, epoching it (`epochs`), and then computing a time-frequency representation (`tfr`) for the channels (contacts) within the OFC.
```python
# pseudocode
epochs = seeg_preprocess(sub)
tfr = epochs.compute_tfr(picks=["OFC"])
# tfr.data.shape: (n_trials, n_chs, n_freqs, n_timepoints)
```
Next, I extract the high-gamma (80–150 Hz) activity for the GLM analysis. Specifically, I select this frequency band and average across the selected frequencies, resulting in `hga` with shape `(n_trials, n_chs, n_timepoints)`, which represents the high-gamma activity.
```python
# pseudocode (select the frequency bins covering 80–150 Hz, not raw indices)
hga = tfr.data[:, :, 80:150, :].mean(axis=2)
# hga.shape: (n_trials, n_chs, n_timepoints)
```
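Note that slicing with `[80:150]` selects array *indices*, not frequency values; unless the frequency axis happens to have one bin per Hz starting at 0, this picks the wrong band. A minimal numpy sketch of selecting bins by value (the log-spaced `freqs` axis and the random data here are made up for illustration):

```python
import numpy as np

# hypothetical frequency axis, e.g. log-spaced wavelet frequencies (like tfr.freqs)
freqs = np.logspace(np.log10(2), np.log10(180), 40)

# boolean mask for the high-gamma band (80–150 Hz), selected by value
band = (freqs >= 80) & (freqs <= 150)

# fake TFR data: (n_trials, n_chs, n_freqs, n_timepoints)
rng = np.random.default_rng(0)
data = rng.standard_normal((10, 3, freqs.size, 50))

# average over the selected frequency bins only
hga = data[:, :, band, :].mean(axis=2)
print(hga.shape)  # (10, 3, 50)
```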
### GLM
Each OFC channel’s data (shape `(n_trials, n_timepoints)` for one channel) is then analyzed with a GLM, using a behavioural metric `X` (assuming a single regressor, shape `(n_trials, 1)`) to predict the HGA (`hga`) at each time point. The output (`glm_betas_ts`) is a beta time course for every channel, reflecting the association between the behavioural metric and HGA over time.
```python
# pseudocode
glm_betas_ts = np.array([glm(y=hga[:, ch], x=X).fit().coef for ch in range(hga.shape[1])]).squeeze()
# glm_betas_ts.shape: (n_chs, n_timepoints)
```
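For a single regressor, the per-time-point GLM reduces to ordinary least squares. A toy numpy sketch (with made-up data, and with an explicit intercept column, which the pseudocode above may or may not include):

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_chs, n_timepoints = 60, 4, 100

hga = rng.standard_normal((n_trials, n_chs, n_timepoints))
X = rng.standard_normal((n_trials, 1))  # behavioural metric, one regressor

# design matrix: intercept + regressor
design = np.column_stack([np.ones(n_trials), X])

glm_betas_ts = np.empty((n_chs, n_timepoints))
for ch in range(n_chs):
    # lstsq solves all time points at once: coefs has shape (2, n_timepoints)
    coefs, *_ = np.linalg.lstsq(design, hga[:, ch, :], rcond=None)
    glm_betas_ts[ch] = coefs[1]  # slope of the behavioural regressor

print(glm_betas_ts.shape)  # (4, 100)
```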
### Statistics
Currently, I am using `permutation_cluster_1samp_test` for statistical testing, treating the multiple contacts as multiple samples (observations).
```python
# pseudocode
# glm_betas_ts.shape: (n_chs, n_timepoints)
_, c, p, _ = permutation_cluster_1samp_test(glm_betas_ts, tail=0, ...)
```
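The logic of the one-sample permutation test here is sign-flipping across the sample axis (the channels, in this setup). A minimal pure-numpy illustration of that logic, without the clustering step, using made-up betas and a max-statistic correction across time points:

```python
import numpy as np

rng = np.random.default_rng(0)
n_chs, n_timepoints = 8, 50
betas = rng.standard_normal((n_chs, n_timepoints)) + 0.5  # toy beta time courses

# observed one-sample t-value at each time point
obs_t = betas.mean(axis=0) / (betas.std(axis=0, ddof=1) / np.sqrt(n_chs))

# null distribution of max |t| under random sign flips of whole channels
n_perm = 500
max_null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1, 1], size=(n_chs, 1))
    flipped = betas * signs
    t = flipped.mean(axis=0) / (flipped.std(axis=0, ddof=1) / np.sqrt(n_chs))
    max_null[i] = np.abs(t).max()

# two-tailed, max-statistic-corrected p-value per time point
p = (np.abs(obs_t)[None, :] <= max_null[:, None]).mean(axis=0)
print(p.shape)  # (50,)
```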
Final result example (the two lines correspond to two behavioural metrics; horizontal segments mark time points where the betas differ significantly from 0):
Additionally, I have considered not averaging over the frequency band and instead running the GLM on the full TFR, i.e. separately at each frequency. I then used the same statistical approach as for the beta time course:
```python
# pseudocode
glm_betas_tfr = np.array([glm(y=tfr.data[:, ch, :, :], x=X).fit().coef for ch in range(tfr.data.shape[1])]).squeeze()
# glm_betas_tfr.shape: (n_chs, n_freqs, n_timepoints)
_, c, p, _ = permutation_cluster_1samp_test(glm_betas_tfr, tail=0, ...)
```
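Under the same toy assumptions as above (single regressor, ordinary least squares, made-up data), the full-TFR betas can also be computed without a per-channel Python loop by flattening the trailing axes and solving all GLMs in one `lstsq` call:

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_chs, n_freqs, n_timepoints = 60, 4, 12, 30

data = rng.standard_normal((n_trials, n_chs, n_freqs, n_timepoints))
X = rng.standard_normal((n_trials, 1))  # behavioural metric

design = np.column_stack([np.ones(n_trials), X])  # intercept + regressor

# flatten (chs, freqs, times) into one axis and solve all GLMs at once
y = data.reshape(n_trials, -1)
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)  # (2, n_chs*n_freqs*n_timepoints)

glm_betas_tfr = coefs[1].reshape(n_chs, n_freqs, n_timepoints)
print(glm_betas_tfr.shape)  # (4, 12, 30)
```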
Final result example:
## Questions
- Is the logic of statistical testing in these scenarios appropriate?
- Are there other considerations or methods I should take into account to ensure the robustness and accuracy of my findings?
- I tried different setups for `permutation_cluster_1samp_test`, including the default parameters (`tail=0`) and TFCE (with `threshold=dict(xxxx)`), but got quite different results. I would like to know whether that is normal (I can share data and results later if needed).
Any guidance or references to similar analyses or resources would be greatly appreciated. Thank you!
Best Regards,
Kun