Cluster-based permutation on Spatio-spectral data

Dear MNE Users,

I have been trying to run permutation_cluster_1samp_test on my dataset of 30 subjects, each with power densities across 19 channels. I wish to test the difference between 2 experimental conditions across all 30 subjects. However, I am a bit sceptical about the way I am using the function. My input data are two 3D arrays (X1, X2) of shape subjects x frequency bins x channels. Here are my questions regarding the analysis.

  1. Does one need to use raw data only, e.g. eeglab set files, for the function to detect clusters properly?
  • I have csv files containing the list of subjects and their respective spectral powers for each frequency bin and channel, which I use to create the 3D arrays. For the connectivity matrix, I use a separate eeglab set file (with the same sensor alignment) to import the channel locations. Is this method acceptable?
  2. Is the clustering always done spatio-temporally? I ask because the clusters I receive in the end (containing indices of channels and frequency bins) do not replicate the results from a classical t-test. What would be the correct way to form clusters in a spatio-spectral dimension?
  3. I receive 44 clusters in the end, but none of their p-values is less than 0.05. Does that mean that the two conditions are not significantly different from each other? Could changing the threshold yield some significant clusters? Also, what do the detected clusters whose p-values are >>0.05 tell us about the data?

I am new to EEG analysis, so kindly pardon if some of these questions are not well articulated. Many thanks in advance!

Best,
Manisha

Tagging @agramfort.

I think I can answer at least some of your questions:

With that function, clustering is always done on the last dimension (for 2-dimensional X) or the last two dimensions (for 3-dimensional X). So in your case (3-dimensional X, n_subjs x n_freqs x n_sensors) it will cluster spectro-spatially. The exception is when your adjacency or max_step arguments prevent clusters from ever forming along one of those dimensions. For example, you may not want adjacent frequency bins to count as “connected” for the purpose of clustering (e.g., with a bin spacing of 5 Hz it may not make sense to let the 5 Hz and 10 Hz bins form a cluster). In that case you could pass max_step=0 and pass to adjacency an n_chan x n_chan matrix defining the spatial adjacency of the sensor positions. Your resulting clusters will then always be purely spatial, never spatio-spectral, because no cluster is allowed to span multiple frequency bins; any potential spatial clusters are evaluated separately at each frequency bin.

Hopefully @agramfort or someone else will chime in with answers to the other parts of your question.

This was discussed during the MNE office hours last Friday.

I suggested using a CSP-based approach, as done in:

https://mne.tools/stable/auto_examples/decoding/plot_decoding_csp_timefreq.html#sphx-glr-auto-examples-decoding-plot-decoding-csp-timefreq-py

Alex

Hello Dan,

Thank you very much for your clear explanation. I pass an adjacency matrix of n_chan x n_chan with max_step set to its default, i.e. 1. Still, the clusters are not detected at the correct frequency bins. In my case, the frequency bin spacing is 0.25 Hz, so what would be the corresponding max_step for this?

The value of max_step is counted in data rows (steps along the second axis of X, here frequency bins), not in Hz. So if you want no clustering across frequency bins, you pass max_step=0, regardless of bin spacing. I mentioned the 5-Hz bin spacing before because it’s a clear example of a case where you would almost never want clusters to span adjacent bins, regardless of what kind of experiment/analysis you’re doing.

Setting max_step=1 allows clusters that are adjacent along the frequency axis to be combined into a single cluster. There are cases where you clearly don’t want this even when your bin spacing is fairly small: for example, if you’re dealing with an RSVP paradigm you might expect an extremely sharp peak at the stimulus presentation rate, so allowing the cluster to include adjacent frequency bins would not help. I suppose a case where you might want max_step=1 for a frequency-domain analysis would be one where you expect fairly broad frequency-domain peaks (like a resting-state / cortical rhythms study, maybe?).

I’m not sure what you mean by the clusters not being detected at the correct frequency bins. Do you already know where the clusters are (based on analysis with some other software)? Or are you just not finding them where you want/expect them to occur?

Thank you very much for the elaborate explanation! It truly helped me to understand the basis of clustering in the function.
I was actually trying to replicate results from individual t-tests and detect channel-frequency-bin pairs with significant differences. I realized that the set file I was using for creating the adjacency matrix had a different montage than my actual data. The problem was fixed after setting the right montage.
