Subtracting arrays of hdf5 files

  • MNE-Python version: 0.22.0
  • operating system: Ubuntu 20.04.2 LTS

Dear MNE community,

I want to perform ‘mne.stats.spatio_temporal_cluster_1samp_test’ between two groups, each with 33 subjects. The data were preprocessed, and ‘mne.time_frequency.tfr_morlet’ was used to compute the time-frequency representation of the data.
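For reference, that step looked roughly like this for each subject (the path and the wavelet parameters below are simplified placeholders):

import numpy as np
import mne

# per-subject time-frequency decomposition (parameters simplified)
freqs = np.arange(4., 50., 1.)   # 46 frequencies of interest, in Hz
n_cycles = freqs / 2.            # a common choice for Morlet wavelets

epochs = mne.read_epochs('sub-01_OGT-epo.fif')
power = mne.time_frequency.tfr_morlet(epochs, freqs=freqs,
                                      n_cycles=n_cycles, return_itc=False)
power.save('tfr_OGT_p01_1-tfr.h5')  # writes an HDF5 (-tfr.h5) file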

Then I loaded the hdf5 files into two different arrays (one per group), and now I want to subtract one array from the other to obtain a contrast array ‘X’ as input for the ‘mne.stats.spatio_temporal_cluster_1samp_test’ function.

When I subtract them (X = X_OGT_ogt - X_OGT_low), I get the following error:

runcell(4, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
Traceback (most recent call last):

  File "/home/.../4._Spatial_Temporal_Cluster_Perm.py", line 63, in <module>
    X = X_OGT_ogt - X_OGT_low

  File "/home/bruno/anaconda3/envs/mne/lib/python3.8/site-packages/mne/time_frequency/tfr.py", line 1950, in __sub__
    out.data -= tfr.data

ValueError: operands could not be broadcast together with shapes (62,46,3401) (64,46,3401) (62,46,3401)

So apparently some of my TFR files have a different number of channels (62 vs. 64). I don’t know what could have caused this… maybe something in the preprocessing.

I tried subtracting the data one subject pair at a time in order to find the subjects with distinct numbers of channels:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:  # shapes don't match for this pair
        print(i)

Output:

runcell(6, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
0
1
5
8
9
10
17
22
23
26
28

So 11 pairs out of 33 could not be subtracted.

Does anyone know how to solve this problem? I would really appreciate any help.

Best,

Bruno

The simplistic answer is “if a channel is bad for one subject, drop it for all subjects”. If it’s the same two channels that are bad for every subject, losing 2 out of 64 is probably OK. Unfortunately, it’s often a different channel that is bad for each subject.
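For example, a quick way to see how large the common channel set actually is (this assumes your TFR objects are in a flat list, here called all_tfrs; adjust to your variable names):

# find the channel set shared by every subject (all_tfrs is hypothetical)
common = set(all_tfrs[0].ch_names)
for tfr in all_tfrs[1:]:
    common &= set(tfr.ch_names)
print(f'{len(common)} channels are present in all subjects')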

Start by changing your except clause so that it prints at least the number of good channels for each subject so you know how bad the problem is:

print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i].shape[0]} vs {X_OGT_low[i].shape[0]}')

Ultimately you’ll probably need to redo your preprocessing. This new-ish function will probably help: mne.preprocessing.equalize_bads.
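I haven’t run it on your data, so treat this as a rough sketch of where it could slot in (epoch_fnames is a placeholder for your list of per-subject epoch files):

import mne

# make bad-channel handling consistent across all subjects *before* the
# TFR step: equalize_bads interpolates or marks bads so that the
# instances end up with matching channel sets
insts = [mne.read_epochs(f) for f in epoch_fnames]
insts = mne.preprocessing.equalize_bads(insts)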

Hi, thanks a lot for your reply,

As you’ve suggested, the output for:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i].shape[0]} vs {X_OGT_low[i].shape[0]}')

was:

runcell(8, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
subj index:  0   n good chs: 1 vs 1
subj index:  1   n good chs: 1 vs 1
subj index:  5   n good chs: 1 vs 1
subj index:  8   n good chs: 1 vs 1
subj index:  9   n good chs: 1 vs 1
subj index: 10   n good chs: 1 vs 1
subj index: 17   n good chs: 1 vs 1
subj index: 22   n good chs: 1 vs 1
subj index: 23   n good chs: 1 vs 1
subj index: 26   n good chs: 1 vs 1
subj index: 28   n good chs: 1 vs 1

I guess the expression you wrote, ‘X_OGT_ogt[i].shape[0]’, was meant to refer to the 0th dimension of my ‘X_OGT_ogt’ array, which should be the number of channels. However, my ‘X_OGT_ogt’ array has shape (33, 1).

Maybe that’s because of how I saved the hdf5 files into it? The way I created the ‘X_OGT_ogt’ array was by loading my tfr_morlet files (-tfr.h5) iteratively, reading them with ‘mne.time_frequency.read_tfrs’ and then appending the result to a list called ‘TFR_OGT_ogt’:

fname_OGT_ogt = '/home/.../tfr_OGT_p%s_1-tfr.h5' % subj
TFR_OGT_ogt_data = mne.time_frequency.read_tfrs(fname_OGT_ogt)
TFR_OGT_ogt.append(TFR_OGT_ogt_data)

Then I created the array ‘X_OGT_ogt’ from the contents of that list, and it has shape (33, 1):

X_OGT_ogt = np.stack(TFR_OGT_ogt, axis=0)

# check the dimensions
print(X_OGT_ogt.shape)

# save the array
np.save('/home/.../tfr_OGT_ogt', X_OGT_ogt)

Should I have created this array ‘X_OGT_ogt’ differently?

Best,

Bruno

Not necessarily; maybe I just made a wrong guess about how your data is organized. The goal here is to get it to print how many channels each subject has (so you can make an informed decision about how to proceed), so adjust the index passed to .shape to make that happen.
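For example, something like this should reveal the nesting; my guess, since read_tfrs returns a list, is that each element of your array is a 1-element wrapper around the actual AverageTFR, which would explain the (33, 1) shape:

# inspect what one element of the stacked array actually is
print(type(X_OGT_ogt[0]))           # the 1-element wrapper
print(type(X_OGT_ogt[0][0]))        # the AverageTFR inside it
print(X_OGT_ogt[0][0].data.shape)   # (n_channels, n_freqs, n_times)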


I kept working on it and did the following:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i][0].data.shape[0]} vs {X_OGT_low[i][0].data.shape[0]}')

And I got this:

runcell(8, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
subj index:  0   n good chs: 62 vs 64
subj index:  1   n good chs: 63 vs 64
subj index:  5   n good chs: 64 vs 62
subj index:  8   n good chs: 61 vs 64
subj index:  9   n good chs: 62 vs 64
subj index: 10   n good chs: 64 vs 62
subj index: 17   n good chs: 64 vs 63
subj index: 22   n good chs: 61 vs 64
subj index: 23   n good chs: 64 vs 63
subj index: 26   n good chs: 62 vs 64
subj index: 28   n good chs: 64 vs 62

Well, that’s good news: the worst subject has only three bad channels. The next step might be to get your loop to print which channels are missing… but at this stage it’s probably easier to go back to the preprocessing stage, because there are better tools for handling this at that point.

If you can’t do that (i.e., you don’t have access to the raw data / preprocessing scripts), then I think you’re stuck with including only the set of channels that are present for all subjects.
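A rough sketch of that fallback (assuming tfrs_ogt and tfrs_low are flat lists holding one AverageTFR per subject, i.e. the [0] element of each read_tfrs result; I believe mne.channels.equalize_channels accepts TFR objects too):

import numpy as np
import mne

# restrict every subject to the shared channels, in a common order
equalized = mne.channels.equalize_channels(tfrs_ogt + tfrs_low)
tfrs_ogt, tfrs_low = equalized[:33], equalized[33:]

# contrast array, shape (n_subjects, n_channels, n_freqs, n_times)
X = np.stack([a.data - b.data for a, b in zip(tfrs_ogt, tfrs_low)])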


I’ve checked which channels are missing for each subject, and altogether eight were missing:

Chann_list = ['Fp1','Fp2','F7','F3','Fz','F4','F8','FC5','FC1','FC2','FC6','T7','C3','Cz','C4','T8','TP9','CP5','CP1','CP2','CP6','TP10','P7','P3','Pz','P4','P8','PO9','O1','Oz','O2','PO10','AF7','AF3','AF4','AF8','F5','F1','F2','F6','FT9','FT7','FC3','FC4','FT8','FT10','C5','C1','C2','C6','TP7','CP3','CPz','CP4','TP8','P5','P1','P2','P6','PO7','PO3','POz','PO4','PO8']

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i][0].data.shape[0]} vs {X_OGT_low[i][0].data.shape[0]}')
        # Check missing channels:
        print('OGT_ogt:')
        print(set(Chann_list)-set(X_OGT_ogt[i][0].ch_names))
        print('OGT_low:')
        print(set(Chann_list)-set(X_OGT_low[i][0].ch_names))

Output:

subj index:  0   n good chs: 62 vs 64
OGT_ogt:
{'FC1', 'FC2'}
OGT_low:
set()
subj index:  1   n good chs: 63 vs 64
OGT_ogt:
{'F8'}
OGT_low:
set()
subj index:  5   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'F8', 'F7'}
subj index:  8   n good chs: 61 vs 64
OGT_ogt:
{'FC1', 'FC2', 'PO9'}
OGT_low:
set()
subj index:  9   n good chs: 62 vs 64
OGT_ogt:
{'FT7', 'AF7'}
OGT_low:
set()
subj index: 10   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'FC1', 'FC2'}
subj index: 17   n good chs: 64 vs 63
OGT_ogt:
set()
OGT_low:
{'FC2'}
subj index: 22   n good chs: 61 vs 64
OGT_ogt:
{'FC1', 'FC2', 'F7'}
OGT_low:
set()
subj index: 23   n good chs: 64 vs 63
OGT_ogt:
set()
OGT_low:
{'AF7'}
subj index: 26   n good chs: 62 vs 64
OGT_ogt:
{'FC1', 'FC2'}
OGT_low:
set()
subj index: 28   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'C2', 'FC2'}

So the bad channels are FC1, FC2, F8, F7, PO9, FT7, AF7 and C2. I guess I can’t discard that many channels, so I should redo my preprocessing, just as you suggested.

  1. The point is, I really don’t know what might have caused those channels to go missing. I was marking bad channels manually and then interpolating them with ‘raw.interpolate_bads’, so in theory I should still have all 64 channels. Is it possible that some MNE function deletes channels automatically?

  2. I have a list of the subjects with fewer than 64 channels, so I repeated the preprocessing for one subject that had 62 channels instead of 64. At the end I checked the number of channels and it was 62 again, even though I double-checked that I was not marking channels without interpolating them afterwards.

  3. The function ‘mne.preprocessing.equalize_bads’ seems like a good solution, but I still don’t know exactly where I should include it in my preprocessing pipeline, and I couldn’t find any usage example. Should I mark the bad channels manually and then use ‘mne.preprocessing.equalize_bads’ instead of ‘raw.interpolate_bads’? Would that be enough?

Best,

Bruno

interpolate_bads has a reset_bads=True option; maybe you set it to False? If so, the interpolated channel names would still be in the bads list, and later on they might get dropped (maybe when computing the TFR? You could search the console output for “drop” to see where it happens).
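In other words, the difference is roughly this (raw stands for your preprocessed Raw object):

# reset_bads=True (the default) clears interpolated channels from
# info['bads'], so nothing downstream drops them:
raw.interpolate_bads(reset_bads=True)

# reset_bads=False leaves them listed as bad after interpolation, and a
# later step that excludes bads may silently drop them:
raw.interpolate_bads(reset_bads=False)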

At this point the question is veering into “debugging your preprocessing script”, which can be rather difficult to do through the forum. If you can’t find by yourself which step(s) are causing the dropped channels, you’ll probably need to trim the preprocessing script down as much as possible and share one sample subject’s data where the channel dropping happens, so that we can help debug efficiently. Often the process of trimming down the script will reveal the answer by itself, so it’s a good approach to take in general.