Subtracting arrays of hdf5 files

  • MNE-Python version: 0.22.0
  • operating system: Ubuntu 20.04.2 LTS

Dear MNE community,

I want to perform ‘mne.stats.spatio_temporal_cluster_1samp_test’ between two groups, each with 33 subjects. The data were preprocessed, and ‘mne.time_frequency.tfr_morlet’ was used to compute the time-frequency representation of the data.
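For reference, that step looked roughly like this for each subject (the path and the wavelet parameters below are simplified placeholders):

import numpy as np
import mne

# per-subject time-frequency decomposition (parameters simplified)
freqs = np.arange(4., 50., 1.)   # 46 frequencies of interest, in Hz
n_cycles = freqs / 2.            # a common choice for Morlet wavelets

epochs = mne.read_epochs('sub-01_OGT-epo.fif')
power = mne.time_frequency.tfr_morlet(epochs, freqs=freqs,
                                      n_cycles=n_cycles, return_itc=False)
power.save('tfr_OGT_p01_1-tfr.h5')  # writes an HDF5 (-tfr.h5) file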

Then I loaded the hdf5 files into two different arrays (one per group), and now I want to subtract one array from the other to obtain a contrast array ‘X’ as input for the ‘mne.stats.spatio_temporal_cluster_1samp_test’ function.

When I subtract them (X = X_OGT_ogt - X_OGT_low), I get the following error:

runcell(4, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
Traceback (most recent call last):

  File "/home/.../4._Spatial_Temporal_Cluster_Perm.py", line 63, in <module>
    X = X_OGT_ogt - X_OGT_low

  File "/home/bruno/anaconda3/envs/mne/lib/python3.8/site-packages/mne/time_frequency/tfr.py", line 1950, in __sub__
    out.data -= tfr.data

ValueError: operands could not be broadcast together with shapes (62,46,3401) (64,46,3401) (62,46,3401)

So apparently some of my TFR files have a different number of channels (62 vs. 64). I don’t know what could have caused this… maybe something in the preprocessing.

I tried subtracting the data one subject pair at a time in order to find the subjects with distinct numbers of channels:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:  # shapes don't match for this pair
        print(i)

Output:

runcell(6, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
0
1
5
8
9
10
17
22
23
26
28

So 11 pairs out of 33 could not be subtracted.

Does anyone know how to solve this problem? I would really appreciate any help.

Best,

Bruno

The simplistic answer is “if a channel is bad for one subject, drop it for all subjects”. If it’s the same two channels that are bad for every subject, losing 2 out of 64 is probably OK. Unfortunately, it’s often a different channel that is bad for each subject.
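For example, a quick way to see how large the common channel set actually is (this assumes your TFR objects are in a flat list, here called all_tfrs; adjust to your variable names):

# find the channel set shared by every subject (all_tfrs is hypothetical)
common = set(all_tfrs[0].ch_names)
for tfr in all_tfrs[1:]:
    common &= set(tfr.ch_names)
print(f'{len(common)} channels are present in all subjects')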

Start by changing your except clause so that it prints at least the number of good channels for each subject so you know how bad the problem is:

print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i].shape[0]} vs {X_OGT_low[i].shape[0]}')

Ultimately you’ll probably need to redo your preprocessing. This new-ish function will probably help: mne.preprocessing.equalize_bads.
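I haven’t run it on your data, so treat this as a rough sketch of where it could slot in (epoch_fnames is a placeholder for your list of per-subject epoch files):

import mne

# make bad-channel handling consistent across all subjects *before* the
# TFR step: equalize_bads interpolates or marks bads so that the
# instances end up with matching channel sets
insts = [mne.read_epochs(f) for f in epoch_fnames]
insts = mne.preprocessing.equalize_bads(insts)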

Hi, thanks a lot for your reply,

As you’ve suggested, the output for:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i].shape[0]} vs {X_OGT_low[i].shape[0]}')

was:

runcell(8, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
subj index:  0   n good chs: 1 vs 1
subj index:  1   n good chs: 1 vs 1
subj index:  5   n good chs: 1 vs 1
subj index:  8   n good chs: 1 vs 1
subj index:  9   n good chs: 1 vs 1
subj index: 10   n good chs: 1 vs 1
subj index: 17   n good chs: 1 vs 1
subj index: 22   n good chs: 1 vs 1
subj index: 23   n good chs: 1 vs 1
subj index: 26   n good chs: 1 vs 1
subj index: 28   n good chs: 1 vs 1

I guess the expression you wrote, ‘X_OGT_ogt[i].shape[0]’, was meant to refer to the 0th dimension of my ‘X_OGT_ogt’ array, which should be the number of channels. However, my ‘X_OGT_ogt’ array has shape (33, 1).

Maybe that’s because of how I saved the hdf5 files into it? The way I created the ‘X_OGT_ogt’ array was by loading my tfr_morlet files (-tfr.h5) iteratively, reading them with ‘mne.time_frequency.read_tfrs’ and then appending the result to a list called ‘TFR_OGT_ogt’:

fname_OGT_ogt = '/home/.../tfr_OGT_p%s_1-tfr.h5' % subj
TFR_OGT_ogt_data = mne.time_frequency.read_tfrs(fname_OGT_ogt)
TFR_OGT_ogt.append(TFR_OGT_ogt_data)

Then I created the array ‘X_OGT_ogt’ from the contents of that list, and it has shape (33, 1):

X_OGT_ogt = np.stack(TFR_OGT_ogt, axis=0)

# check the dimensions
print(X_OGT_ogt.shape)

# save the array
np.save('/home/.../tfr_OGT_ogt', X_OGT_ogt)

Should I have created this array ‘X_OGT_ogt’ differently?

Best,

Bruno

Not necessarily; maybe I just made a wrong guess about how your data is organized. The goal here is to get it to print how many channels each subject has (so you can make an informed decision about how to proceed), so adjust the index passed to .shape to make that happen.
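For example, something like this should reveal the nesting; my guess, since read_tfrs returns a list, is that each element of your array is a 1-element wrapper around the actual AverageTFR, which would explain the (33, 1) shape:

# inspect what one element of the stacked array actually is
print(type(X_OGT_ogt[0]))           # the 1-element wrapper
print(type(X_OGT_ogt[0][0]))        # the AverageTFR inside it
print(X_OGT_ogt[0][0].data.shape)   # (n_channels, n_freqs, n_times)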


I kept working on it and did the following:

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i][0].data.shape[0]} vs {X_OGT_low[i][0].data.shape[0]}')

And I got this:

runcell(8, '/home/.../4._Spatial_Temporal_Cluster_Perm.py')
subj index:  0   n good chs: 62 vs 64
subj index:  1   n good chs: 63 vs 64
subj index:  5   n good chs: 64 vs 62
subj index:  8   n good chs: 61 vs 64
subj index:  9   n good chs: 62 vs 64
subj index: 10   n good chs: 64 vs 62
subj index: 17   n good chs: 64 vs 63
subj index: 22   n good chs: 61 vs 64
subj index: 23   n good chs: 64 vs 63
subj index: 26   n good chs: 62 vs 64
subj index: 28   n good chs: 64 vs 62

Well, that’s good news: the worst subject has only three bad channels. The next step might be to get your loop to print which channels are missing… but at this stage it’s probably easier to go back to the preprocessing stage, because there are better tools for handling this at that point.

If you can’t do that (i.e., you don’t have access to the raw data / preprocessing scripts), then I think you’re stuck with including only the set of channels that are present for all subjects.
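A rough sketch of that fallback (assuming tfrs_ogt and tfrs_low are flat lists holding one AverageTFR per subject, i.e. the [0] element of each read_tfrs result; I believe mne.channels.equalize_channels accepts TFR objects too):

import numpy as np
import mne

# restrict every subject to the shared channels, in a common order
equalized = mne.channels.equalize_channels(tfrs_ogt + tfrs_low)
tfrs_ogt, tfrs_low = equalized[:33], equalized[33:]

# contrast array, shape (n_subjects, n_channels, n_freqs, n_times)
X = np.stack([a.data - b.data for a, b in zip(tfrs_ogt, tfrs_low)])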


I’ve checked which channels are missing for each subject, and altogether eight were missing:

Chann_list = ['Fp1','Fp2','F7','F3','Fz','F4','F8','FC5','FC1','FC2','FC6','T7','C3','Cz','C4','T8','TP9','CP5','CP1','CP2','CP6','TP10','P7','P3','Pz','P4','P8','PO9','O1','Oz','O2','PO10','AF7','AF3','AF4','AF8','F5','F1','F2','F6','FT9','FT7','FC3','FC4','FT8','FT10','C5','C1','C2','C6','TP7','CP3','CPz','CP4','TP8','P5','P1','P2','P6','PO7','PO3','POz','PO4','PO8']

X = []

for i in range(33):
    try:
        X.append(X_OGT_ogt[i] - X_OGT_low[i])
    except ValueError:
        print(f'subj index: {i:>2}   n good chs: {X_OGT_ogt[i][0].data.shape[0]} vs {X_OGT_low[i][0].data.shape[0]}')
        # Check missing channels:
        print('OGT_ogt:')
        print(set(Chann_list)-set(X_OGT_ogt[i][0].ch_names))
        print('OGT_low:')
        print(set(Chann_list)-set(X_OGT_low[i][0].ch_names))

Output:

subj index:  0   n good chs: 62 vs 64
OGT_ogt:
{'FC1', 'FC2'}
OGT_low:
set()
subj index:  1   n good chs: 63 vs 64
OGT_ogt:
{'F8'}
OGT_low:
set()
subj index:  5   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'F8', 'F7'}
subj index:  8   n good chs: 61 vs 64
OGT_ogt:
{'FC1', 'FC2', 'PO9'}
OGT_low:
set()
subj index:  9   n good chs: 62 vs 64
OGT_ogt:
{'FT7', 'AF7'}
OGT_low:
set()
subj index: 10   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'FC1', 'FC2'}
subj index: 17   n good chs: 64 vs 63
OGT_ogt:
set()
OGT_low:
{'FC2'}
subj index: 22   n good chs: 61 vs 64
OGT_ogt:
{'FC1', 'FC2', 'F7'}
OGT_low:
set()
subj index: 23   n good chs: 64 vs 63
OGT_ogt:
set()
OGT_low:
{'AF7'}
subj index: 26   n good chs: 62 vs 64
OGT_ogt:
{'FC1', 'FC2'}
OGT_low:
set()
subj index: 28   n good chs: 64 vs 62
OGT_ogt:
set()
OGT_low:
{'C2', 'FC2'}

So the bad channels are FC1, FC2, F8, F7, PO9, FT7, AF7 and C2. I guess I can’t discard that many channels, so I should redo my preprocessing, just as you suggested.

  1. The point is, I really don’t know what might have caused those channels to go missing. I was marking bad channels manually and then interpolating them with ‘raw.interpolate_bads’, so in theory I should still have all 64 channels. Is it possible that some MNE function deletes channels automatically?

  2. I have a list of the subjects with fewer than 64 channels, so I repeated the preprocessing for one subject that had 62 channels instead of 64. At the end I checked the number of channels and it was 62 again, even though I double-checked that I was not marking channels without interpolating them afterwards.

  3. The function ‘mne.preprocessing.equalize_bads’ seems like a good solution, but I still don’t know exactly where I should include it in my preprocessing pipeline, and I couldn’t find any usage example. Should I mark the bad channels manually and then use ‘mne.preprocessing.equalize_bads’ instead of ‘raw.interpolate_bads’? Would that be enough?

Best,

Bruno

interpolate_bads has a reset_bads=True option; maybe you set it to False? If so, the interpolated channel names would still be in the bads list, and later on they might get dropped (maybe when computing the TFR? You could search the console output for “drop” to see where it happens).
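In other words, the difference is roughly this (raw stands for your preprocessed Raw object):

# reset_bads=True (the default) clears interpolated channels from
# info['bads'], so nothing downstream drops them:
raw.interpolate_bads(reset_bads=True)

# reset_bads=False leaves them listed as bad after interpolation, and a
# later step that excludes bads may silently drop them:
raw.interpolate_bads(reset_bads=False)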

At this point the question is veering into “debugging your preprocessing script”, which can be rather difficult to do through the forum. If you can’t find by yourself which step(s) are causing the dropped channels, you’ll probably need to trim the preprocessing script down as much as possible and share one sample subject’s data where the channel dropping happens, so that we can help debug efficiently. Often the process of trimming down the script will reveal the answer by itself, so it’s a good approach to take in general.