System deadlock when saving TFR object after computing TFR from epochs

Hi,

I am trying to compute TFR objects per subject x session from my group epochs object. My system seems to end up in some sort of deadlock at the power.save() stage of the first subject x session (i.e. the first iteration of the loop). The terminal does not output any errors; it just gets stuck. My per session x subject epochs files are rather large (±1 GB), and my metadata contains quite a lot of columns (±57). I’ve tried running the code on a smaller subset, but this gets stuck too. I have also tried other drives on my PC, but those have the same issue. Task Manager does not show any drive being overused either. Has anyone encountered this problem before, or does anyone know what could be causing it?

mne version: 1.12.1

python version: 3.14.4

operating system: Windows 11

Thank you very much!

Julie

PS. If I comment out the power.save() line the script works without an issue, so it really is the saving part that causes the system deadlock.

import os

import mne
import numpy as np

# proc_dir is assumed to be defined earlier in the script (folder with the processed data)
epochs = mne.read_epochs(os.path.join(proc_dir, 'group-epo.fif.gz')).load_data()
df = epochs.metadata

# TFR settings (identical for every subject x session):
freqs = np.linspace(2, 40, 20)
n_cycles = freqs / 2.0  # different number of cycles per frequency

# TFR analysis:
for (subj, ses), _ in df.groupby(['subject_id', 'session_id']):

    print(f"Running TFR for subj={subj}, ses={ses}")

    # select the epochs of this subject x session:
    ind = (epochs.metadata['subject_id'] == subj) & (epochs.metadata['session_id'] == ses)
    subset = epochs[ind].copy()

    # tfr (single-trial power, time axis decimated by 5):
    power = subset.compute_tfr(
        method="morlet",
        freqs=freqs,
        n_cycles=n_cycles,
        average=False,
        return_itc=False,
        decim=5,
        n_jobs=-1,
    )

    # save power (this is the line where the script hangs):
    power.save(os.path.join(proc_dir, f'power_{subj}_{ses}.hdf5'), overwrite=True)

    # free memory:
    del subset, power

Just to be sure, if you remove or comment out the power.save(...) line, does everything work (i.e., does the script terminate without any errors, or does it get stuck too)?

Thanks for checking! The script runs without any errors and does not get stuck if I comment out the power.save line.

I notice you say you’re using Python 3.8, but Python 3.10 is the minimum supported version since MNE 1.9.

Not sure if that could be contributing to the issue, but perhaps worth upgrading your Python version in any case.

I just double checked and was mistaken, the Python version currently in use is 3.14.4. I will edit this in the original post, thanks for commenting!

Thanks for clarifying that.

When you say it is also hanging for a smaller subset, how small was this actually? If you save this really minimal example, does it still hang?

import numpy as np
from mne import create_info
from mne.time_frequency import EpochsTFRArray

rng = np.random.default_rng(0)

n_epochs, n_chans, n_freqs, n_times = 2, 3, 5, 10
sfreq = 100.0
data = rng.standard_normal((n_epochs, n_chans, n_freqs, n_times))
info = create_info(n_chans, sfreq)

tfr = EpochsTFRArray(info=info, data=data, times=np.arange(n_times), freqs=np.arange(n_freqs))
tfr.save("test-tfr.hdf5", overwrite=True)

Please can you also share the output of mne.sys_info().

Is it possible that you could share an anonymised form of one of the problematic TFR objects on something like Dropbox?

Hi Thomas,

I tried the minimal toy example: this works! Does not hang + saves.

See here the size of a subset of epochs on which I call subset.compute_tfr() and its resulting power/tfr object:

In [5]: subset
Out[5]:
<EpochsFIF | 3334 events (all good), -1.352 – 1.648 s (baseline -0.25 – 0 s), ~695.4 MiB, data loaded, with metadata,
'sound_a': 1696
'sound_b': 1638>

In [1]: power
Out[1]: <Power Estimates from Epochs, morlet method | 3334 epochs × 64 channels × 20 freqs × 77 times, 2.0 - 40.0 Hz, -1.35 - 1.62 s, 2.45 GiB>

And the size of the ‘smaller subset’ I tried, it really is small:

In [6]: epochs_small = subset[1:10]

In [7]: epochs_small
Out[7]:
<EpochsFIF | 9 events (all good), -1.352 – 1.648 s (baseline -0.25 – 0 s), ~2.0 MiB, data loaded, with metadata,
'sound_b': 9>

In [11]: power_small
Out[11]: <Power Estimates from Epochs, morlet method | 9 epochs × 64 channels × 20 freqs × 77 times, 2.0 - 40.0 Hz, -1.35 - 1.62 s, 6.8 MiB>

But this small subset also gets stuck at the power.save() line.

What exactly would you like me to share in anonymized form: the epochs subset or the resulting TFR object? As for the latter, I wouldn’t know how to share it, since I cannot save it…

See below the output of mne.sys_info()

In [4]: mne.sys_info()
Platform             Windows-11-10.0.26200-SP0
Python               3.14.4 | packaged by conda-forge | (main, Apr  8 2026, 02:08:03) [MSC v.1944 64 bit (AMD64)]
Executable           C:\Users\jhooman\.conda\envs\mne\python.exe
CPU                  AMD Ryzen Threadripper PRO 7975WX 32-Cores (64 cores)
Memory               255.2 GiB

C:\Users\jhooman\.conda\envs\mne\Lib\site-packages\threadpoolctl.py:1226: RuntimeWarning:
Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
the same time. Both libraries are known to be incompatible and this
can cause random crashes or deadlocks on Linux when loaded in the
same Python program.
Using threadpoolctl may cause crashes or deadlocks. For more
information and possible workarounds, please see
warnings.warn(msg, RuntimeWarning)
Core
mne               1.12.1 (latest release)
numpy             2.4.3 (MKL 2025.3-Product with 32 threads)
scipy             1.17.1
matplotlib        3.10.9 (backend=qtagg)

Numerical (optional)
sklearn           1.8.0
numba             0.65.1
nibabel           5.4.0
nilearn           0.13.1
dipy              1.12.1
openmeeg          2.5.16
pandas            3.0.2
h5io              0.2.5
h5py              3.16.0
unavailable       cupy

Visualization (optional)
pyvista           0.48.0 (OpenGL 4.5 (Core Profile) Mesa 26.0.3 via llvmpipe (LLVM 22.1.1, 256 bits))
pyvistaqt         0.11.4
vtk               9.6.1
qtpy              2.4.3 (PySide6=6.11.0)
pyqtgraph         0.14.0
mne-qt-browser    0.7.4
ipywidgets        8.1.8
trame_client      3.12.1
trame_server      3.10.0
trame_vtk         2.11.8
trame_vuetify     3.2.2
unavailable       ipympl

Ecosystem (optional)
mne-bids          0.18.0
eeglabio          0.1.3
edfio             0.4.13
curryreader       0.1.2
mffpy             0.10.0
pybv              0.7.6
antio             0.7.0
defusedxml        0.7.1
unavailable       mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline, neo, pymef

Thank you so much for helping!

Thanks for sharing the system info. h5io and h5py are both up to date.
If the minimal example I sent works, then maybe it is just a resource issue, even if you do have a pretty beefy system. Though I’m not entirely sure why this wouldn’t appear in task manager.
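
When a call hangs silently like this, one way to see where it is sitting is Python's standard-library faulthandler module, which can dump the tracebacks of all threads once a timeout has passed. The following is only a minimal sketch: run_with_watchdog, slow_save and the log file name are hypothetical names made up for illustration, and slow_save merely stands in for the real power.save(...) call.

```python
import faulthandler
import os
import tempfile
import time


def run_with_watchdog(func, timeout_s, log_path):
    """Run func(); if it takes longer than timeout_s seconds, write the
    tracebacks of all threads to log_path. The call is not interrupted."""
    with open(log_path, "w") as log_file:
        faulthandler.dump_traceback_later(timeout_s, file=log_file)
        try:
            return func()
        finally:
            # Disarm the watchdog once the call has returned or raised.
            faulthandler.cancel_dump_traceback_later()


def slow_save():
    """Hypothetical stand-in for a slow or hanging call such as power.save(...)."""
    time.sleep(0.5)
    return "done"


log_path = os.path.join(tempfile.mkdtemp(), "hang_traceback.log")
result = run_with_watchdog(slow_save, timeout_s=0.1, log_path=log_path)

with open(log_path) as f:
    traceback_text = f.read()
print(traceback_text)  # the call stack at the moment the timeout fired
```

If the wrapped call never returns, the traceback is still written to the log file when the timeout fires, so the file can be opened from another window while the script is stuck.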

Hah, good point! In that case, could you share one of the epoch subsets for which saving the resulting TFR hangs?

Sure I can! See here one epoch subset for which the saving hangs:

Thanks for sharing that data.

I also see the saving hang, even for the smaller subset of only 9 epochs. Debugging shows that the saving gets ‘stuck’ on the drop log, which it turns out has >7 million entries :grimacing:
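
A quick way to check the size of a drop log, sketched here under assumptions: in MNE the drop log is a tuple with one inner tuple per original event, empty if the event was kept and otherwise listing the reasons it was dropped. The helper name summarize_drop_log and the toy data are hypothetical; with real data you would pass epochs.drop_log instead.

```python
from collections import Counter


def summarize_drop_log(drop_log):
    """Count entries in an MNE-style drop log (a tuple of tuples of str).

    An empty inner tuple means the event was kept; a non-empty one lists
    the reasons it was dropped.
    """
    n_total = len(drop_log)
    n_kept = sum(1 for entry in drop_log if len(entry) == 0)
    reasons = Counter(reason for entry in drop_log for reason in entry)
    return {
        "n_total": n_total,
        "n_kept": n_kept,
        "n_dropped": n_total - n_kept,
        "reasons": reasons,
    }


# Synthetic drop log for illustration only; with real data use epochs.drop_log.
toy_drop_log = ((), ("IGNORED",), ("IGNORED",), (), ("EEG 001",))
summary = summarize_drop_log(toy_drop_log)
print(summary)
```

An n_total far larger than the number of events that could plausibly come from one recording is a hint that the drop log has been accumulated across many files.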

Setting the drop log to None allows the TFR data to be saved. Obviously it’s not ideal to have to remove the drop log if there’s relevant information there, but I can’t imagine there being millions of genuine entries.
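
As a rough sketch of that workaround, assuming drop_log is a plain attribute on the TFR object that can be reassigned (which is what setting it to None implies): save_without_drop_log is a hypothetical helper, and FakeTFR is a stand-in class used only so the snippet runs without any data. With a real object you would pass power and the target file name.

```python
def save_without_drop_log(tfr, fname):
    """Temporarily set tfr.drop_log to None, save, then restore it in memory.

    The file written this way does not contain the drop log.
    """
    original = tfr.drop_log
    tfr.drop_log = None
    try:
        tfr.save(fname, overwrite=True)
    finally:
        tfr.drop_log = original


class FakeTFR:
    """Hypothetical stand-in for an EpochsTFR, only to exercise the helper."""

    def __init__(self, drop_log):
        self.drop_log = drop_log
        self.saved = []

    def save(self, fname, overwrite=False):
        # Record what the drop log looked like at save time.
        self.saved.append((fname, self.drop_log))


fake = FakeTFR(drop_log=((), ("IGNORED",)))
save_without_drop_log(fake, "power_demo.hdf5")
print(fake.saved)     # the drop log was None while saving
print(fake.drop_log)  # the in-memory drop log is restored afterwards
```

Restoring the attribute afterwards is a design choice so the in-memory object stays unchanged; if the drop log is not needed at all, simply leaving it as None is equivalent for the saved file.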

I would say there is probably a bug in your pipeline causing drop log entries to be duplicated. However, rereading your original code snippet, I see you are loading one master epochs file and then extracting the data for each subject/session.

If this epochs object came from a huge number of subjects/sessions, you could end up with one giant drop log, which doesn’t get broken up when selecting specific epochs (that’s intended behaviour).

If that’s the case, it might be worth considering re-designing your pipeline such that epochs for individual subjects/sessions get saved separately. If not, are you able to track down when the number of drop log entries explodes in your pipeline?
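
To track down where the number of entries grows, one option is to record the drop log length after each step of the pipeline and look at the differences. This is only a sketch: drop_log_growth is a hypothetical helper, and the step labels and toy drop logs below are invented for illustration. In a real pipeline you would collect (label, epochs.drop_log) pairs at the points you want to compare.

```python
def drop_log_growth(checkpoints):
    """Given (label, drop_log) pairs in pipeline order, return a list of
    (label, n_entries, change_from_previous_step) tuples."""
    rows = []
    previous = None
    for label, drop_log in checkpoints:
        n_entries = len(drop_log)
        delta = 0 if previous is None else n_entries - previous
        rows.append((label, n_entries, delta))
        previous = n_entries
    return rows


# Toy checkpoints; the labels and sizes are made up for this example.
checkpoints = [
    ("first subject loaded", ((),) * 100),
    ("second subject added", ((),) * 250),
    ("group file written", ((),) * 250),
]
rows = drop_log_growth(checkpoints)
for label, n_entries, delta in rows:
    print(f"{label}: {n_entries} entries ({delta:+d})")
```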

Thank you so much!! It runs now! I will dive into my code to see where this strange accumulation came from. :slight_smile: