concatenate_raws produces EDF files with no output

rzhao · February 21, 2024, 1:12am

MNE version: e.g. 1.6.1
operating system: windows 10
Hi! Sorry, I’m very new here. I want to write a script that concatenates a bunch of 5-minute continuous EDF files with EEG data, covering 11 hours at a time (i.e. 121 total files).

My code essentially looks like the following…

trimmed = [trim_and_decimate(to_edf(edf), 200) for edf in interval.files]
res = mne.concatenate_raws(trimmed)
mne.export.export_raw(<name>, mne.set_bipolar_reference(<res>, ..., ..., ...), 'edf')

trim_and_decimate drops unwanted channels and downsamples the frequency
to_edf simply returns the result of mne.io.read_raw_edf(<edf_path>)

I’ve been able to concatenate up to 110 files successfully (~120,000 KB file); beyond that, I just get no output at all (even though, if I print the info of the Raw object prior to exporting, there do seem to be values).

Any idea of where things might be going wrong? I am unsure why concatenation is working just fine until I hit this threshold. Any help would be appreciated!

cbrnr · February 21, 2024, 8:31am

I tried this with the following example (which concatenates 1000 Raw objects, each 5 minutes long), and it works:

from numpy.random import default_rng

from mne import concatenate_raws, create_info
from mne.io import RawArray


def create_toy_data(n_channels=16, duration=300, sfreq=250, seed=None):
    """Create toy data."""
    rng = default_rng(seed)
    data = rng.standard_normal(size=(n_channels, duration * sfreq)) * 5e-6
    info = create_info(n_channels, sfreq, "eeg")
    return RawArray(data, info)


raws = [create_toy_data() for _ in range(1000)]
raw = concatenate_raws(raws)
raw.plot(block=True)

Can you try if this works for you?

Also, the screenshot you are showing is not the MNE browser, so it might be a problem with this app.

rzhao · March 18, 2024, 10:33am

Hi, thank you for taking the time to try this! I tried the plot function and got something that looks like this. I’ve tried to use +/- to scale the visual, but it always appears as a massive block. Do you have any advice for making the signals more viewer-friendly? Is this what your output looked like?

cbrnr · March 18, 2024, 11:03am

Your signals are probably scaled incorrectly, please see How to scale properly my plot? - #7 by mscheltienne.

rzhao · March 21, 2024, 6:44am

Thank you so much! It may very well be the third-party plotting program I was using. The built-in MNE plotting seems great. May I ask what the difference is between raw.plot() and mne.viz.plot_raw()? Also, is there a way to save the interactive plot? Is it plt.savefig()?

cbrnr · March 21, 2024, 6:56am

There’s no difference, raw.plot() internally calls mne.viz.plot_raw() (so in general you should pretty much always use raw.plot()).

Yes, raw.plot() returns the Figure object, which you can then use for further modifications or saving.

rzhao · March 21, 2024, 7:06am

Sorry I should have clarified. Is there any way to save down the .plt interactive plot rather than just a png? I looked into savefig, and it appears the available formats are only #savefig.format: png # {png, ps, pdf, svg}?

cbrnr · March 21, 2024, 7:11am

Why would you want to do that? You need to open the figure in Python anyway, so you might as well just use your script to reproduce the figure.

If you really do need to save the figure object to disk, you can pickle.dump() the Figure object.

rzhao · March 26, 2024, 11:18pm

Hi Mr. Brunner again! We’ve been able to successfully produce plots upon concatenating our files together, as seen below (thanks for all the help!)

However, upon exporting the raw file with the following function (our team wants to export these concatenated files for viewing in an external viewer), the plot becomes what I showed in my original post with all channels empty. We double-checked in MATLAB using [a,b] = edfread('../PR06_night_1.1_scalp_2023-12-06_20.59--2023-12-07_02.59.edf'), and the data is indeed all NaNs except for one channel:

I’ve been looking everywhere to try to determine what might be causing this, and I came across the following note on the Raw.export page: “Export to external format may not preserve all the information from the instance.” Is it possible that my file is too large to safely export, and do you have any ideas for diagnosing this?

Again I’m quite stuck because using the following code,

the exported file is broken, but then next line when we plot, we get valid data. Any help would be appreciated!

cbrnr · March 27, 2024, 7:31am

Indeed it looks like something could be wrong with our EDF export. Are you using the latest version of the edfio package (it should be 0.4.0)?

When you say your file might be too large, what dimensions are we talking about? What is the size of the data set you are exporting (number of channels and number of samples)? Also, can you show the output of res.info before the export line?

Another thing you could try is to export just a single file to rule out that the problem is with the concatenation step.

rzhao · March 27, 2024, 8:01pm

Hi, here are my attempts at your suggestions:

I’m not sure if the previous code was run with the correct version, but I’ve installed it now.

I have tried to export just a single 5 minute file rather than the concatenated file. The exported file works as expected. Let me know if I misinterpreted what you meant.

The size of my export is currently: 20 channels, ~6 hours * 3600 s * 200 Hz sampling frequency ≈ 4,320,000 samples. The output file size is 168,887 KB.
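As a rough sanity check of these numbers, the sketch below recomputes the sample count and estimates the file size. It assumes plain EDF, which stores each sample as a 16-bit integer and uses a 256-byte fixed header plus 256 header bytes per signal; annotation channels and record padding are not modeled.

```python
# Back-of-the-envelope size check for the export described above.
n_channels = 20
sfreq = 200            # Hz, after resampling
duration_s = 6 * 3600  # ~6 hours in seconds

n_samples = duration_s * sfreq  # samples per channel

# Assumption: plain EDF, 2 bytes per sample, 256-byte fixed header
# plus 256 header bytes per signal.
bytes_per_sample = 2
header_bytes = 256 + 256 * n_channels
total_bytes = header_bytes + n_channels * n_samples * bytes_per_sample

print(f"samples per channel: {n_samples:,}")
print(f"estimated size: {total_bytes / 1024:,.0f} KiB")
```

Under these assumptions the estimate comes out at roughly 168,755 KiB, within about 0.1% of the reported 168,887 KB, so the file size itself looks plausible for this amount of data.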

And finally, after installing edfio 0.4.0 and rerunning my script on 6 hours, I get NaNs. Here is the res.info:

<Info | 10 non-empty values
 bads: []
 ch_names: Fp1_F7, F7_T7, T7_P7, P7_O1, Fp1_F3, F3_C3, C3_P3, P3_O1, Fz_Cz, ...
 chs: 20 EEG
 custom_ref_applied: True
 dig: 0 items
 highpass: 0.0 Hz
 lowpass: 100.0 Hz
 meas_date: 2023-12-06 20:59:58 UTC
 nchan: 20
 projs: []
 sfreq: 200.0 Hz
 subject_info: 5 items (dict)
>

As before, the plot produces valid data while the exported file produces NaN’s in Matlab.

As a recap of my algorithm:

Goal: Take continuous 5-minute EDF chunks and concatenate them into a single 6-hour EDF file.

1. Produce a list of EDF file names.
2. Call read_raw_edf on these, drop unneeded channels, and resample to 200 Hz.
3. Call concatenate_raws on this list.
4. Apply a bipolar reference.
5. Call mne.export.export_raw on the result.

Thanks for the continued help, and please let me know if you need any other information.

cbrnr · March 28, 2024, 7:56am

Thanks @rzhao! So exporting a single file works, that’s good to know. Do I remember correctly that you mentioned elsewhere (not sure where) that the exported file is fine up to a certain number of concatenated datasets? Just to be sure, can you try exporting two concatenated datasets? And to be clear, please include all subsequent steps that you use in your pipeline (drop unneeded channels, decimate to 200Hz, create bipolar derivations). The culprit could lie in any of these processing steps, so I would like to rule out these possibilities (which would leave only the exporting function).

The resulting file is not particularly large, I’ve seen bigger EDF files that work without any problems.

Once we know that the exported file looks fine for two (or possibly even more) concatenated datasets (but including all processing steps), I would next try to use edfio directly to export your data (edfio — edfio).

rzhao · March 28, 2024, 9:04pm

Yes, for a prior collection of patient data, I was able to concatenate and successfully export up to 110 5 minute files.

Now, I am working with another patient’s data and this 110 number does not seem to hold up.

Using the same code that previously failed (from my previous comment) and changing only the number of files per concatenation, I am able to successfully concatenate two EDF files, three, and also twelve (1 hour).
I haven’t done extensive testing to find an upper bound on when concatenation starts to fail, but as a recap of what’s going on: Concatenating and exporting 12 files succeeds. Using this code and changing nothing but the number of files to concatenate, exports start to fail around 24 files (2 hours) and beyond.
Yet all of these concatenated Raw objects, regardless of whether the export fails or succeeds, produce valid plots using raw.plot().
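If the failure really does depend on the number of files, the threshold between 12 and 24 can be narrowed down with a simple bisection instead of trying every count. The sketch below is only an illustration: `export_ok` is a hypothetical callback (not part of MNE) that would concatenate the first n files, export them, read the result back, and report whether the data is free of NaNs. It also assumes failures are monotonic in n, which would not hold if one specific file is the culprit.

```python
from typing import Callable


def first_failing_count(export_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest n in (lo, hi] for which export_ok(n) is False.

    Assumes export_ok(lo) is True, export_ok(hi) is False, and that failures
    are monotonic in n (every count above the threshold also fails).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if export_ok(mid):
            lo = mid  # still fine with `mid` files, threshold is higher
        else:
            hi = mid  # already broken with `mid` files, threshold is at or below
    return hi


# Stand-in predicate for illustration only: pretend exports break at 17 files.
print(first_failing_count(lambda n: n < 17, lo=12, hi=24))
```

With 12 known to work and 24 known to fail, this needs at most four export attempts.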

Here’s my best shot at summarizing the processing steps, since I wouldn’t want to force you to read over all the code!

trimmed = [scalp_trim_and_decimate(to_edf(edf), 200) for edf in interval.files]

interval.files is a list of file names
to_edf calls and returns mne.io.read_raw_edf(edf_path) alongside some path i/o
scalp_trim_and_decimate calls rename_channels(rename_dict) and drop_channels(to_drop) on a list of channels, before finally calling resample to 200.

Finally, I call mne.concatenate_raws(trimmed) and then set_bipolar_reference(raw_edf, anodes, cathodes, names) on the result. I drop some additional channels and call mne.export.export_raw(out_name, res, 'edf', overwrite=True). Finally, I plot this same Raw object.

Oh, and as for using edfio to export directly, I assume I’d want to use the write function, but how do I convert my mne.Raw into an edfio.Edf? Or would I have to rewrite all the EDF functions to use the other library?

Thank you!

cbrnr · March 29, 2024, 9:25am

OK, two things come to mind that I’d like to check before we can be pretty sure it’s an issue related to EDF export.

1. Can you rule out that it is one particular file which causes the problem? It is conceivable that it is not the number, but a certain (defective?) file which causes all of the data to become NaN.
2. You can export your data directly with edfio based on the example at the bottom of their website:
```
import numpy as np

from edfio import Edf, EdfSignal

edf = Edf(
    [
        EdfSignal(np.random.randn(30 * 256), sampling_frequency=256, label="EEG Fpz"),
        EdfSignal(np.random.randn(30), sampling_frequency=1, label="Body Temp"),
    ]
)
edf.write("example.edf")
```
You need to create a list of EdfSignal objects corresponding to individual channels. You can get the data as a NumPy array using res.get_data() (which also accepts a picks argument to extract only specific channels). If you cannot figure out how to do it, I’m happy to help if you can share your data with me.
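For the first point, ruling out a single defective file, one option is to scan the data array of every segment before concatenating. The following is a minimal NumPy sketch, not an MNE API: `find_suspect_segments` and its `max_abs` threshold are hypothetical names, and the toy arrays stand in for something like `[raw.get_data() for raw in trimmed]`.

```python
import numpy as np


def find_suspect_segments(segments, max_abs=None):
    """Return indices of segments containing NaN/inf or |values| above max_abs.

    segments: iterable of 2D arrays shaped (n_channels, n_samples).
    max_abs: optional amplitude limit in the same unit as the data.
    """
    suspects = []
    for idx, data in enumerate(segments):
        data = np.asarray(data, dtype=float)
        has_nonfinite = not np.isfinite(data).all()
        too_large = max_abs is not None and np.nanmax(np.abs(data)) > max_abs
        if has_nonfinite or too_large:
            suspects.append(idx)
    return suspects


# Toy data standing in for the per-file arrays (values in volts).
rng = np.random.default_rng(0)
segments = [rng.standard_normal((4, 1000)) * 5e-6 for _ in range(5)]
segments[3][2, 500] = np.nan  # simulate one defective file
print(find_suspect_segments(segments))
```

Any index reported here points at a file worth exporting on its own to see whether it reproduces the problem.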

cbrnr · March 29, 2024, 3:35pm

I just realized that MNE 1.6.1 is still using edflib-python, which we’ve replaced with edfio in our current development branch. It will be part of the 1.7 release, which is due mid April, but if you don’t want to wait, feel free to install the dev version and let us know if this fixed the issue!

rzhao · April 1, 2024, 11:28am

thank you! Trying to slowly comb through the suggestions. This isn’t particularly related to any of them, but have you ever seen this particular issue? It seems to have to do with mneio but I’m not sure exactly how to fix it:

cbrnr · April 2, 2024, 5:26am

This error points to your physical minimum being extremely large in magnitude (it looks like it is -3982730000 µV), which cannot be represented in the corresponding header field. Usually, this happens if you have no highpass filter applied to your data. I would strongly recommend applying a highpass filter (e.g. 0.1 Hz, 0.5 Hz, or even 1 Hz), unless you have a reason to keep the low frequencies. High-passing your data might even fix the original issue in MNE 1.6.1.
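To make the header limitation concrete: the EDF header reserves 8 ASCII characters each for the physical minimum and physical maximum of a signal. The sketch below checks whether a value, rounded to an integer, fits into such a field. `fits_edf_header_field` is a hypothetical helper for illustration; it does not reproduce how any particular EDF writer formats or rounds these values.

```python
def fits_edf_header_field(value_uv: float, width: int = 8) -> bool:
    """Check whether a value, rounded to an integer, fits in `width` characters.

    Assumption: the limit is the 8-character ASCII field that the EDF header
    reserves for each of physical minimum and physical maximum. Real writers
    may also keep decimals, so this is only a lower bound on the space needed.
    """
    return len(f"{value_uv:.0f}") <= width


for value in (-6000.0, -3982730000.0):
    print(value, fits_edf_header_field(value))
```

A value like -3982730000 needs 11 characters including the sign, so it cannot be written into the field, whereas amplitudes in a typical EEG range fit easily.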

rzhao · April 18, 2024, 1:42am

Hi! I tried a highpass filter (raw.filter(l_freq={suggested values}, h_freq=None)), alongside other highpass/lowpass combinations, and I always receive the same error, sometimes with slightly different values. A different idea we had was to zero-mean each channel to negate outlier values. We tried to apply common average referencing to handle this, but got the same error. Is there a way to zero-mean channel-wise? I couldn’t find a way to mutate the Raw object itself by channel, or a corresponding method. Thanks as always!

cbrnr · April 18, 2024, 5:02am

Since you are concatenating multiple EEG segments, I recommend that you zero-mean each segment individually. You can use raw.apply_function() in combination with scipy.signal.detrend(). The latter performs linear detrending, i.e. it removes a straight line fit from each channel. If you only want to remove the overall mean, you can set type="constant" (but I’d stick with the default type="linear"). After detrending, I’d also apply a high-pass filter to get rid of remaining low frequency noise.

from scipy.signal import detrend

raw.apply_function(detrend, channel_wise=False)

Concatenating these preprocessed EEG segments should result in an overall mean of zero across the entire time course (plus or minus edge artifacts from high-pass filtering, so you might want to double-check what the actual combined signal looks like; bear in mind that raw.plot() removes the offset of the visible time segment by default).
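The effect of removing the mean per segment can be illustrated without MNE. The toy sketch below uses NumPy only and subtracts the per-channel mean of each segment, which is what scipy.signal.detrend does with type="constant"; the default type="linear" would additionally remove a slope. The segment length, offsets, and amplitudes are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
sfreq = 200
n_channels, n_samples = 4, 5 * sfreq  # short toy segments instead of 5 minutes

# Each toy segment gets its own large DC offset, mimicking unfiltered recordings.
offsets = [1e-3, -2e-3, 5e-4]
segments = [
    rng.standard_normal((n_channels, n_samples)) * 5e-6 + offset
    for offset in offsets
]

# Remove the per-channel mean of every segment before concatenating.
# This mirrors scipy.signal.detrend(..., type="constant") along the time axis.
demeaned = [seg - seg.mean(axis=1, keepdims=True) for seg in segments]

before = np.concatenate(segments, axis=1)
after = np.concatenate(demeaned, axis=1)

print("range before:", np.ptp(before, axis=1).max())
print("range after: ", np.ptp(after, axis=1).max())
print("mean after:  ", np.abs(after.mean(axis=1)).max())
```

Without the per-segment step, the offsets between segments dominate the range of the concatenated signal, and that range is what determines the physical minimum and maximum on export.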

rzhao · May 13, 2024, 8:22am

Hi @cbrnr, thanks for your great suggestion. I applied a bandpass filter alongside the detrend function and the MNE development version, and unfortunately I face the same error. A workaround that did get me a result (i.e. a file that’s not all NaNs) was bounding the physical_range when calling mne.export.export_raw (I arbitrarily chose [-100, 100], and will later experiment with larger values).
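One possible reason the physical range matters so much is resolution: plain EDF maps the physical range linearly onto 16-bit integers, so a very wide range leaves only coarse steps for normal EEG amplitudes. The sketch below illustrates that mapping in NumPy. It is a simplified model, and clipping out-of-range values is an assumption of this sketch, not a statement about what MNE or edfio do on export.

```python
import numpy as np

DIGITAL_MIN, DIGITAL_MAX = -32768, 32767  # 16-bit range used by plain EDF


def edf_resolution(physical_min, physical_max):
    """Physical units represented by one digital step for a given range."""
    return (physical_max - physical_min) / (DIGITAL_MAX - DIGITAL_MIN)


def quantize(data, physical_min, physical_max):
    """Clip to the physical range, map to 16-bit integers, and map back."""
    step = edf_resolution(physical_min, physical_max)
    clipped = np.clip(data, physical_min, physical_max)
    digital = np.round((clipped - physical_min) / step) + DIGITAL_MIN
    return (digital - DIGITAL_MIN) * step + physical_min


signal_uv = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])  # typical EEG amplitudes in µV

print(edf_resolution(-100, 100))                # step size for a narrow range
print(edf_resolution(-3982730000, 3982730000))  # step size for a huge range
print(quantize(signal_uv, -100, 100))
print(quantize(signal_uv, -3982730000, 3982730000))
```

With a range of ±100 µV the round trip preserves the signal to within a fraction of a microvolt, while a range spanning billions of microvolts cannot resolve it at all. In this model, a fixed range also clips anything outside it, so [-100, 100] would cut off larger deflections.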

So then I was curious and I wanted to locate the abnormally large value (in my case -69964600), so for every single file to be concatenated and the final concatenated file, I ran an extra for loop that computed the min and max voltage for each channel. However, none of the channels contained a value even remotely close to -69964600 and were at most ~-6000. The code I used follows. I was wondering if you have any idea for how I can attempt to identify where this mystery value that appears to be messing up my output is coming from? As it stands, my solution is to try to calculate the global min and max voltage values for all channels, and then set those as the physical_range bounds on export, but I really want to understand the error itself. Any help would be appreciated!

import numpy as np
from scipy.signal import detrend

edfs = []  # my list of preprocessed Raw objects that I later concatenate
for i, edf in enumerate(interval.files):  # a list of str's, e.g. ['ex-name-1.edf', 'ex-name-2.edf']
    # to_edf calls read_raw_edf, and then I resample and drop unwanted channels.
    trimmed = scalp_trim_and_decimate(to_edf(edf), 200)
    demeaned = trimmed.apply_function(detrend, channel_wise=True, type="constant")
    filtered = demeaned.filter(l_freq=0.5, h_freq=58)
    dropped = filtered.drop_channels(['L EMG-Ref', 'R EMG-Ref'])
    edfs.append(dropped)

    # Here's where I try to locate -69964600 to no success
    diagnostics = []
    data = dropped.get_data()
    for channel in data:
        # list.append() takes exactly one argument, so store the values as a tuple
        diagnostics.append((channel.max(), channel.min(), np.isin(np.array([-69964600]), channel)))
    write_txt(  # function that writes to a text file
        f'edf-{i} Data:\n',
        f'Info{str(dropped.info)}',
        f'Dim: {data.shape[0]} x {data.shape[1]}\n',
        f'Data: {str(data)}\n'
        f'Diagnostics: {diagnostics}')
# same loop is later run for the concatenated output file
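One thing that may be worth ruling out here is a unit mismatch. MNE keeps EEG data in volts, so get_data() returns volts by default, whereas the value in the export error was read as µV earlier in this thread. If that reading applies, -69964600 µV corresponds to roughly -70 V in the array, and an exact search for -69964600 would never match. The sketch below converts to µV first and reports where each channel's extremes sit; `locate_extremes` is a hypothetical helper and the spike in the toy data is artificial.

```python
import numpy as np


def locate_extremes(data_v, ch_names):
    """Report per-channel min/max in µV and the sample index of each extreme.

    data_v: array shaped (n_channels, n_samples), assumed to be in volts.
    Channels consisting only of NaN are not handled by this sketch.
    """
    data_uv = np.asarray(data_v, dtype=float) * 1e6  # volts -> microvolts
    report = []
    for name, channel in zip(ch_names, data_uv):
        report.append(
            {
                "channel": name,
                "min_uv": float(np.nanmin(channel)),
                "argmin": int(np.nanargmin(channel)),
                "max_uv": float(np.nanmax(channel)),
                "argmax": int(np.nanargmax(channel)),
                "n_nonfinite": int((~np.isfinite(channel)).sum()),
            }
        )
    return report


# Toy data with one artificial spike of about -70 V in the second channel.
rng = np.random.default_rng(1)
data = rng.standard_normal((2, 1000)) * 5e-6
data[1, 123] = -69.9646
for row in locate_extremes(data, ["Fp1_F7", "F7_T7"]):
    print(row)
```

The sample index of an extreme, divided by the sampling frequency, gives the time at which to look in raw.plot(), and it also shows which source file the sample came from.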
