concatenate_raws produces EDF files with no output

  • MNE version: 1.6.1
  • operating system: Windows 10
Hi! Sorry, I'm very new here. I want to write a script that concatenates a bunch of 5-minute continuous EDF files with EEG data, covering 11 hours at a time (i.e. 121 total files).

My code essentially looks like the following:

trimmed = [trim_and_decimate(to_edf(edf), 200) for edf in interval.files]
res = mne.concatenate_raws(trimmed)
mne.export.export_raw(<name>, mne.set_bipolar_reference(<res>, ..., ..., ...), 'edf')
  • trim_and_decimate drops unwanted channels and downsamples the frequency
  • to_edf simply returns the result of mne.io.read_raw_edf(<edf_path>)

I've been able to concatenate up to 110 files (~120,000 KB file); beyond that, I just get no output at all (even though, if I print the info of the Raw object prior to exporting, there do seem to be values).

Any idea of where things might be going wrong? I am unsure why concatenation is working just fine until I hit this threshold. Any help would be appreciated!

I tried this with the following example (which concatenates 1000 Raw objects, each 5 minutes long), and it works:

from numpy.random import default_rng

from mne import concatenate_raws, create_info
from mne.io import RawArray


def create_toy_data(n_channels=16, duration=300, sfreq=250, seed=None):
    """Create toy data."""
    rng = default_rng(seed)
    data = rng.standard_normal(size=(n_channels, duration * sfreq)) * 5e-6
    info = create_info(n_channels, sfreq, "eeg")
    return RawArray(data, info)


raws = [create_toy_data() for _ in range(1000)]
raw = concatenate_raws(raws)
raw.plot(block=True)

Can you try if this works for you?

Also, the screenshot you are showing is not the MNE browser, so it might be a problem with this app.

Hi, thank you for taking the time to try this! I tried the plot function and got something that looks like this. I've tried to use +/- to scale the view, but the signals always appear as one massive block. Do you have any advice for making the signals more viewer friendly? Is this what your output looked like?

Your signals are probably scaled incorrectly, please see How to scale properly my plot? - #7 by mscheltienne.


Thank you so much! It may very well be the third-party plotting program I was using. The built-in MNE plotting seems great. May I ask what the difference is between raw.plot() and mne.viz.plot_raw()? Also, is there a way to save the interactive plot? Is it plt.savefig()?

There's no difference, raw.plot() internally calls mne.viz.plot_raw() (so in general you should pretty much always use raw.plot()).

Yes, raw.plot() returns the Figure object, which you can then use for further modifications or saving.

Sorry I should have clarified. Is there any way to save down the .plt interactive plot rather than just a png? I looked into savefig, and it appears the available formats are only #savefig.format: png # {png, ps, pdf, svg}?

Why would you want to do that? You need to open the figure in Python anyway, so you might as well just use your script to reproduce the figure.

If you really do need to save the figure object to disk, you can pickle.dump() the Figure object.

Hi Mr. Brunner again! We've been able to successfully produce plots after concatenating our files together, as seen below (thanks for all the help!)

However, upon exporting the raw file with the following function (our team wants to export these concatenated files for viewing in an external viewer), the plot becomes what I showed in my original post, with all channels empty. We double-checked in Matlab using [a,b] = edfread('../PR06_night_1.1_scalp_2023-12-06_20.59--2023-12-07_02.59.edf'), and the data is indeed all NaNs except for one channel:

I've been looking everywhere to try to determine what might be causing this, and I came across the following note on the Raw.export page: "Export to external format may not preserve all the information from the instance." Is it possible that my file is too large to safely export, and do you have any ideas for diagnosing this?

Again, I'm quite stuck because using the following code,


the exported file is broken, but on the very next line, when we plot, we get valid data. Any help would be appreciated!

Indeed it looks like something could be wrong with our EDF export. Are you using the latest version of the edfio package (it should be 0.4.0)?

When you say your file might be too large, what dimensions are we talking about? What is the size of the data set you are exporting (number of channels and number of samples)? Also, can you show the output of res.info before the export line?

Another thing you could try is to export just a single file to rule out that the problem is with the concatenation step.


Hi, here are my attempts at your suggestions:

I'm not sure if the previous code was run with the correct version, but I've installed it now.

I have tried to export just a single 5 minute file rather than the concatenated file. The exported file works as expected. Let me know if I misinterpreted what you meant.

The size of my export is currently 20 channels, ~6 hours * 3600 s/hour * 200 Hz sampling frequency ≈ 4,320,000 samples per channel. The output file size is 168,887 KB.
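
As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes a plain EDF layout (a 256-byte fixed header, one 256-byte header block per signal, and 16-bit samples) and ignores any extra annotation signal or padding of the last data record; the helper name edf_size_estimate is hypothetical.

```python
def edf_size_estimate(n_channels, duration_s, sfreq, bytes_per_sample=2):
    """Return (samples per channel, estimated file size in bytes) for a plain EDF file."""
    n_samples = int(duration_s * sfreq)
    header_bytes = 256 + 256 * n_channels  # fixed header + one 256-byte block per signal
    data_bytes = n_channels * n_samples * bytes_per_sample  # EDF stores 16-bit integers
    return n_samples, header_bytes + data_bytes


n_samples, n_bytes = edf_size_estimate(n_channels=20, duration_s=6 * 3600, sfreq=200)
print(n_samples)       # 4320000
print(n_bytes / 1024)  # 168755.25
```

The estimate is within about 0.1% of the reported 168,887 KB, so the file size itself looks consistent with the stated dimensions.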

And finally, after installing edfio 0.4.0 and rerunning my script on 6 hours, I get NaNs. Here is the res.info:

<Info | 10 non-empty values
 bads: []
 ch_names: Fp1_F7, F7_T7, T7_P7, P7_O1, Fp1_F3, F3_C3, C3_P3, P3_O1, Fz_Cz, ...
 chs: 20 EEG
 custom_ref_applied: True
 dig: 0 items
 highpass: 0.0 Hz
 lowpass: 100.0 Hz
 meas_date: 2023-12-06 20:59:58 UTC
 nchan: 20
 projs: []
 sfreq: 200.0 Hz
 subject_info: 5 items (dict)
>
  • As before, the plot produces valid data while the exported file contains NaNs in Matlab.

As a recap of my algorithm:

  • Goal: Take continuous 5-minute EDF chunks and concatenate them into a single 6-hour EDF file.
  1. Produce list of EDF file names.
  2. Call read_raw_edf on these, drop unneeded channels, and decimate to 200 Hz.
  3. Call concatenate_raws on this list.
  4. Perform a bipolar reference.
  5. Export the result with mne.export.export_raw.

Thanks for the continued help, and please let me know if you need any other information.

Thanks @rzhao! So exporting a single file works, that's good to know. Do I remember correctly that you mentioned elsewhere (not sure where) that the exported file is fine up to a certain number of concatenated datasets? Just to be sure, can you try exporting two concatenated datasets? And to be clear, please include all subsequent steps that you use in your pipeline (drop unneeded channels, decimate to 200 Hz, create bipolar derivations). The culprit could lie in any of these processing steps, so I would like to rule out these possibilities (which would leave only the exporting function).

The resulting file is not particularly large; I've seen bigger EDF files that work without any problems.

Once we know that the exported file looks fine for two (or possibly even more) concatenated datasets (but including all processing steps), I would next try to use edfio directly to export your data (see the edfio documentation).

Yes, for a prior collection of patient data, I was able to concatenate and successfully export up to 110 5 minute files.

Now I am working with another patient's data, and this 110 number does not seem to hold up.

  • Using the same code that previously failed (from my previous comment) and changing only the number of files per concatenation, I am able to successfully concatenate two EDF files, three, and also twelve (1 hour).
  • I haven't done extensive testing to find an upper bound on when concatenation starts to fail, but as a recap of what's going on: concatenating and exporting 12 files succeeds. Using this code and changing nothing but the number of files to concatenate, exports start to fail around 24 files (2 hours) and beyond.
  • Yet all of these concatenated Raw objects, regardless of whether the export fails or succeeds, produce valid plots using raw.plot().
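
One way to pin down where exports start to fail, without testing every count by hand, is a binary search over the number of files. This is only a sketch: export_ok is a hypothetical callable that would run the whole pipeline on the first k files and report whether the exported file reads back without NaNs, and the search assumes that once a prefix fails every longer prefix fails too, which the observations above suggest but do not prove.

```python
def first_failing_count(n_files, export_ok):
    """Return the smallest k in 1..n_files for which export_ok(k) is False, or None.

    Assumes failures are monotonic: if the first k files fail, so does any longer prefix.
    """
    lo, hi = 1, n_files
    if export_ok(hi):
        return None  # even the full concatenation exports fine
    while lo < hi:
        mid = (lo + hi) // 2
        if export_ok(mid):
            lo = mid + 1  # the first failure must lie after mid
        else:
            hi = mid  # mid fails, so the first failure is mid or earlier
    return lo


# Stand-in predicate for illustration only: pretend exports break from 17 files on.
print(first_failing_count(24, lambda k: k < 17))  # 17
```

If the result points at one particular file rather than a size limit, exporting that file alone (with all processing steps) would be the natural next check.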

Here's my best shot at summarizing the processing steps, since I wouldn't want to force you to read over all the code!

trimmed = [scalp_trim_and_decimate(to_edf(edf), 200) for edf in interval.files]

  • interval.files is a list of file names
  • to_edf calls and returns mne.io.read_raw_edf(edf_path) alongside some path i/o
  • scalp_trim_and_decimate calls rename_channels(rename_dict) and drop_channels(to_drop) on a list of channels, before finally calling resample to 200.

I then call mne.concatenate_raws(trimmed) and call set_bipolar_reference(raw_edf, anodes, cathodes, names) on the result. I drop some additional channels and call mne.export.export_raw(out_name, res, 'edf', overwrite=True). Finally, I plot this same Raw object.

Oh, and as for using edfio to export directly: I assume I'd want to use the write function, but how do I convert my mne.Raw into an edfio.Edf? Or would I have to rewrite all the EDF functions to use the other library?

Thank you!

OK, two things come to mind that I'd like to check before we can be pretty sure it's an issue related to EDF export.

  1. Can you rule out that it is one particular file which causes the problem? It is conceivable that it is not the number, but a certain (defective?) file which causes all of the data to become NaN.

  2. You can export your data directly with edfio based on the example at the bottom of their website:

    import numpy as np
    
    from edfio import Edf, EdfSignal
    
    edf = Edf(
        [
            EdfSignal(np.random.randn(30 * 256), sampling_frequency=256, label="EEG Fpz"),
            EdfSignal(np.random.randn(30), sampling_frequency=1, label="Body Temp"),
        ]
    )
    edf.write("example.edf")
    

    You need to create a list of EdfSignal objects corresponding to individual channels. You can get the data as a NumPy array using res.get_data() (which also accepts a picks argument to extract only specific channels). If you cannot figure out how to do it, I'm happy to help if you can share your data with me.
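
A minimal sketch of that conversion, assuming the data has already been pulled out as an (n_channels, n_samples) array in volts, as res.get_data() returns it for EEG channels. The helper names to_microvolts and write_edf_with_edfio are hypothetical. The physical_dimension keyword and edfio choosing a suitable data record duration on its own are assumptions to check against the edfio documentation. Channel labels are assumed to fit the 16-character EDF label field.

```python
import numpy as np


def to_microvolts(data_volts):
    """Scale an (n_channels, n_samples) array from volts to microvolts."""
    return np.asarray(data_volts, dtype=float) * 1e6


def write_edf_with_edfio(data_volts, ch_names, sfreq, path):
    """Write one EdfSignal per channel, following the edfio example above."""
    # Imported here so that to_microvolts stays usable without edfio installed.
    from edfio import Edf, EdfSignal

    signals = [
        EdfSignal(channel, sampling_frequency=sfreq, label=name, physical_dimension="uV")
        for channel, name in zip(to_microvolts(data_volts), ch_names)
    ]
    Edf(signals).write(path)
```

With the Raw object from the thread, a call might look like write_edf_with_edfio(res.get_data(), res.ch_names, res.info["sfreq"], "concatenated.edf"), where the output file name is just a placeholder.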

I just realized that MNE 1.6.1 is still using edflib-python, which we've replaced with edfio in our current development branch. It will be part of the 1.7 release, which is due mid-April, but if you don't want to wait, feel free to install the dev version and let us know if this fixes the issue!

Thank you! I'm slowly combing through the suggestions. This isn't particularly related to any of them, but have you ever seen this particular issue? It seems to have to do with mneio but I'm not sure exactly how to fix it:

This error points to your physical minimum being extremely large in magnitude (it looks like it is -3982730000 µV), which cannot be represented in the corresponding header field. Usually, this happens if you have no highpass filter applied to your data. I would strongly recommend applying a highpass filter (e.g. 0.1 Hz, 0.5 Hz, or even 1 Hz), unless you have a reason to keep the low frequencies. High-passing your data might even fix the original issue in MNE 1.6.1.
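
To illustrate why such a value does not fit: the EDF specification stores each signal's physical minimum and maximum as 8-character ASCII fields in the header. The sketch below checks whether a value fits in that width. The helper name fits_header_field is hypothetical, and the plain integer/repr formatting is only a stand-in, since the exporter may round or format numbers differently.

```python
def fits_header_field(value, width=8):
    """Return (fits, text) for a number written as plain ASCII into a fixed-width field."""
    value = float(value)
    # Drop the trailing ".0" for whole numbers; otherwise use Python's shortest repr.
    text = str(int(value)) if value.is_integer() else repr(value)
    return len(text) <= width, text


print(fits_header_field(-3982730000))  # (False, '-3982730000')
print(fits_header_field(-100))         # (True, '-100')
```

Under this formatting, any whole-number minimum below -9999999 needs more than 8 characters, which is why bringing the signal range down (by filtering or de-meaning) can make the header writable again.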

Hi! I tried a highpass filter (raw.filter(l_freq=<suggested values>, h_freq=None)), alongside other highpass/lowpass combinations, and I always receive the same error, sometimes with slightly different values. A different idea we had was to zero-mean each channel to negate outlier values. We tried applying common average referencing to handle this, but got the same error. Is there a way to zero-mean channel-wise? I couldn't find a way to mutate the Raw object itself by channel, or a corresponding method. Thanks as always!

Since you are concatenating multiple EEG segments, I recommend that you zero-mean each segment individually. You can use raw.apply_function() in combination with scipy.signal.detrend(). The latter performs linear detrending, i.e. it removes a straight line fit from each channel. If you only want to remove the overall mean, you can set type="constant" (but I'd stick with the default type="linear"). After detrending, I'd also apply a high-pass filter to get rid of remaining low-frequency noise.

from scipy.signal import detrend

raw.apply_function(detrend, channel_wise=False)

Concatenating these preprocessed EEG segments should result in an overall mean of zero across the entire time course (plus minus edge artifacts from high-pass filtering, so you might want to double check what the actual combined signal looks like; bear in mind that raw.plot() removes the offset of the visible time segment by default).
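
The effect of de-meaning each segment before concatenation can be seen on synthetic data. This is a NumPy-only sketch: the manual mean subtraction in demean stands in for detrend with type="constant", and the segment offsets are made-up values chosen only to mimic different DC levels between files.

```python
import numpy as np

rng = np.random.default_rng(0)
offsets = [50e-6, -120e-6, 300e-6]  # made-up DC offsets (volts), one per segment
segments = [rng.standard_normal((4, 1000)) * 5e-6 + off for off in offsets]


def demean(segment):
    """Remove each channel's mean (same effect as detrending with type="constant")."""
    return segment - segment.mean(axis=-1, keepdims=True)


combined_raw = np.concatenate(segments, axis=-1)
combined_demeaned = np.concatenate([demean(s) for s in segments], axis=-1)

# Per-channel range (max - min) before and after de-meaning each segment.
range_raw = combined_raw.max(axis=-1) - combined_raw.min(axis=-1)
range_demeaned = combined_demeaned.max(axis=-1) - combined_demeaned.min(axis=-1)
print(range_raw.min() > range_demeaned.max())  # True
```

Because every segment here has the same length, the concatenated signal also has a per-channel mean of zero, and the overall range shrinks because the jumps between segments disappear.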

Hi @cbrnr, thanks for your great suggestion. I applied a bandpass filter alongside the detrend function and the MNE development version, and unfortunately I face the same error. A workaround that did get me a result (i.e. a file that's not all NaNs) was bounding the physical_range when calling mne.export.export_raw (I arbitrarily chose [-100, 100], and will later experiment with larger values).

So then I was curious and I wanted to locate the abnormally large value (in my case -69964600), so for every single file to be concatenated and the final concatenated file, I ran an extra for loop that computed the min and max voltage for each channel. However, none of the channels contained a value even remotely close to -69964600 and were at most ~-6000. The code I used follows. I was wondering if you have any idea for how I can attempt to identify where this mystery value that appears to be messing up my output is coming from? As it stands, my solution is to try to calculate the global min and max voltage values for all channels, and then set those as the physical_range bounds on export, but I really want to understand the error itself. Any help would be appreciated!

for i, edf in enumerate(interval.files): # a list of str's, e.g. ['ex-name-1.edf', 'ex-name-2.edf']
    trimmed = scalp_trim_and_decimate(to_edf(edf), 200) # to_edf calls read_raw_edf, and then I resample and drop unwanted channels.
    demeaned = trimmed.apply_function(detrend, channel_wise=True, type="constant")
    filtered = demeaned.filter(l_freq=0.5, h_freq=58)
    dropped = filtered.drop_channels(['L EMG-Ref', 'R EMG-Ref'])
    edfs.append(dropped) # my list of edfs I later concatenate

    # Here's where I try to locate -69964600 to no success
    diagnostics = []
    data = dropped.get_data()
    for channel in data:
        # list.append() takes exactly one argument, so pack the three values into a tuple
        diagnostics.append((channel.max(), channel.min(), np.isin(np.array([-69964600]), channel)))
    write_txt( # function that writes to a text file
        f'edf-{i} Data:\n',
        f'Info{str(dropped.info)}',
        f'Dim: {data.shape[0]} x {data.shape[1]}\n',
        f'Data: {str(data)}\n'
        f'Diagnostics: {diagnostics}')
# same loop is later run for the concatenated output file
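
One thing worth ruling out when hunting for the value is a unit mismatch: get_data() returns EEG data in volts by default, while the error message quoted earlier in the thread reports microvolts. The sketch below collects per-segment, per-channel extrema in µV so they can be compared directly with the number in the error. It works on plain NumPy arrays (for example the result of dropped.get_data()); the helper name extrema_report is hypothetical, and whether physical_range expects the same unit should be confirmed in the export_raw documentation.

```python
import numpy as np


def extrema_report(segments_volts, ch_names):
    """Return (overall_min_uv, overall_max_uv, rows) for arrays given in volts.

    Each row is (segment_index, channel_name, min_uv, max_uv, all_finite).
    """
    rows = []
    for i, data in enumerate(segments_volts):
        data_uv = np.asarray(data, dtype=float) * 1e6  # volts -> microvolts
        for name, channel in zip(ch_names, data_uv):
            rows.append((
                i,
                name,
                float(np.nanmin(channel)),  # NaN-aware so one NaN does not hide the extremes
                float(np.nanmax(channel)),
                bool(np.isfinite(channel).all()),
            ))
    overall_min = float(np.nanmin([row[2] for row in rows]))
    overall_max = float(np.nanmax([row[3] for row in rows]))
    return overall_min, overall_max, rows
```

Rows with all_finite set to False, or with extremes far outside the rest, would point to the segment and channel to inspect first.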