operating system: windows 10
hi! Sorry, I'm very new here. I want to write a script that concatenates a batch of 5-minute-long continuous EDF files of EEG data covering 11 hours at a time (i.e. 121 total files).
My code essentially looks like the following…
trimmed = [trim_and_decimate(to_edf(edf), 200) for edf in interval.files]
res = mne.concatenate_raws(trimmed)
mne.export.export_raw(<name>, mne.set_bipolar_reference(<res>, ..., ..., ...), 'edf')
trim_and_decimate drops unwanted channels and downsamples the data to a lower sampling frequency
to_edf simply returns the result of mne.io.read_raw_edf(<edf_path>)
I've been able to successfully concatenate up to 110 files (~120,000 KB output file), but beyond that I just get no output at all (even though if I print the info of the Raw object prior to exporting, there do seem to be values).
Any idea of where things might be going wrong? I am unsure why concatenation is working just fine until I hit this threshold. Any help would be appreciated!
Hi, thank you for taking the time to try this! I tried the plot function and got something that looks like this. I've tried to use +/- to scale the visuals, but it always appears as a massive block. Do you have any advice for making the signals more viewer-friendly? Is this what your output looked like?
Thank you so much! It may very well be the third-party plotting program I was using. The built-in MNE plotting seems great. May I ask what the difference is between raw.plot() and mne.viz.plot_raw()? Also, is there a way to save the interactive plot? Is it plt.savefig()?
Sorry, I should have clarified. Is there any way to save the interactive matplotlib plot itself rather than just a static image? I looked into savefig, and it appears the available formats are only `#savefig.format: png # {png, ps, pdf, svg}`?
However, upon exporting the raw file with the following function (our team wants to export these concatenated files for viewing in an external viewer), the plot becomes what I showed in my original post, with all channels empty. We double-checked in Matlab using [a,b] = edfread('../PR06_night_1.1_scalp_2023-12-06_20.59--2023-12-07_02.59.edf'), and the data is indeed all NaNs except for one channel:
I've been looking everywhere to try to determine what might be causing this, and I came across the following note on the Raw.export page: "Export to external format may not preserve all the information from the instance." Is it possible that my file is too large to safely export, and do you have any ideas for diagnosing this?
Again, I'm quite stuck, because I'm using the following code,
Indeed it looks like something could be wrong with our EDF export. Are you using the latest version of the edfio package (it should be 0.4.0)?
When you say your file might be too large, what dimensions are we talking about? What is the size of the data set you are exporting (number of channels and number of samples)? Also, can you show the output of res.info before the export line?
Another thing you could try is to export just a single file to rule out that the problem is with the concatenation step.
I'm not sure if the previous code was run with the correct version, but I've installed it now.
I have tried to export just a single 5 minute file rather than the concatenated file. The exported file works as expected. Let me know if I misinterpreted what you meant.
The current size of my export: 20 channels, ~6 hours × 200 Hz sampling frequency ≈ 4,320,000 samples. The output file size is 168,887 KB.
And finally, after installing edfio 0.4.0 and rerunning my script on 6 hours of data, I still get NaNs. Here is the res.info:
Thanks @rzhao! So exporting a single file works, that's good to know. Do I remember correctly that you mentioned elsewhere (not sure where) that the exported file is fine up to a certain number of concatenated datasets? Just to be sure, can you try exporting two concatenated datasets? And to be clear, please include all subsequent steps that you use in your pipeline (drop unneeded channels, decimate to 200 Hz, create bipolar derivations). The culprit could lie in any of these processing steps, so I would like to rule them out (which would leave only the exporting function).
The resulting file is not particularly large; I've seen bigger EDF files that work without any problems.
Once we know that the exported file looks fine for two (or possibly even more) concatenated datasets (but including all processing steps), I would next try to use edfio directly to export your data.
Yes, for a prior collection of patient data, I was able to concatenate and successfully export up to 110 5-minute files.
Now, I am working with another patient's data, and this 110 number does not seem to hold up.
Using the same code that previously failed (from my previous comment) and changing only the number of files per concatenation, I am able to successfully concatenate and export two EDF files, three, and also twelve (1 hour).
I haven't done extensive testing to find an upper bound on when concatenation starts to fail, but as a recap of what's going on: concatenating and exporting 12 files succeeds. Using this code and changing nothing but the number of files to concatenate, exports start to fail around 24 files (2 hours) and beyond.
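If it helps to pin down the exact threshold, a bisection over the file count would take only a handful of export attempts. This is just a sketch: `export_succeeds` is a hypothetical predicate you would supply (e.g. concatenate the first n files, export, reload, and check for NaNs), and it assumes failures are monotonic in the file count.

```python
def smallest_failing_count(n_max, export_succeeds):
    """Binary search for the smallest number of files whose export fails.

    Assumes monotonicity: if exporting n files fails, exporting n+1 fails too.
    export_succeeds(n) is a caller-supplied predicate.
    Returns None if even n_max files export fine.
    """
    if export_succeeds(n_max):
        return None
    lo, hi = 1, n_max  # hi is known to fail; search for the first failure
    while lo < hi:
        mid = (lo + hi) // 2
        if export_succeeds(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

# toy check: pretend exports fail from 24 files onward
print(smallest_failing_count(110, lambda n: n < 24))  # -> 24
```

With ~110 files this needs at most about seven export runs instead of trying every count.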
Yet all of these concatenated Raw objects, regardless of export success or failure, produce valid plots with raw.plot().
Here's my best shot at summarizing the processing steps, since I wouldn't want to force you to read over all the code!
trimmed = [scalp_trim_and_decimate(to_edf(edf), 200) for edf in interval.files]
interval.files is a list of file names
to_edf calls and returns mne.io.read_raw_edf(edf_path), alongside some path I/O
scalp_trim_and_decimate calls rename_channels(rename_dict) and drop_channels(to_drop), before finally resampling to 200 Hz.
Finally, I call mne.concatenate_raws(trimmed) and call set_bipolar_reference(raw_edf, anodes, cathodes, names) on the result. I drop some additional channels and call mne.export.export_raw(out_name, res, 'edf', overwrite=True). Then I plot this same Raw object.
Oh, and as for using edfio to directly export: I assume I'd want to use the write function, but how do I convert my mne.Raw into an edfio.Edf? Or would I have to rewrite all the EDF functions to use the other library?
OK, two things come to mind that I'd like to check before we can be pretty sure it's an issue related to EDF export.
Can you rule out that it is one particular file causing the problem? It is conceivable that it is not the number of files, but a certain (defective?) file which causes all of the data to become NaN.
You can export your data directly with edfio based on the example at the bottom of their website:
You need to create a list of EdfSignal objects corresponding to individual channels. You can get the data as a NumPy array using res.get_data() (which also accepts a picks argument to extract only specific channels). If you cannot figure out how to do it, I'm happy to help if you can share your data with me.
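A minimal sketch of that conversion, assuming edfio ≥ 0.4 is installed (the helper names here are made up for illustration, and note that MNE keeps EEG in volts internally, so the sketch scales to µV before building the signals):

```python
import numpy as np

def split_channels(data_2d):
    """Split an (n_channels, n_samples) array into per-channel 1D arrays."""
    return [np.ascontiguousarray(ch) for ch in np.asarray(data_2d, dtype=float)]

def export_with_edfio(raw, out_path):
    """Hypothetical helper: write an MNE Raw object to EDF via edfio."""
    from edfio import Edf, EdfSignal  # assumes edfio >= 0.4

    sfreq = raw.info["sfreq"]
    data_uv = raw.get_data() * 1e6  # MNE stores EEG in volts; EDF typically uses µV
    signals = [
        EdfSignal(ch, sampling_frequency=sfreq, label=name)
        for ch, name in zip(split_channels(data_uv), raw.ch_names)
    ]
    Edf(signals).write(out_path)
```

If edfio raises a clear error here (e.g. about the physical range) where mne.export.export_raw silently produces NaNs, that would help narrow the problem down.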
I just realized that MNE 1.6.1 is still using edflib-python, which we've replaced with edfio in our current development branch. It will be part of the 1.7 release, which is due mid-April, but if you don't want to wait, feel free to install the dev version and let us know if this fixes the issue!
Thank you! I'm trying to slowly comb through the suggestions. This isn't particularly related to any of them, but have you ever seen this particular issue? It seems to have to do with mne.io, but I'm not sure exactly how to fix it:
This error points to your physical minimum being extremely large (it looks like it is -3982730000 µV), which cannot be represented in the corresponding header field. Usually, this happens if no high-pass filter has been applied to the data. I would strongly recommend applying a high-pass filter (e.g. 0.1 Hz, 0.5 Hz, or even 1 Hz), unless you have a reason to keep the very low frequencies. High-passing your data might even fix the original issue in MNE 1.6.1.
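For context on why such a value is fatal rather than merely unusual: the EDF format stores each signal's physical minimum and maximum as fixed-width 8-character ASCII header fields, so a number whose decimal representation needs more than 8 characters simply cannot be written. A toy sketch of that constraint (the helper is illustrative, not part of MNE or edfio):

```python
def fits_edf_physical_field(value, width=8):
    """EDF physical minimum/maximum are fixed 8-character ASCII header fields."""
    return len(str(value)) <= width

print(fits_edf_physical_field(-199))         # True: "-199" fits easily
print(fits_edf_physical_field(-3982730000))  # False: 11 characters > 8
```

Real writers may format the number differently (e.g. scientific notation), but the 8-character budget is the underlying limit either way.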
Hi! I tried a high-pass filter (raw.filter(l_freq={suggested values}, h_freq=None)), alongside other high-pass/low-pass combinations, and I always receive the same error, sometimes with slightly different values. A different idea we had was to zero-mean each channel to negate outlier values. We tried to apply common average referencing to handle this, but got the same error. Is there a way to zero-mean channel-wise? I couldn't find a way to mutate the Raw object itself by channel, or a corresponding method. Thanks as always!
Since you are concatenating multiple EEG segments, I recommend that you zero-mean each segment individually. You can use raw.apply_function() in combination with scipy.signal.detrend(). The latter performs linear detrending, i.e. it removes a straight-line fit from each channel. If you only want to remove the overall mean, you can set type="constant" (but I'd stick with the default type="linear"). After detrending, I'd also apply a high-pass filter to get rid of remaining low-frequency noise.
from scipy.signal import detrend
raw.apply_function(detrend, channel_wise=False)
Concatenating these preprocessed EEG segments should result in an overall mean of zero across the entire time course (plus or minus edge artifacts from high-pass filtering, so you might want to double-check what the actual combined signal looks like; bear in mind that raw.plot() removes the offset of the visible time segment by default).
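To illustrate what the two detrend modes do, here is a self-contained toy example on synthetic data (not MNE-specific):

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
sig = 5.0 + 2.0 * t + rng.normal(scale=0.1, size=t.size)  # offset + drift + noise

constant = detrend(sig, type="constant")  # subtracts only the mean
linear = detrend(sig, type="linear")      # subtracts a straight-line fit

print(abs(constant.mean()) < 1e-9, abs(linear.mean()) < 1e-9)  # both zero-mean
# the slope survives "constant" but not "linear":
print(abs(np.polyfit(t, linear, 1)[0]) < abs(np.polyfit(t, constant, 1)[0]))
```

Both modes remove the offset, so either would address the zero-mean question; only type="linear" also removes slow drift within a segment.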
Hi @cbrnr, thanks for your great suggestion. I applied a band-pass filter alongside the detrend function and the MNE development version, and unfortunately I face the same error. A workaround that did get me a result (i.e. a file that's not all NaNs) was bounding the physical_range when calling mne.export.export_raw (I arbitrarily chose [-100, 100], and will later experiment with larger values).
So then I was curious and wanted to locate the abnormally large value (in my case -69964600). For every single file to be concatenated, as well as for the final concatenated file, I ran an extra for loop that computed the min and max voltage for each channel. However, none of the channels contained a value even remotely close to -69964600; the extremes were at most around -6000. The code I used follows. Do you have any idea how I can identify where this mystery value is coming from? As it stands, my workaround is to calculate the global min and max voltage values across all channels and set those as the physical_range bounds on export, but I really want to understand the error itself. Any help would be appreciated!
for i, edf in enumerate(interval.files):  # a list of str's, e.g. ['ex-name-1.edf', 'ex-name-2.edf']
    trimmed = scalp_trim_and_decimate(to_edf(edf), 200)  # to_edf calls read_raw_edf; then I resample and drop unwanted channels
    demeaned = trimmed.apply_function(detrend, channel_wise=True, type="constant")
    filtered = demeaned.filter(l_freq=0.5, h_freq=58)
    dropped = filtered.drop_channels(['L EMG-Ref', 'R EMG-Ref'])
    edfs.append(dropped)  # my list of Raws that I later concatenate

    # Here's where I try to locate -69964600, to no success
    diagnostics = []
    data = dropped.get_data()
    for channel in data:
        diagnostics.append((channel.max(), channel.min(), np.isin(-69964600, channel)))
    write_txt(  # function that writes to a text file
        f'edf-{i} Data:\n',
        f'Info: {dropped.info}\n',
        f'Dim: {data.shape[0]} x {data.shape[1]}\n',
        f'Data: {data}\n',
        f'Diagnostics: {diagnostics}')
# the same loop is later run for the concatenated output file
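One thing that may be worth double-checking here (an assumption on my part, not something confirmed in this thread): raw.get_data() returns EEG in volts, while the physical minimum in the export error appears to be reported in µV, so searching the raw array for -69964600 directly would never match even if the value were present. A hedged diagnostic sketch, with a hypothetical find_near() helper demonstrated on synthetic data:

```python
import numpy as np

def find_near(data_volts, target_uv, tol_uv=1.0):
    """Locate (channel, sample) positions whose value in µV is near target_uv.

    data_volts is shaped (n_channels, n_samples), in volts as returned by
    raw.get_data(); the target is given in microvolts.
    """
    data_uv = np.asarray(data_volts) * 1e6  # volts -> microvolts
    return np.argwhere(np.abs(data_uv - target_uv) <= tol_uv)

# toy check with one planted outlier
demo = np.zeros((2, 100))
demo[1, 42] = -69964600 * 1e-6  # the outlier, stored in volts
print(find_near(demo, -69964600))  # one hit, at channel 1, sample 42
```

Searching with a tolerance (rather than exact equality, as np.isin does) also guards against floating-point round-off hiding a match.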