Data size not consistent with its true size

Hi, I have 2 GB of EEG data. After loading it with MNE, MNE reports that the data take up about 8 GB. How so?

The sampling rate of my data is 1000 Hz. After resampling to 500 Hz, the data should be 1 GB, but the file is still 2 GB after raw.save('**.fif').

Hello @BarryLiu97,

I had a similar problem when using raw.save() with my data:

It turned out that the parameter fmt was crucial, as its default doubled my data in size. That could be solved by setting fmt=raw.orig_format.
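
For example, a minimal sketch (the filename is just a placeholder; this assumes raw.orig_format is one of the formats that save() accepts):

import mne

# hypothetical file; any preloaded Raw object works the same way
raw = mne.io.read_raw_fif('my_data_raw.fif', preload=True)
print(raw.orig_format)  # e.g. 'short' or 'int'
# write the samples back in their original precision instead of the default 'single'
raw.save('my_data_resaved_raw.fif', fmt=raw.orig_format, overwrite=True)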
Does this solve your problem too?

Hi, thank you for your help. The size of my data on disk is 2 GB, but the space used after reading it is 8 GB.
seeg is the raw data I read from an EDF file with preload=True; its sampling rate is 2000 Hz.
resample_seeg is the same data downsampled to 1000 Hz.
Both of them take up about 4x the size I would expect.


The orig_format of my data is int.
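
For reference, here is roughly how I compared the two (a sketch; the path is a placeholder):

import os
import mne

fpath = 'my_seeg_data.edf'  # hypothetical path
seeg = mne.io.read_raw_edf(fpath, preload=True)
disk_gb = os.path.getsize(fpath) / 1e9  # size of the EDF on disk
ram_gb = seeg.get_data().nbytes / 1e9   # the in-memory array is float64
print(disk_gb, ram_gb)                  # for me, ram_gb is ~4x disk_gb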

Hi @BarryLiu97,
I understand; you seem to have a different problem than I had. Mine was that the size of a file doubled after it was saved to disk again, whereas you are wondering why the RAM occupied by the preloaded Raw object is bigger than the space the file occupies on disk.
I just tried loading some of my Neuromag data (orig_format='short'), and the RAM it occupied was also ~4x the size it takes on the drive.
I frankly don't know the reason, or whether that is the default behavior.

Maybe someone else can help and explain this?

This is pretty disturbing. I resampled the data in order to make it smaller, but the saved file does not change in size even though I downsampled from 2000 Hz to 1000 Hz.

Do you mean you loaded your data, resampled from 2000 Hz to 1000 Hz, saved it again, and the drive space occupied by the saved file was the same as that of the original file?

Have you tried raw.save(fmt=raw.orig_format)?

Yes, I tried. Here is the thing: I have 2 GB of data; when I load it, it is 8 GB. After downsampling, it is 4 GB. And when I save it, it becomes 2 GB.

I see. I can't reproduce that with my data, so I am sorry, I don't know how to help there. Hopefully somebody else does.

Here is my data: DA7450EM.edf (Google Drive)

import mne

fpath = 'DA7450EM.edf'
# load the EDF fully into memory (preload=True keeps all samples in RAM)
raw = mne.io.read_raw_edf(fpath, preload=True)
print(raw)
# downsample from 2000 Hz to 1000 Hz (n_jobs='cuda' requires a CUDA-capable GPU and CuPy)
resample_raw = raw.copy().resample(1000, n_jobs='cuda')
print(resample_raw)

MNE-Python always uses 64-bit numbers when data objects are in memory (have a look at np.finfo(np.float32).resolution versus np.finfo(np.float64).resolution to see why: MEG data are often on the order of 10e-15 or so, and variances can be even smaller). Saving data in .fif format will store 32-bit numbers, however, because that is how the FIF standard was defined. So if your original file was 2 GB at 16 bits per sample, it is expected to become 4x bigger when loaded (16-bit → 64-bit), shrink to 2x after your downsampling (half the samples), and finally drop back to 1x when saving (64-bit → 32-bit).
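
To make the bookkeeping concrete, here is the same arithmetic in Python (a sketch using the sizes from this thread):

import numpy as np

# why MNE keeps data as float64 in memory: compare the precision
print(np.finfo(np.float32).resolution)  # 1e-06
print(np.finfo(np.float64).resolution)  # 1e-15

disk_gb = 2.0                      # EDF on disk, 16-bit ints (2 bytes/sample)
loaded_gb = disk_gb * (8 / 2)      # float64 in memory (8 bytes/sample) -> 8.0
resampled_gb = loaded_gb / 2       # 2000 Hz -> 1000 Hz halves the samples -> 4.0
saved_gb = resampled_gb * (4 / 8)  # FIF stores 32-bit floats (4 bytes)   -> 2.0
print(loaded_gb, resampled_gb, saved_gb)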

If you need to save space, ask yourself how good a temporal resolution you need and how high the frequencies you intend to analyze are. A 1000 Hz sampling frequency → 1 ms temporal resolution and 500 Hz maximum (Nyquist) frequency… depending on your analysis plans, you might be able to get by with less.
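
For example, continuing from the snippet above (a rough sketch; the filename is a placeholder, and fmt='short' stores 16-bit integers, trading precision for file size):

# sketch: resample lower if your analyses allow it (Nyquist = sfreq / 2)
raw_small = raw.copy().resample(250)  # 250 Hz -> 125 Hz max analyzable frequency
raw_small.save('data_small_raw.fif', fmt='short', overwrite=True)  # hypothetical filename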


Thank you so much for your answer!

It would introduce more numerical rounding errors. Depending on what you do with the data, that may or may not be a problem.

Alex
