Data size not consistent with its true size

Hi, I have 2 GB of EEG data. After loading it with MNE, MNE reports that the data take up about 8 GB. How so?

The sampling rate of my data is 1000 Hz. After resampling to 500 Hz, the data should be 1 GB, but the file is still 2 GB after raw.save('**.fif').

Hello @BarryLiu97,

I had a similar problem when using raw.save() with my data:

It turned out that the parameter fmt was crucial, as its default doubled my data in size. That could be solved by setting fmt=raw.orig_format.
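
For example, a minimal sketch (the filename is just a placeholder; this assumes raw.orig_format is one of the formats that save() accepts):

import mne

# hypothetical file; any preloaded Raw object works the same way
raw = mne.io.read_raw_fif('my_data_raw.fif', preload=True)
print(raw.orig_format)  # e.g. 'short' or 'int'
# write the samples back in their original precision instead of the default 'single'
raw.save('my_data_resaved_raw.fif', fmt=raw.orig_format, overwrite=True)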
Does this solve your problem too?

Hi, thank you for your help. The size of my data on disk is 2 GB, but the space used after reading it is 8 GB.
seeg is the raw data I read from an EDF file with preload=True; its sampling rate is 2000 Hz.
resample_seeg is the same data downsampled to 1000 Hz.
Both of them take up about 4x the size I would expect.


The orig_format of my data is int.
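
For reference, here is roughly how I compared the two (a sketch; the path is a placeholder):

import os
import mne

fpath = 'my_seeg_data.edf'  # hypothetical path
seeg = mne.io.read_raw_edf(fpath, preload=True)
disk_gb = os.path.getsize(fpath) / 1e9  # size of the EDF on disk
ram_gb = seeg.get_data().nbytes / 1e9   # the in-memory array is float64
print(disk_gb, ram_gb)                  # for me, ram_gb is ~4x disk_gb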

Hi @BarryLiu97,
I understand; you seem to have a different problem than I had. Mine was that the size of a file doubled after it was saved to disk again, whereas you are wondering why the RAM occupied by the preloaded Raw object is bigger than the space the file occupies on disk.
I just tried loading some of my Neuromag data (orig_format='short'), and the RAM it occupied was also ~4x the size it takes on the drive.
I frankly don't know the reason, or whether that is the default behavior.

Maybe someone else can help and explain this?

This is pretty disturbing. I resampled the data in order to make it smaller, but the saved file does not change in size even though I downsampled from 2000 Hz to 1000 Hz.

Do you mean you loaded your data, resampled from 2000 Hz to 1000 Hz, saved it again, and the drive space occupied by the saved file was the same as that of the original file?

Have you tried raw.save(fmt=raw.orig_format)?

Yes, I tried. Here is the thing: I have 2 GB of data; when I load it, it is 8 GB. After downsampling, it is 4 GB. And when I save it, it becomes 2 GB.

I see. I can't reproduce that with my data, so I am sorry, I don't know how to help there. Hopefully somebody else does.

Here is my data: DA7450EM.edf (Google Drive)

import mne

fpath = 'DA7450EM.edf'
# load the EDF fully into memory (preload=True keeps all samples in RAM)
raw = mne.io.read_raw_edf(fpath, preload=True)
print(raw)
# downsample from 2000 Hz to 1000 Hz (n_jobs='cuda' requires a CUDA-capable GPU and CuPy)
resample_raw = raw.copy().resample(1000, n_jobs='cuda')
print(resample_raw)

MNE-Python always uses 64-bit numbers when data objects are in memory (have a look at np.finfo(np.float32).resolution versus np.finfo(np.float64).resolution to see why: MEG data are often on the order of 10e-15 or so, and variances can be even smaller). Saving data in .fif format will store 32-bit numbers, however, because that is how the FIF standard was defined. So if your original file was 2 GB at 16 bits per sample, it is expected to become 4x bigger when loaded (16-bit → 64-bit), shrink to 2x after your downsampling (half the samples), and finally drop back to 1x when saving (64-bit → 32-bit).
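
To make the bookkeeping concrete, here is the same arithmetic in Python (a sketch using the sizes from this thread):

import numpy as np

# why MNE keeps data as float64 in memory: compare the precision
print(np.finfo(np.float32).resolution)  # 1e-06
print(np.finfo(np.float64).resolution)  # 1e-15

disk_gb = 2.0                      # EDF on disk, 16-bit ints (2 bytes/sample)
loaded_gb = disk_gb * (8 / 2)      # float64 in memory (8 bytes/sample) -> 8.0
resampled_gb = loaded_gb / 2       # 2000 Hz -> 1000 Hz halves the samples -> 4.0
saved_gb = resampled_gb * (4 / 8)  # FIF stores 32-bit floats (4 bytes)   -> 2.0
print(loaded_gb, resampled_gb, saved_gb)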

If you need to save space, ask yourself how good a temporal resolution you need and how high the frequencies you intend to analyze are. A 1000 Hz sampling frequency → 1 ms temporal resolution and 500 Hz maximum (Nyquist) frequency… depending on your analysis plans, you might be able to get by with less.
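
For example, continuing from the snippet above (a rough sketch; the filename is a placeholder, and fmt='short' stores 16-bit integers, trading precision for file size):

# sketch: resample lower if your analyses allow it (Nyquist = sfreq / 2)
raw_small = raw.copy().resample(250)  # 250 Hz -> 125 Hz max analyzable frequency
raw_small.save('data_small_raw.fif', fmt='short', overwrite=True)  # hypothetical filename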


Thank you so much for your answer!

It would introduce more numerical rounding errors. Depending on what you do with the data, that may or may not be a problem.

Alex
