Filtering and ICA memory issues

Hi all,

I would like to hear what people do to filter and run ICA, and whether
there is any advice.

We usually have around an hour of recording, which gives ~4.5 to 5 GB of
raw fiff files. First filtering and then running ICA in MNE-Python
requires a lot of memory, sometimes as much as 50 GB, so I fairly often
get a memory error.

I would prefer not to downsample at this stage in the process, so I
kindly ask if anybody has any thoughts and/or practices to avoid very
heavy memory use.

best wishes,
mads

Hi Mads,

Which version of sklearn are you using?
Do you use the decim parameter for ICA?
How exactly do you use ICA?
50 GB of memory is unexpected; it would mean that you make up to 10 copies
of your data.
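For reference, here is a minimal sketch of fitting ICA with decimation in
MNE-Python (file name, filter band, and n_components are illustrative, not
a prescription):

    import mne
    from mne.preprocessing import ICA

    raw = mne.io.read_raw_fif('raw.fif', preload=True)
    raw.filter(1., 40.)  # a high-pass before ICA helps; band edges illustrative

    picks = mne.pick_types(raw.info, meg=True, eeg=True, eog=False, stim=False)

    # decim=3 fits ICA on every 3rd sample only, cutting the memory used
    # during the fit roughly by that factor; the Raw object is untouched.
    ica = ICA(n_components=0.95, method='fastica', random_state=42)
    ica.fit(raw, picks=picks, decim=3)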

Hi,

I use sklearn 0.17 (from anaconda). I have tried the "decim" parameter;
I remember it as being 3 for data with a 1000 Hz sfreq, but it didn't
help much.

I have attached a script to show how I used it.

cheers,
mads

[Attachment: ica_test.py (4840 bytes),
http://mail.nmr.mgh.harvard.edu/pipermail/mne_analysis/attachments/20151125/ccfbe99d/attachment.py]

Hi Mads,

it seems you don't use the decim parameter, do you?
Two things that I see immediately.
First, decim should in fact save a lot. With 1000 Hz you can decimate even
more; for ECG/EOG detection the effective sampling frequency should not be
lower than 50 Hz or so.
Second, as you have 1 hour of data, the ecg_epochs object will be huge,
assuming you find many events. Very often only a few are necessary to do
the detection. Have you tried picking the first 100-200 events?
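A hedged sketch of that suggestion (variable names carried over from the
snippet above; tmin/tmax are illustrative):

    import mne
    from mne.preprocessing import find_ecg_events

    # Detect heartbeats once, then keep only the first ~200 events.
    ecg_events, ch_name, avg_pulse = find_ecg_events(raw)
    ecg_events = ecg_events[:200]

    ecg_epochs = mne.Epochs(raw, ecg_events, tmin=-0.5, tmax=0.5,
                            baseline=None, preload=True)
    ecg_inds, scores = ica.find_bads_ecg(ecg_epochs)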

We'll have a closer look soon.
Denis

Hi Denis,

In this particular script you are right that I didn't use the decim
parameter. I just tried playing around with it a bit in a previous version.

I have combined MEG & EEG, so even decim = 10 could work then.

I haven't tried picking only the first 100-200 ecg events. I'll try that.

cheers,
mads

On top of that you can delete the epochs objects once they did their job.
Also take a look at my repo how I'm using the MNE-Python ICA.

https://github.com/dengemann/meeg-preprocessing/blob/master/examples/plot_preprocess_filter_ica.py

and

https://github.com/dengemann/meeg-preprocessing/blob/master/meeg_preprocessing/preprocessing.py#L109

It's basically the pimped example inside a function + reporting
functionality.
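For example, once the artifact indices are extracted, the intermediate
epochs are not needed anymore (names assumed from the snippets above):

    import gc

    ecg_inds, scores = ica.find_bads_ecg(ecg_epochs)
    del ecg_epochs   # this object can be large for long recordings
    gc.collect()     # give the memory back right away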

Hi Denis,

Thanks for your advice (I hadn't considered the number of ECG epochs!). I would like to follow up on Mads' original post regarding the large memory footprint.

An hour of MEG+EEG at 1 kHz adds up to about 5 GB of data saved as 32-bit floats. However, preloading raw data (regardless of origin) forces a cast to float64 (~10 GB). This makes sense: preloading raw data means you are going to want to filter it, which requires the extra bit depth.
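A quick back-of-the-envelope check of those numbers (channel count and
duration assumed from the thread):

    n_channels = 400
    n_samples = 60 * 60 * 1000          # one hour at 1 kHz
    n_values = n_channels * n_samples   # 1.44e9 samples

    print(n_values * 4 / 1e9)  # ~5.8 GB as float32 on disk
    print(n_values * 8 / 1e9)  # ~11.5 GB as float64 after preload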

However, reading all 400+ channels into memory does not make sense from an implementation perspective, since (most of) the relevant operations, certainly filtering, are channel-wise. A much more efficient strategy would be to support block-reads of channel subsets, probably combined with np.memmap. I can see some almost-implementations of memmap in the code, but the standard API does not expose them (understandably, for simplicity). Nor am I completely certain a memmap is helpful here...
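To make the idea concrete, here is a hypothetical sketch of such a
channel-blocked filter pass. This is not the MNE-Python API, just plain
NumPy/SciPy, and it assumes the data were already dumped to disk as a
float32 array:

    import numpy as np
    from scipy.signal import butter, filtfilt

    n_channels, n_samples, block = 400, 3600 * 1000, 32
    data = np.memmap('raw_data.dat', dtype='float32', mode='r+',
                     shape=(n_channels, n_samples))

    # 1-40 Hz band-pass at 1 kHz (cutoffs normalized by Nyquist).
    b, a = butter(4, [1.0 / 500.0, 40.0 / 500.0], btype='band')

    for start in range(0, n_channels, block):
        # Only `block` channels ever live in RAM as float64 at a time.
        chunk = np.asarray(data[start:start + block], dtype='float64')
        data[start:start + block] = filtfilt(b, a, chunk, axis=1)
    data.flush()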

There's been some chatter on related topics on GitHub (where this discussion also belongs), but I wonder if there have been any developments in this direction?

https://github.com/mne-tools/mne-python/issues/1766

/Chris

hi Chris,

memory usage is certainly a topic where we can progress.

For memmap, just pass a string (a file name) to the preload parameter.
I wrote this years ago, but I admit I almost never used it, so I am not
sure how much it can actually help.
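For the record, that looks like this (file names are placeholders):

    import mne

    # Passing a file name instead of True memory-maps the preloaded
    # data to that file on disk instead of holding it all in RAM.
    raw = mne.io.read_raw_fif('raw.fif', preload='raw_tmp.dat')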

If you can share a script that we can use for memory monitoring, e.g. on
the sample data, that would be great.

we'll see what we can do.

Best,
Alex

Hi Alex, I'll move this to github, with an example script using memory_profiler. My initial experiments some time ago fizzled when I got the feeling I was thinking about memmapping the wrong way (and that it would not in fact be beneficial in the filtering scenario). /Chris
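A sketch of what such a monitoring script could look like with
memory_profiler on the MNE sample dataset (the steps mirror the thread;
parameters are illustrative):

    import os.path as op

    import mne
    from mne.datasets import sample
    from memory_profiler import profile

    @profile  # prints line-by-line memory usage when run() executes
    def run():
        fname = op.join(str(sample.data_path()),
                        'MEG', 'sample', 'sample_audvis_raw.fif')
        raw = mne.io.read_raw_fif(fname, preload=True)
        raw.filter(1., 40.)
        ica = mne.preprocessing.ICA(n_components=0.95)
        ica.fit(raw, decim=3)

    if __name__ == '__main__':
        run()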

One thing to add to Denis's suggestion about ECG events: I encourage you to use a sample of ~300 or more spread throughout the recording (i.e. find the events, then take every 10th or so), as the ECG components/SSPs can shift over time depending on participant movement.
Hth
D
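One way to implement that (event detection as in the earlier snippet; the
~300 figure is D's rule of thumb):

    import mne

    ecg_events, _, _ = mne.preprocessing.find_ecg_events(raw)

    # Spread ~300 events across the whole session rather than taking
    # only the first ones, since the ECG topography can drift.
    step = max(1, len(ecg_events) // 300)
    ecg_events = ecg_events[::step]

    # Or draw uniformly at random, as in the meeg-preprocessing repo:
    # import numpy as np
    # rng = np.random.RandomState(42)
    # idx = np.sort(rng.choice(len(ecg_events), 200, replace=False))
    # ecg_events = ecg_events[idx]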

I think this is what I implemented in the repo shared above: it draws
k=200 samples uniformly from all the epochs, if I remember correctly.