Possibility of Data leakage during Preprocessing Step - ICA artefact repair/removal

Hi everyone,
dear MNE Team,

I have a general question regarding MEG/EEG preprocessing and its influence on later machine-learning analyses, one that I have been pondering for a while now:

Could using ICA preprocessing (repairing artefacts, as explained in the ICA MNE Tutorial) influence later decoding / machine-learning approaches, in the sense of data leakage from the training set to the test set?

I have had some interesting discussions about this, with diverse opinions, and I would be very interested in and grateful for your assessment.
Or do you know whether someone has ever tried to explore this systematically…

Many thanks for your reply!


Honestly, I would not consider data cleaning first. I would set up my ML pipeline
and see what I obtain with the raw data. Time-by-time decoding, for example,
is very robust.
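To illustrate the suggestion above, here is a minimal sketch of time-by-time decoding with scikit-learn. Everything here is synthetic and made up for illustration (trial counts, channel counts, the injected effect); it is not MNE code, just the bare pattern of fitting one classifier per time point:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical epoched data: 80 trials x 12 channels x 50 time points.
n_trials, n_channels, n_times = 80, 12, 50
X = rng.standard_normal((n_trials, n_channels, n_times))
y = rng.integers(0, 2, size=n_trials)

# Inject a class-dependent signal in the middle of the epoch,
# so the decoder has something to find there.
X[y == 1, :, 20:30] += 0.8

# Decode each time point separately ("time-by-time" decoding).
scores = np.array([
    cross_val_score(LogisticRegression(max_iter=1000),
                    X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
```

With real data you would plot `scores` against time and look for above-chance windows; MNE also offers a `SlidingEstimator` wrapper that does the same loop for you.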



That’s a good question. I’d say you’re probably relatively “safe”, because ICA is an unsupervised method, i.e. it doesn’t require any labels. Of course, this doesn’t rule out information leakage completely, but you’d have to be much more specific if you really suspect that this is what’s happening.
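If you want to be strict about it anyway, one way to rule leakage out by construction is to fit the unsupervised step on the training data only, e.g. by putting it inside a cross-validation pipeline. A sketch with scikit-learn’s `FastICA` standing in for the MNE ICA workflow (the data and dimensions are made up):

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)

# Hypothetical trial-by-feature matrix: 100 trials x 20 channel features.
X = rng.standard_normal((100, 20))
y = rng.integers(0, 2, size=100)
X[y == 1] += 0.5  # small class-dependent shift

# Leakage-safe pattern: because FastICA lives inside the pipeline, it is
# refit on the training fold of every CV split, so the test fold never
# influences the unmixing matrix.
pipe = make_pipeline(
    FastICA(n_components=10, random_state=0, max_iter=500),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(pipe, X, y, cv=5)
```

Note that in MNE itself the ICA is usually fit on continuous raw data rather than on a trial matrix, but the principle carries over: fit the ICA on training data only if you want a hard guarantee.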

Thanks a lot for your reply!

In my experience, decoders don’t care too much about ICA cleanup. They’re pretty good at ignoring noise that does not contain information.

Example results: no difference between applying ICA and not applying it.
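For anyone who wants to probe this point on simulated data, here is a hedged sketch: a linear decoder on clean epochs vs. the same epochs with a strong, label-independent artefact component mixed in. All numbers and spatial patterns are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)

# Simulated epochs: 200 trials x 16 channels.
n_trials, n_channels = 200, 16
y = rng.integers(0, 2, size=n_trials)

# Class-related signal on a fixed (made-up) spatial pattern.
pattern = rng.standard_normal(n_channels)
X = rng.standard_normal((n_trials, n_channels))
X += np.outer(y - 0.5, pattern)

# Strong, label-independent artefact (blink-like) on another pattern.
artefact_pattern = rng.standard_normal(n_channels)
artefact_amp = 5.0 * rng.standard_normal(n_trials)
X_dirty = X + np.outer(artefact_amp, artefact_pattern)

clf = LogisticRegression(max_iter=2000)
score_clean = cross_val_score(clf, X, y, cv=5).mean()
score_dirty = cross_val_score(clf, X_dirty, y, cv=5).mean()
```

In this toy setting the linear decoder can largely project the uninformative artefact direction away on its own, which matches the observation above; whether that holds for your real data and artefacts is worth checking the same way.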