Scale difference between sensors (mag, grad, eeg) after whitening epoched data


EDIT: I clarified some details below and added system info at the end

I am running an RSA analysis on MEG data combined across sensor types (MAG, GRAD and EEG). I tried whitening the epoched data before applying RSA and noticed that there is a difference in scale between the three sensor types even after whitening.

Here I show data from a single participant in my dataset. All epochs are lumped together after whitening, separately for each sensor type. The y-axis scales are shared across plots. The x-axis is time in milliseconds; 0 is stimulus onset. There is about an order of magnitude difference in variance between plots.

I adapted the code in mne.cov.whiten_evoked (lines 2288-2292 in mne-python/mne/ ) to do the whitening:

from mne import compute_covariance
from mne.cov import compute_whitener

# Compute the noise covariance; with a list of methods and
# return_estimators=True, noise_covs[0] is the best estimator
noise_covs = compute_covariance(
    epochs['(epoch == "written")'],
    method=['shrunk', 'empirical', 'ledoit_wolf'],
    return_estimators=True,
)

# Compute whitener
W, _ = compute_whitener(noise_covs[0], epochs.info,
                        picks=picks_to_idx(epochs.info, picks))

# Apply whitener to the epoched data
data = epochs[cond_query].get_data(picks=picks_to_idx(epochs.info, picks))
data = (data.swapaxes(1, 2) @ W).swapaxes(1, 2)
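The swapaxes-based application of the whitener can be sanity-checked with a pure-numpy toy example (no MNE, made-up shapes): for a symmetric whitener W, right-multiplying the time-by-channel transposed data by W is the same as applying `W @ epoch` to each epoch.

```python
# Pure-numpy sanity check (toy data, not the real whitener): for a
# symmetric W, (data.swapaxes(1, 2) @ W).swapaxes(1, 2) applied to
# (n_epochs, n_channels, n_times) data equals W @ epoch per epoch.
import numpy as np

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 4, 6, 50
data = rng.standard_normal((n_epochs, n_channels, n_times))
A = rng.standard_normal((n_channels, n_channels))
W = A + A.T  # any symmetric matrix stands in for the whitener here

out = (data.swapaxes(1, 2) @ W).swapaxes(1, 2)
ref = np.stack([W @ epoch for epoch in data])
print(np.allclose(out, ref))  # True
```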

Talking to @olafhauk about this, he pointed me to this tutorial on plotting whitened data. There also seems to be a difference in scale between sensor types in the epoched data after whitening (see the output of epochs.plot(noise_cov=noise_cov, events=True) in the tutorial). This scale difference seems to disappear if the evoked data are whitened (see the output of evoked.plot(noise_cov=noise_cov, time_unit="s") in the tutorial). So, all in all, this seems to be a general phenomenon, not specific to my data.

I also noticed that the documentation of Epochs.plot says, under the noise_cov parameter: “For data processed with SSS, the effective dependence between magnetometers and gradiometers may introduce differences in scaling, consider using mne.Evoked.plot_white().” Could this be the reason for the scale differences, or could there be something else? I don’t understand what “the effective dependence between magnetometers and gradiometers” means here.

In theory, whitening should bring the different sensor types to the same scale, so I would be hesitant to apply another normalization step (e.g., something like StandardScaler from scikit-learn). Is there a way of fixing this so that the epoched data are on the same scale after whitening? I need epoched, not evoked, data for the RSA analysis.
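For reference, the textbook behavior is easy to reproduce with a pure-numpy toy example (invented scales, not MNE code): a whitener built as C^(-1/2) from the empirical covariance brings channels on very different scales to unit variance.

```python
# Toy numpy illustration (not the MNE API): whitening with W = C^(-1/2)
# maps channels on very different scales to unit variance.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000
# channels on different scales, loosely like different sensor types
scales = np.array([1e-2, 1e-1, 1.0, 1e1, 1e2])
data = rng.standard_normal((5, n_samples)) * scales[:, None]

C = np.cov(data)
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T  # symmetric C^(-1/2)

whitened = W @ data
print(np.diag(np.cov(whitened)))  # all ≈ 1: every channel on the same scale
```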

I’m pinging @Denis and @agramfort, I hope you don’t mind, as you are the authors of the original paper on covariance estimation and whitening on which this implementation is based (Engemann & Gramfort, 2015).

Many thanks,

System info:

Platform             Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17
Python               3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:40:35) [GCC 12.3.0]
Executable           /home/ma09/.conda/envs/my_mne1.5/bin/python
CPU                  x86_64 (32 cores)
Memory               251.4 GB

├☑ mne               1.5.1
├☑ numpy             1.24.4 (OpenBLAS 0.3.24 with 32 threads)
├☑ scipy             1.11.3
├☑ matplotlib        3.8.0 (backend=module://matplotlib_inline.backend_inline)
├☑ pooch             1.7.0
└☑ jinja2            3.1.2

Numerical (optional)
├☑ sklearn           1.3.1
├☑ numba             0.57.1
├☑ nibabel           5.1.0
├☑ nilearn           0.10.2
├☑ dipy              1.7.0
├☑ openmeeg          2.5.6
├☑ pandas            2.1.1
└☐ unavailable       cupy

Visualization (optional)
├☑ pyvista           0.42.2 (OpenGL unavailable)
├☑ pyvistaqt         0.0.0
├☑ vtk               9.2.6
├☑ qtpy              2.4.0 (None=None)
├☑ pyqtgraph         0.13.3
├☑ mne-qt-browser    0.5.2
├☑ ipywidgets        8.1.1
├☑ trame_client      2.12.4
├☑ trame_server      2.12.0
├☑ trame_vtk         2.5.8
├☑ trame_vuetify     2.3.1
└☐ unavailable       ipympl

Ecosystem (optional)
├☑ mne-bids          0.14.dev0
├☑ mne-bids-pipeline 1.5.0.dev27+g754c85f
└☐ unavailable       mne-nirs, mne-features, mne-connectivity, mne-icalabel

How are you plotting your data? Because this seems to work for me:

data = (epochs.get_data(copy=False).swapaxes(1, 2) @ W).swapaxes(1, 2)
epochs_whitened = mne.EpochsArray(data, epochs.info, events=epochs.events,
                                  tmin=epochs.tmin, event_id=epochs.event_id)

import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=3, sharex=True, sharey=True)
for ax, ch_type in zip(axes, ('mag', 'grad', 'eeg')):
    evoked = epochs_whitened.copy().pick(ch_type).get_data(copy=False).mean(axis=0)
    ax.plot(epochs_whitened.times, evoked.T, color='black', linewidth=1, alpha=0.2)

During the baseline interval they are not exactly the same amplitude, but pretty close.

Thanks for the response @wmvanvliet.

I think the main difference between your approach and mine is that you plot the evoked responses (the .mean(axis=0) averages across epochs), while I plotted the individual epochs.

This is how it looks with my data using your method:

It looks better, but the magnetometers are still tiny. Moreover, I do need the epochs to be on the same scale, not the evoked responses, as I do RSA based on individual epochs.


It looks like there may be some problem with the covariance estimation, since the grads and EEG sensors are close, but indeed the mags are way too small. Can you show us the result of noise_cov.plot()? Perhaps there’s something obvious that we can spot.

The output of noise_cov.plot():

Computing rank from covariance with rank=None
    Using tolerance 9.1e-15 (2.2e-16 eps * 102 dim * 0.4  max singular value)
    Estimated rank (mag): 71
    MAG: rank 71 computed from 102 data channels with 0 projectors
Computing rank from covariance with rank=None
    Using tolerance 9.2e-14 (2.2e-16 eps * 204 dim * 2  max singular value)
    Estimated rank (grad): 71
    GRAD: rank 71 computed from 204 data channels with 0 projectors
Computing rank from covariance with rank=None
    Using tolerance 8e-14 (2.2e-16 eps * 61 dim * 5.9  max singular value)
    Estimated rank (eeg): 60
    EEG: rank 60 computed from 61 data channels with 0 projectors

The only obvious thing I spot is some bad EEG channels. But that shouldn’t throw off the whitening that much, I think.

Thanks for checking! I also don’t see how a few bad channels in the EEG could throw off the covariance estimation for the magnetometers so badly.

I also checked a few other subjects in my dataset and it looks the same there as well, so this seems to be a general issue. Do you have any suggestions for what to change in the covariance estimation?


I did a systematic analysis and I think the issue is related to Maxwell filtering: all the data shown previously were processed with maxfilter.

If I start fresh from the raw data and do only rudimentary preprocessing (1-40 Hz filter, downsampling to 250 Hz, epoching) without maxfilter, I get equal scales across sensor types:
(image: whitened epochs without maxfilter, comparable scale across sensor types)

For comparison, this is the result of the same whitening procedure applied to the same data, preprocessed as above but with maxfilter applied.

I’m not an expert on maxfilter, any thoughts anybody?


My preprocessing was done with MNE-BIDS-pipeline, the maxfilter settings were as follows:

find_flat_channels_meg: bool = True
Auto-detect "flat" channels (i.e. those with unusually low variability) and
mark them as bad.

find_noisy_channels_meg: bool = True
Auto-detect "noisy" channels and mark them as bad.

use_maxwell_filter: bool = True
Whether or not to use Maxwell filtering to preprocess the data.

!!! warning
    If the data were recorded with internal active compensation (MaxShield),
    they need to be run through Maxwell filter to avoid distortions.
    Bad channels need to be set through BIDS channels.tsv and / or via the
    `find_flat_channels_meg` and `find_noisy_channels_meg` options above
    before applying Maxwell filter.

mf_st_duration: Optional[float] = None
There are two kinds of Maxwell filtering: SSS (signal space separation) and
tSSS (temporal signal space separation)
(see Taulu et al., 2004).

If not None, apply spatiotemporal SSS (tSSS) with the specified buffer
duration (in seconds). MaxFilter™'s default is 10.0 seconds in v2.2.
Spatiotemporal SSS acts implicitly as a high-pass filter where the
cut-off frequency is 1/st_dur Hz. For this (and other) reasons, longer
buffers are generally better as long as your system can handle the
higher memory usage. To ensure that each window is processed
identically, choose a buffer length that divides evenly into your data.
Any data at the trailing edge that doesn't fit evenly into a whole
buffer window will be lumped into the previous buffer.

???+ info "Good Practice / Advice"
    If you are interested in low frequency activity (<0.1Hz), avoid using
    tSSS and set `mf_st_duration` to `None`.

    If you are interested in low-frequency activity above 0.1 Hz, you can use
    the default `mf_st_duration` of 10 s, meaning it acts like a 0.1 Hz
    high-pass filter.

???+ example "Example"
    mf_st_duration = None
    mf_st_duration = 10.  # to apply tSSS with 0.1Hz highpass filter.

mf_st_correlation: float = 0.98
The correlation limit for spatio-temporal SSS (tSSS).

???+ example "Example"
    st_correlation = 0.98

mf_head_origin: Union[Literal["auto"], ArrayLike] = "auto"
Origin of internal and external multipolar moment space in meters.
If 'auto', it will be estimated from headshape points.
If automatic fitting fails (e.g., due to having too few digitization
points), consider separately calling the fitting function with different
options or specifying the origin manually.

???+ example "Example"
    mf_head_origin = 'auto'
mf_destination: Union[Literal["reference_run"], ArrayLike] = "reference_run"
Despite all possible care to avoid movements in the MEG, the participant
will likely slowly drift down from the Dewar or slightly shift the head
around in the course of the recording session. Hence, to take this into
account, we are realigning all data to a single position. For this, you can:

1. Choose a reference run. Often one from the middle of the recording session
   is a good choice. Set `mf_destination = "reference_run"` and then set
   `mf_reference_run` to the desired run. This will result in a
   device-to-head transformation that differs between subjects.
2. Choose a standard position in the MEG coordinate frame. For this, pass
   a 4x4 transformation matrix for the device-to-head
   transform. This will result in a device-to-head transformation that is
   the same across all subjects.

   ???+ example "A Standardized Position"
       from mne.transforms import translation
       mf_destination = translation(z=0.04)

mf_int_order: int = 8
Internal order for the Maxwell basis. Can be set to something lower (e.g., 6)
or higher for datasets where lower or higher spatial complexity, respectively,
is expected.

mf_reference_run: Optional[str] = '03'
Which run to take as the reference for adjusting the head position of all
runs when [`mf_destination="reference_run"`][mne_bids_pipeline._config.mf_destination].
If `None`, pick the first run.

???+ example "Example"
    mf_reference_run = '01'  # Use run "01"

mf_cal_fname: Optional[str] = None
Path to the Maxwell Filter calibration file. If `None`, the recommended
location is used.

!!! warning
     This parameter should only be used for BIDS datasets that don't store
     the fine-calibration file according to BIDS.

???+ example "Example"
    mf_cal_fname = '/path/to/your/file/calibration_cal.dat'

mf_ctc_fname: Optional[str] = None
Path to the Maxwell Filter cross-talk file. If `None`, the recommended
location is used.

!!! warning
     This parameter should only be used for BIDS datasets that don't store
     the cross-talk file according to BIDS.

???+ example "Example"
    mf_ctc_fname = '/path/to/your/file/crosstalk_ct.fif'

mf_mc: bool = False
If True, perform movement compensation on the data.

mf_mc_t_step_min: float = 0.01
Minimum time step to use during cHPI coil amplitude estimation.

mf_mc_t_window: Union[float, Literal["auto"]] = "auto"
The window to use during cHPI coil amplitude estimation and in cHPI filtering.
Can be "auto" to autodetect a reasonable value or a float (in seconds).

mf_mc_gof_limit: float = 0.98
Minimum goodness of fit to accept for each cHPI coil.

mf_mc_dist_limit: float = 0.005
Minimum distance (m) to accept for cHPI position fitting.

mf_mc_rotation_velocity_limit: Optional[float] = None
The rotation velocity limit (degrees/second) to use when annotating
movement-compensated data. If `None`, no annotations will be added.

mf_mc_translation_velocity_limit: Optional[float] = None
The translation velocity limit (meters/second) to use when annotating
movement-compensated data. If `None`, no annotations will be added.

mf_filter_chpi: Optional[bool] = None
Use mne.chpi.filter_chpi after Maxwell filtering. Can be None to use
the same value as [`mf_mc`][mne_bids_pipeline._config.mf_mc].
Only used when [`use_maxwell_filter=True`][mne_bids_pipeline._config.use_maxwell_filter].

One thing maxfilter is infamous for is reducing the rank of the data. You could try setting `rank=dict(grad=71, mag=71, eeg=60)` when calling compute_covariance and see if that makes a difference for you.
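To see why the rank matters here, a toy numpy sketch (my own illustration with made-up numbers, not MNE or pipeline code): if rank-deficient data is whitened using all eigenvalues, the near-zero ones blow up the whitener's gain, whereas stopping at the true rank keeps the scales sane.

```python
# Toy illustration: naive whitening of rank-deficient data divides by
# near-zero eigenvalues, producing a huge gain; a rank-aware whitener
# stops at the true rank. (Invented data, not the pipeline's code.)
import numpy as np

rng = np.random.default_rng(0)
n_channels, true_rank, n_samples = 10, 7, 50_000
mixing = rng.standard_normal((n_channels, true_rank))
# rank-deficient signal plus a whiff of numerical noise (like post-SSS data)
data = mixing @ rng.standard_normal((true_rank, n_samples))
data += 1e-6 * rng.standard_normal((n_channels, n_samples))

C = np.cov(data)
eigvals = np.linalg.eigvalsh(C)[::-1]  # descending

# Naive whitener uses all eigenvalues, including the near-zero "noise" ones...
naive_gain = 1.0 / np.sqrt(eigvals[-1])
# ...while a rank-aware whitener stops at the true rank.
rank_aware_gain = 1.0 / np.sqrt(eigvals[true_rank - 1])
print(naive_gain / rank_aware_gain)  # huge ratio: the naive whitener explodes
```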

Thanks for the suggestion @wmvanvliet, I just ran it and it didn’t make it better unfortunately.

Reading more about this in Engemann & Gramfort (2015) and digging into the implementation, I now see this more as an inherent property than a bug. For example, Engemann & Gramfort (2015) offer this cautionary sentence in the discussion: “…when computing the noise covariance after application of dimensionality reducing operators, the PPCA and FA models should be used with care.”

Also, as I mentioned in my original post, the documentation of Epochs.plot already warns that for SSS-processed data the effective dependence between magnetometers and gradiometers may introduce differences in scaling.

So I think we just have to live with the fact that if the data were processed with maxfilter (SSS), there will be scale differences between MAG and GRAD after whitening, depending of course on the data.

But this could be an issue for RSA or decoding analyses. To overcome it, @olafhauk recommended projecting the data onto the nonzero components during whitening instead of rotating it back to sensor space. This can be achieved by setting pca=True in mne.cov.compute_whitener instead of the default pca=False. This way, after whitening I have only 71 MEG channels (equal to the rank of the MEG channels after maxfilter) and they are on a comparable scale to the EEG (plotting the evoked responses after whitening below):
(image: evoked responses after whitening with pca=True, MEG and EEG on comparable scales)

This has the additional benefit of discarding uninformative features and saving compute time during RSA.
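For anyone curious what pca=True does conceptually, here is a hedged numpy sketch (my reading of the behavior, not MNE's actual code): keep only the eigencomponents with nonzero variance, so rank-deficient data comes out with rank-many unit-variance channels.

```python
# Numpy sketch of PCA-style whitening of rank-deficient data (conceptual
# illustration only; mne.cov.compute_whitener(pca=True) is the real thing).
import numpy as np

rng = np.random.default_rng(0)
n_channels, rank, n_samples = 10, 7, 50_000
# Rank-deficient data, as produced by SSS/Maxwell filtering
mixing = rng.standard_normal((n_channels, rank))
data = mixing @ rng.standard_normal((rank, n_samples))

C = np.cov(data)
eigvals, eigvecs = np.linalg.eigh(C)
keep = eigvals > eigvals.max() * 1e-10  # the nonzero components only
W_pca = np.diag(eigvals[keep] ** -0.5) @ eigvecs[:, keep].T

whitened = W_pca @ data  # fewer "channels", all with unit variance
print(whitened.shape)  # (7, 50000)
```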

Of course, the other solution would be to skip maxfilter in the preprocessing for RSA.

Let me know if you have any thoughts on this.