decoding.Scaler is not centring data or giving it unit variance

Hi everyone,

I have been working with EEG data and using the Scaler class in mne.decoding with the default parameters. From what I understand of the documentation, it is supposed to centre the data for each channel to zero mean and unit variance. However, when I apply it and then check by computing the mean and standard deviation for each channel, they are not 0 and 1 respectively.

I would be grateful if somebody could let me know what is happening: whether I am doing something wrong or misunderstanding the method. Here are my software details and a minimal reproducible example:

  • MNE version: 1.4.2
  • operating system: macOS 14.3

import mne
import numpy as np

np.random.seed(1)
n_epochs, n_channels, n_times = 10, 5, 100
data = np.random.randn(n_epochs, n_channels, n_times)

# Minimal info object: 5 EEG channels sampled at 100 Hz
info = mne.create_info(n_channels, 100, ch_types='eeg')

# Default parameters, i.e. scalings=None
scaler = mne.decoding.Scaler(info)
data_scaled = scaler.fit_transform(data)

# Check mean and std in each channel
for i in range(n_channels):
    channel_data = data_scaled[:, i, :]
    print(f"Channel {i}:")
    print(f"  Mean: {channel_data.mean()}")
    print(f"  Std:  {channel_data.std()}")

Hi,

Based on the API docs and the corresponding tutorial, I believe you can achieve what you want by passing the argument scalings='mean'. If I understand it correctly, using scalings=None simply scales all values in a channel by a constant factor.
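
For example, a quick sketch (I haven't run this exact snippet, but per the docs scalings='mean' should standardize each channel):

import mne
import numpy as np

np.random.seed(1)
data = np.random.randn(10, 5, 100)  # (n_epochs, n_channels, n_times)
info = mne.create_info(5, 100, ch_types='eeg')

# scalings='mean' should give each channel ~0 mean and ~1 std
scaler = mne.decoding.Scaler(info, scalings='mean')
data_scaled = scaler.fit_transform(data)
for i in range(5):
    print(i, data_scaled[:, i, :].mean(), data_scaled[:, i, :].std())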

Cheers,

Hi Sotiris,

Thanks for your response. I don’t think the tutorial link you posted is working.

Looking at the API docs, they say: "This class scales data for each channel. It differs from scikit-learn classes (e.g., sklearn.preprocessing.StandardScaler) in that it scales each channel by estimating μ and σ using data from all time points and epochs, as opposed to standardizing each feature (i.e., each time point for each channel) by estimating μ and σ using data from all epochs." And since with_mean and with_std are both True by default, it follows that the object should subtract the mean and divide by the standard deviation, in a way that differs from sklearn's StandardScaler method (which I thought wasn't ideal for many EEG data analysis applications).
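
To make the distinction the docs describe concrete, here is a rough numpy sketch of the two standardization schemes as I understand them (my own illustration, not MNE's implementation):

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 5, 100))  # (n_epochs, n_channels, n_times)

# Per-channel standardization (what the Scaler docstring describes):
# one mu/sigma per channel, pooled over all epochs and time points.
mu = X.mean(axis=(0, 2), keepdims=True)
sd = X.std(axis=(0, 2), keepdims=True)
X_per_channel = (X - mu) / sd

# Per-feature standardization (the sklearn StandardScaler behaviour):
# one mu/sigma per (channel, time point), estimated across epochs only.
mu = X.mean(axis=0, keepdims=True)
sd = X.std(axis=0, keepdims=True)
X_per_feature = (X - mu) / sd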

So I'm not sure whether I am misunderstanding it, or the code does something different from what the API docs describe.
For context, I am using this as a final step in my preprocessing pipeline for use in algorithms that analyse spatiotemporal dynamics of EEG data, so I don’t want to destroy the inter-channel variability.

Thanks

If you feed in 3D data, the scaling happens for each time point separately, across all epochs, within one channel at a time. So in your check, you would need a second, nested loop over the time dimension.

Best wishes,
Richard

Edit: Reading the docs again – I might be mistaken. Unfortunately I currently don’t have time to look into this any further.


Hi Dan,

I’ve created this much more simplified example based on your code. I think here you can find a clue.

import mne
from mne.decoding import Scaler
import numpy as np

np.random.seed(1)
n_epochs, n_channels, n_times = 2, 2, 100
# data = np.random.randn(n_epochs, n_channels, n_times)
data = np.ones((n_epochs, n_channels, n_times))
data[1, :, :] *= 2  # second epoch is all twos, first is all ones

info = mne.create_info(n_channels, 100, ch_types='eeg')

scaler_none = Scaler(info, scalings=None)
scaler_mean = Scaler(info, scalings="mean")
scaler_median = Scaler(info, scalings="median")

data_scaled_none = scaler_none.fit_transform(data)
data_scaled_mean = scaler_mean.fit_transform(data)
data_scaled_median = scaler_median.fit_transform(data)

# Check mean and std in each channel for all three scalers
for i in range(n_channels):
    channel_data = [data_scaled_none[:, i, :],
                    data_scaled_mean[:, i, :],
                    data_scaled_median[:, i, :]]
    print("Channel {}:".format(i))
    print("\tMean: {}".format([ch_data.mean() for ch_data in channel_data]))
    print("\tStd: {}\n".format([ch_data.std() for ch_data in channel_data]))

Check the simplest data (the constant epochs) first, then see how, up to some numerical approximation, the random noise gives similar results.
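
And if the constant-factor interpretation of scalings=None is right, this quick check on the script above should also pass (my assumption here is the 1e6 EEG factor mentioned in the Scaler docstring):

# With scalings=None, the Scaler docstring says EEG channels are scaled
# by 1e6, so the transform should amount to a plain multiplication
# (assumption based on the docstring, not verified in the source).
print(np.allclose(data_scaled_none, data * 1e6))  # expected: True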

Cheers,


When using the Scaler for your data, it’s crucial to ensure that it is correctly centering your data around zero and scaling it to have a unit variance. For this purpose, the StandardScaler from scikit-learn is commonly used. It should provide the desired results. If you’re still facing issues, verify the data for any anomalies and check the scaler’s configuration. Additionally, if you are integrating this process with applications like loklok shorts, ensure that the data preprocessing steps align with the application’s requirements for optimal performance.

@henrylucas It seems your response was generated via an LLM (which is fine) and is not very useful (which is very bad). Please don’t do this again, or we’ll ban you for spamming. Thanks.

Hi Sotiris,

Thank you, this script highlights the differences well. I am surprised at how unclear/potentially misleading the API doc is for this, given this must be quite a commonly used feature - someone should probably update it.

Best regards,
Dan

Hi Richard, I’m assuming this is targeted towards henrylucas’ response?


Yes, I’ve edited my response to clarify that!

Could you kindly share your conclusions with the forum? This could then help improve the documentation! 🙂
