Question about preprocessed EEG data

Dear all,

I am having issues understanding the cleaned resting-state EEG data I receive from MNE-BIDS. I have EEG data available from a Brainvision recording system (I did not acquire these data myself).

System information:

  • MNE version: 1.8.0
  • operating system: Linux-3.10.0

When looking at the data with the plot function, it looks normal and cleaned (*_cean_raw.fif). Just to make sure, this is the preprocessed file I need to use for further analyses, right? (I do not need the epoched data)
However, when I try to calculate the mean bandpower in different frequency bands, the value seems way too low (0.00000000000154386107709975 for the delta band for example). I have been struggling for a couple of weeks to get down to the issue, but I believe there is something in the MNE-BIDS preprocessing pipeline, which leads to these values. I came to this conclusion because I calculated the band power with the same code for the uncleaned data, which has values in normal ranges for power I believe (30.55).

The preprocessing steps that I used were “extended_infomax” for ICA, autoreject_local and filtering (between 0.5 and 45), and these all seem to work normal when looking at the html report of MNE-BIDS.

Here is the code that I used for calculating bandpower:

def bandpower(data, sf, band, window_sec=None, relative=False):
    band = np.asarray(band)
    low, high = band 

    # Define window length for Welch method
    if window_sec is not None:
        nperseg = window_sec * sf
    else:
        nperseg = (2 / low) * sf
       
    # Compute Welch's periodogram
    freqs, psd = welch(data, sf, nperseg=nperseg)
    plt.semilogy(freqs, psd)

    # Frequency resolution
    freq_res = freqs[1] - freqs[0]

    # Find indices corresponding to the frequency band
    idx_band = np.logical_and(freqs >= low, freqs <= high)

    # Calculate the band power using Simpson's rule for numerical integration
    bp = simpson(psd[idx_band], dx=freq_res)

    if relative:
        bp /= simpson(psd, dx=freq_res)

    return bp

# Function to process subjects and calculate band powers
def process_subjects(subject_list):
    results = []

    # Frequency bands of interest
    bands = {
        'Delta': (0.5, 4),
        'Theta': (4, 8),
        'Alpha': (8, 13),
        'Beta': (13, 30)
    }

    # Loop over all subjects
    for sub in subject_list:
        file_preprocessed = f"{data_path_root}/eyesclosed/derivatives/mne-bids-pipeline/{sub}/{ses}/eeg/{sub}_{ses}_task-{task}_proc-clean_raw.fif"
        
        try:
            # Load preprocessed data
            prepr = mne.io.read_raw_fif(file_preprocessed, preload=True)
            
            # Get the sampling frequency
            sfreq = prepr.info['sfreq']
            
            # Get the data for all EEG channels
            eeg_data = prepr.get_data(picks='eeg')  # Shape: (n_channels, n_times)
            
            # Initialize a dictionary to store band powers
            band_powers = {band: [] for band in bands.keys()}
            
            # Loop over each channel and compute band power
            for ch_idx in range(eeg_data.shape[0]):
                data = eeg_data[ch_idx, :]  # Get data for the current channel
                
                # Calculate band power for each frequency band
                for band_name, band_range in bands.items():
                    bp = bandpower(data, sfreq, band_range)
                    band_powers[band_name].append(bp)
      
            # Average band powers across all channels for this subject
            mean_band_powers = {band: np.mean(band_powers[band]) for band in bands.keys()}
            
            # Append the results for this subject
            results.append({'Subject': sub, **mean_band_powers})

I hope my question is clear and maybe came across the same problem.
Thanks in advance.

Hi,

Can I ask what units are we talking about?
Maybe I am missing something, but my point is: if your signal values are in the order of ÎĽV (10e-6) and computing power is something like raising to the power of 2, then the value you are reporting seems perfectly normal to me.

Cheers,

Hi,
Thank you so much for your reply.
So, when I looked at the EEGs and compared them by simply plotting the channels of the raw EEG and the preprocessed EEG next to each other, it becomes apparent that the units change from 10e+3 to 10e-7 in all channels. This is what makes it so difficult to comprehend, because the power values of the raw EEG seem find and what I would expect. Maybe I am missing something crucial, I do not know … This seems like a scaling issue in some way, but I do not know if such a thing happens in the pipeline somewhere? Thanks in advance!

Best regards,
Julia

Hi,

In your code you load the preprocessed data with mne.io.read_raw_fif and I guess you have access to the info, so take a look at it in order to see what are the units of the data. I think they should be encoded in V, thus the values would probably range around 10e-6 for ÎĽV.

But what about the raw data? How do you load it and what are the units?

Cheers,

Hi,

The raw data are saved in units ÎĽV (if I load this manually with read_raw_brainvision). I pass the data to the automated MNE-BIDS pipeline. I cannot find the specified code where it is loading the raw data though. I checked the info of the preprocessed file and this indicates that it is saved in V if I am correct 'unit': 107 (FIFF_UNIT_V).

I was looking through some forum posts again and found this comment of Dan McCloy on another post regarding ERPs:

" FYI, in MNE-Python everything is stored in SI units. Sometimes our plotting functions will convert the units before plotting, just so that the axis labels don’t range from say 0.000001 to 0.000006 V, but instead from 1 to 6 μV. Again this is only on the plot, the underlying data object still stores the numbers in SI units." https://mne.discourse.group/t/how-to-specify-units/5961/2

Is it possible that this explains why the units of the raw EEG and the one of the preprocessed EEG may differ? In the forum post they only mention it is because of the plotting function though…

I saw that conversion may be the solution https://mne.discourse.group/t/rescale-data-import-from-fieldtrip/3402 But this then turns the values even smaller (10e-13). Do I need to do the conversion before passing the data to the automated MNE-BIDS pipeline?

When MNE-Python loads the Brainvision file format, it will automatically (1) check the file header to see what units the data are stored in, and (2) convert everything to SI units (so for EEG data, that’s Volts). As @sotpapad said, that will scale microVolt data by a factor of 1e-6 upon loading. Then, when the same data is saved to FIF (whether or not you do any preprocessing steps to it) it will be saved in Volts (not converted back to microVolts).

I can’t explain this part. If you loaded the brainvision files using MNE-Python then they should have been scaled to Volts upon loading, so you should have gotten power values approximately in the 1e-12 range. Did you load them some other way? Another possibility is that the uncleaned data had some very large artifacts (~10 orders of magnitude above the rest of the data) that were removed by filtering and/or ICA. That might also explain this:

1 Like

Yeah, kind of weird what’s happening with the values.
Artifacts could be an explanation absolutely. But what kind of artifact would be so many orders of magnitude larger? :thinking:

If I may, @juliapottkaemper I don’t understand why -as you say- you expect the raw power values to be those you have mentioned. And conversely, why do you think the per-processed data values are off?

Hi all,

I believe I found the mistake.
First, I will mention some steps that I took to check (and some questions).

I can assure that there are no big artifacts in the EEGs, so that is unlikely the cause.

  1. This is how I load the data in python:
prepr = mne.io.read_raw_fif(file_preprocessed, preload=True)
raw = mne.io.read_raw_brainvision(file_raw, preload=True)

When I now calculate the PSD (and bandpower in different frequency bands), I do see the same magnitude, so that is nice!

  1. I looked at the info of both data (raw and preprocessed) as follows:
print(raw.info['chs'][0])
{'ch_name': 'Fp1', 'coil_type': 1 (FIFFV_COIL_EEG), 'kind': 2 (FIFFV_EEG_CH), 'logno': 1, 'scanno': 1, 'cal': 0.1, 'range': 1e-06, 'loc': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]), 'unit': 107 (FIFF_UNIT_V), 'unit_mul': 0 (FIFF_UNITM_NONE), 'coord_frame': 4 (FIFFV_COORD_HEAD)}

print(prepr.info['chs'][0])
{'scanno': 1, 'logno': 1, 'kind': 2 (FIFFV_EEG_CH), 'range': 1.0, 'cal': 0.10000000149011612, 'coil_type': 1 (FIFFV_COIL_EEG), 'loc': array([-0.03090259,  0.11458518,  0.02786657, 0., 0., 0., nan, nan, nan, nan,
nan, nan]), 'unit': 107 (FIFF_UNIT_V), 'unit_mul': 0 (FIFF_UNITM_NONE), 'ch_name': 'Fp1', 'coord_frame': 4 (FIFFV_COORD_HEAD)}

It seems that the values are both stored in V (as you mentioned this happens when loading the data). But when looking at the values within the dataframe, they still differ in magnitude (1e-03 vs 1e-06). Do you know why this would occur after preprocessing?
I noticed something strange in the output here. It indicates for the raw file 1e-06 and for prepr 1.0. But I am not quite sure what this means, maybe you do? Is this related to the scaling issue maybe?

  1. This was my main error I believe: I come from a Matlab background, and how matlab reads in the files is different compared to python. I compared the values of the raw and preprocessed files in Matlab as well (I loaded the .vhdr and .fif files with Fieldtrips data = ft_preprocessing(cfg);). Maybe this is where my confusion stems from, because the .fif file is already saved in V instead of ÎĽV, which then results in different magnitudes of power values?

Thanks for thinking along with me on this.

Ok, nice.

I am not sure about the range to be honest. But I definitely think that recordings can have some artifacts that would push the values from 10^-6 to 10^-3. You’d need to check in your pipeline what gets rejected and why in this case to be sure.