Question about preprocessed EEG data

juliapottkaemper · October 1, 2024, 12:42pm

Dear all,

I am having issues understanding the cleaned resting-state EEG data I receive from MNE-BIDS. I have EEG data available from a Brainvision recording system (I did not acquire these data myself).

System information:

MNE version: 1.8.0
operating system: Linux-3.10.0

When looking at the data with the plot function, it looks normal and cleaned (*_cean_raw.fif). Just to make sure, this is the preprocessed file I need to use for further analyses, right? (I do not need the epoched data)
However, when I try to calculate the mean bandpower in different frequency bands, the value seems way too low (0.00000000000154386107709975 for the delta band for example). I have been struggling for a couple of weeks to get down to the issue, but I believe there is something in the MNE-BIDS preprocessing pipeline, which leads to these values. I came to this conclusion because I calculated the band power with the same code for the uncleaned data, which has values in normal ranges for power I believe (30.55).

The preprocessing steps that I used were “extended_infomax” for ICA, autoreject_local and filtering (between 0.5 and 45), and these all seem to work normal when looking at the html report of MNE-BIDS.

Here is the code that I used for calculating bandpower:

def bandpower(data, sf, band, window_sec=None, relative=False):
    band = np.asarray(band)
    low, high = band 

    # Define window length for Welch method
    if window_sec is not None:
        nperseg = window_sec * sf
    else:
        nperseg = (2 / low) * sf
       
    # Compute Welch's periodogram
    freqs, psd = welch(data, sf, nperseg=nperseg)
    plt.semilogy(freqs, psd)

    # Frequency resolution
    freq_res = freqs[1] - freqs[0]

    # Find indices corresponding to the frequency band
    idx_band = np.logical_and(freqs >= low, freqs <= high)

    # Calculate the band power using Simpson's rule for numerical integration
    bp = simpson(psd[idx_band], dx=freq_res)

    if relative:
        bp /= simpson(psd, dx=freq_res)

    return bp

# Function to process subjects and calculate band powers
def process_subjects(subject_list):
    results = []

    # Frequency bands of interest
    bands = {
        'Delta': (0.5, 4),
        'Theta': (4, 8),
        'Alpha': (8, 13),
        'Beta': (13, 30)
    }

    # Loop over all subjects
    for sub in subject_list:
        file_preprocessed = f"{data_path_root}/eyesclosed/derivatives/mne-bids-pipeline/{sub}/{ses}/eeg/{sub}_{ses}_task-{task}_proc-clean_raw.fif"
        
        try:
            # Load preprocessed data
            prepr = mne.io.read_raw_fif(file_preprocessed, preload=True)
            
            # Get the sampling frequency
            sfreq = prepr.info['sfreq']
            
            # Get the data for all EEG channels
            eeg_data = prepr.get_data(picks='eeg')  # Shape: (n_channels, n_times)
            
            # Initialize a dictionary to store band powers
            band_powers = {band: [] for band in bands.keys()}
            
            # Loop over each channel and compute band power
            for ch_idx in range(eeg_data.shape[0]):
                data = eeg_data[ch_idx, :]  # Get data for the current channel
                
                # Calculate band power for each frequency band
                for band_name, band_range in bands.items():
                    bp = bandpower(data, sfreq, band_range)
                    band_powers[band_name].append(bp)
      
            # Average band powers across all channels for this subject
            mean_band_powers = {band: np.mean(band_powers[band]) for band in bands.keys()}
            
            # Append the results for this subject
            results.append({'Subject': sub, **mean_band_powers})

I hope my question is clear and maybe came across the same problem.
Thanks in advance.

sotpapad · October 1, 2024, 1:48pm

Hi,

Can I ask what units are we talking about?
Maybe I am missing something, but my point is: if your signal values are in the order of μV (10e-6) and computing power is something like raising to the power of 2, then the value you are reporting seems perfectly normal to me.

Cheers,

juliapottkaemper · October 2, 2024, 6:14am

Hi,
Thank you so much for your reply.
So, when I looked at the EEGs and compared them by simply plotting the channels of the raw EEG and the preprocessed EEG next to each other, it becomes apparent that the units change from 10e+3 to 10e-7 in all channels. This is what makes it so difficult to comprehend, because the power values of the raw EEG seem find and what I would expect. Maybe I am missing something crucial, I do not know … This seems like a scaling issue in some way, but I do not know if such a thing happens in the pipeline somewhere? Thanks in advance!

Best regards,
Julia

sotpapad · October 2, 2024, 7:03am

Hi,

In your code you load the preprocessed data with mne.io.read_raw_fif and I guess you have access to the info, so take a look at it in order to see what are the units of the data. I think they should be encoded in V, thus the values would probably range around 10e-6 for μV.

But what about the raw data? How do you load it and what are the units?

Cheers,

juliapottkaemper · October 2, 2024, 9:48am

Hi,

The raw data are saved in units μV (if I load this manually with read_raw_brainvision). I pass the data to the automated MNE-BIDS pipeline. I cannot find the specified code where it is loading the raw data though. I checked the info of the preprocessed file and this indicates that it is saved in V if I am correct 'unit': 107 (FIFF_UNIT_V).

I was looking through some forum posts again and found this comment of Dan McCloy on another post regarding ERPs:

" FYI, in MNE-Python everything is stored in SI units. Sometimes our plotting functions will convert the units before plotting, just so that the axis labels don’t range from say 0.000001 to 0.000006 V, but instead from 1 to 6 μV. Again this is only on the plot, the underlying data object still stores the numbers in SI units." https://mne.discourse.group/t/how-to-specify-units/5961/2

Is it possible that this explains why the units of the raw EEG and the one of the preprocessed EEG may differ? In the forum post they only mention it is because of the plotting function though…

I saw that conversion may be the solution https://mne.discourse.group/t/rescale-data-import-from-fieldtrip/3402 But this then turns the values even smaller (10e-13). Do I need to do the conversion before passing the data to the automated MNE-BIDS pipeline?

drammock · October 2, 2024, 2:54pm

When MNE-Python loads the Brainvision file format, it will automatically (1) check the file header to see what units the data are stored in, and (2) convert everything to SI units (so for EEG data, that’s Volts). As @sotpapad said, that will scale microVolt data by a factor of 1e-6 upon loading. Then, when the same data is saved to FIF (whether or not you do any preprocessing steps to it) it will be saved in Volts (not converted back to microVolts).

I can’t explain this part. If you loaded the brainvision files using MNE-Python then they should have been scaled to Volts upon loading, so you should have gotten power values approximately in the 1e-12 range. Did you load them some other way? Another possibility is that the uncleaned data had some very large artifacts (~10 orders of magnitude above the rest of the data) that were removed by filtering and/or ICA. That might also explain this:

sotpapad · October 2, 2024, 6:51pm

Yeah, kind of weird what’s happening with the values.
Artifacts could be an explanation absolutely. But what kind of artifact would be so many orders of magnitude larger?

If I may, @juliapottkaemper I don’t understand why -as you say- you expect the raw power values to be those you have mentioned. And conversely, why do you think the per-processed data values are off?

juliapottkaemper · October 3, 2024, 7:59am

Hi all,

I believe I found the mistake.
First, I will mention some steps that I took to check (and some questions).

I can assure that there are no big artifacts in the EEGs, so that is unlikely the cause.

This is how I load the data in python:

prepr = mne.io.read_raw_fif(file_preprocessed, preload=True)
raw = mne.io.read_raw_brainvision(file_raw, preload=True)

When I now calculate the PSD (and bandpower in different frequency bands), I do see the same magnitude, so that is nice!

I looked at the info of both data (raw and preprocessed) as follows:

print(raw.info['chs'][0])
{'ch_name': 'Fp1', 'coil_type': 1 (FIFFV_COIL_EEG), 'kind': 2 (FIFFV_EEG_CH), 'logno': 1, 'scanno': 1, 'cal': 0.1, 'range': 1e-06, 'loc': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]), 'unit': 107 (FIFF_UNIT_V), 'unit_mul': 0 (FIFF_UNITM_NONE), 'coord_frame': 4 (FIFFV_COORD_HEAD)}

print(prepr.info['chs'][0])
{'scanno': 1, 'logno': 1, 'kind': 2 (FIFFV_EEG_CH), 'range': 1.0, 'cal': 0.10000000149011612, 'coil_type': 1 (FIFFV_COIL_EEG), 'loc': array([-0.03090259,  0.11458518,  0.02786657, 0., 0., 0., nan, nan, nan, nan,
nan, nan]), 'unit': 107 (FIFF_UNIT_V), 'unit_mul': 0 (FIFF_UNITM_NONE), 'ch_name': 'Fp1', 'coord_frame': 4 (FIFFV_COORD_HEAD)}

It seems that the values are both stored in V (as you mentioned this happens when loading the data). But when looking at the values within the dataframe, they still differ in magnitude (1e-03 vs 1e-06). Do you know why this would occur after preprocessing?
I noticed something strange in the output here. It indicates for the raw file 1e-06 and for prepr 1.0. But I am not quite sure what this means, maybe you do? Is this related to the scaling issue maybe?

This was my main error I believe: I come from a Matlab background, and how matlab reads in the files is different compared to python. I compared the values of the raw and preprocessed files in Matlab as well (I loaded the .vhdr and .fif files with Fieldtrips data = ft_preprocessing(cfg);). Maybe this is where my confusion stems from, because the .fif file is already saved in V instead of μV, which then results in different magnitudes of power values?

Thanks for thinking along with me on this.

sotpapad · October 3, 2024, 8:25am

Ok, nice.

I am not sure about the range to be honest. But I definitely think that recordings can have some artifacts that would push the values from 10^-6 to 10^-3. You’d need to check in your pipeline what gets rejected and why in this case to be sure.

drammock · October 10, 2024, 10:48pm

what “Dataframe” do you mean here? are you calling raw.to_data_frame() and prepr.to_data_frame()? What arguments are you passing there?

The difference in the scaling entry of info['chs'] should be irrelevant — AFAIK it’s only used during the read process / not relevant for values stored as floats (I think it’s for cases like “this file format stores sample values as microvolts so we’ll need to scale by range before finalizing the read-in data”). @larsoner can confirm.

larsoner · October 15, 2024, 6:47pm

Yes we do not currently use info["chs"] scaling to do anything with the data in the Raw container. raw.get_data() should always return data in SI units.

In principle we could do something with it on write… I don’t think we do, but would need to check.

Topic		Replies	Views
Calculate spectral density Support & Discussions preprocessing , eeg	5	2139	April 23, 2022
ICA and Average Reference Projection not working Support & Discussions preprocessing , eeg	4	41	February 25, 2025
Preprocessing EEG data for mental imagery offline analyses Support & Discussions preprocessing , eeg , visualization	13	1264	September 4, 2022
data preprocessing Support & Discussions preprocessing	3	126	January 12, 2024
Artifact Removal from EEG Data using ICA Support & Discussions preprocessing , eeg , ica	22	295	November 22, 2024

Question about preprocessed EEG data

Related topics