Problem with MNE connectivity vector_auto_regression

MNE version: 1.3.0
MNE_connectivity version: 0.4.0
OS: macOS 10.15.7

I am trying to use the vector_auto_regression function on my SEEG dataset. I have no trouble running the tutorial pipeline. However, when I use my own dataset, vector_auto_regression produces a matrix where every row has the same value at each column, meaning that each electrode appears to have the same influence on all other electrodes. Additionally, instead of ranging from -1 to 1, the values are on the order of +/- 10^3. Has anyone encountered this problem before? What might be the cause?

My data have a sampling rate of 1000 Hz and the original recording lasts for 2 hours. I first bandpass filtered from 0.5 to 300 Hz and then applied a notch filter at 60 Hz and its harmonics. I then removed bad channels before applying a common average reference. Finally, I trimmed the recording to a 5-minute segment and epoched it into 0.5 s intervals without overlap. Using the resulting 600 epochs, I created a dynamic VAR model with the default parameters. Unfortunately, I cannot share my data, so it will be difficult for you to reproduce this problem. How can I proceed with fixing this?
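For reference, my preprocessing looks roughly like the sketch below (the file path and crop times are placeholders, since I cannot share the actual recording):

import numpy as np
import mne
from mne import make_fixed_length_epochs
from mne_connectivity import vector_auto_regression

raw = mne.io.read_raw_edf('/path/to/my_seeg.edf', preload=True)  # placeholder path
raw.filter(l_freq=0.5, h_freq=300.)              # bandpass 0.5-300 Hz
raw.notch_filter(np.arange(60., 301., 60.))      # 60 Hz line noise and its harmonics
raw.drop_channels(raw.info['bads'])              # remove bad channels
raw.set_eeg_reference(ref_channels='average')    # common average reference
raw.crop(tmin=0., tmax=300.)                     # 5-minute segment (placeholder times)
epochs = make_fixed_length_epochs(raw, duration=0.5, overlap=0.0, preload=True)
conn = vector_auto_regression(
    data=epochs.get_data(), times=epochs.times, names=epochs.ch_names,
    model='dynamic'
)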

Thanks.

Tagging @adam2392 who may be able to help.

What is the nature of your data?

I.e. range, units, storage format, etc.

I would venture a guess that you’re loading the data in uV or even nV, when the scale of the data variation is actually at the V or mV level.

Thanks for your quick reply!

My data are stored in .edf format. I am reading them in using mne.io.read_raw_edf(file_path). The default unit for the data is set to uV. If I try using a different unit in the read_raw_edf function, I get an error: "Unit for channel G1 is present in the file as 'uV', cannot overwrite it with the units argument 'V'." I investigated the data values at different percentiles. The absolute minimum is -0.003 and the absolute maximum is 0.002. The 5th percentile is -0.0003 and the 95th percentile is 0.0003.
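For context, the percentile check was along these lines (the path is a placeholder):

import numpy as np
import mne

raw = mne.io.read_raw_edf('/path/to/my_seeg.edf', preload=True)  # placeholder path
data = raw.get_data()                # MNE returns the data in volts
print(data.min(), data.max())        # absolute min / max
print(np.percentile(data, [5, 95]))  # 5th and 95th percentiles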

I think I have isolated the problem! In my current pipeline, I re-reference the data using a common average reference with raw_ieeg.set_eeg_reference(ref_channels='average'). If I leave out this line of code and do not re-reference my data, then vector_auto_regression returns sensible values in the range (-1, 1) that differ across columns. You can replicate this problem with the tutorial by adding a line that sets the EEG reference to average.

Here:

import numpy as np

import mne
from mne import make_fixed_length_epochs
from mne_bids import BIDSPath, read_raw_bids
from mne_connectivity import vector_auto_regression

bids_root = mne.datasets.epilepsy_ecog.data_path()
bids_path = BIDSPath(root=bids_root, subject='pt1', session='presurgery',
                     task='ictal', datatype='ieeg', extension='vhdr')
raw = read_raw_bids(bids_path=bids_path, verbose=False)
line_freq = raw.info['line_freq']
raw.pick_types(ecog=True)
raw.load_data()
raw.notch_filter(line_freq)
raw.drop_channels(raw.info['bads'])

raw.set_eeg_reference(ref_channels='average') ## LINE I ADDED

# Crop the recording to the period before the seizure onset annotation
events, event_id = mne.events_from_annotations(raw)
onset_id = event_id['onset']
onset_idx = np.argwhere(events[:, 2] == onset_id)
onset_sample = events[onset_idx, 0].squeeze()
onset_sec = onset_sample / raw.info['sfreq']
raw = raw.crop(tmin=0, tmax=onset_sec, include_tmax=False)
epochs = make_fixed_length_epochs(raw=raw, duration=0.5, overlap=0.25)
times = epochs.times
ch_names = epochs.ch_names
conn = vector_auto_regression(
    data=epochs.get_data(), times=times, names=ch_names, model='dynamic'
)
output = conn.get_data()
print(output[0])

Why is this happening? Is there any way I can use the average reference without encountering this error?

Hi @adam2392, any update on a solution for this problem? Thanks for your help.

My guess would be that the average referencing introduces numerical instability. The condition number of the X matrices is probably very high. You can check this manually and impose a regularization term. If you want to implement this in the underlying function in a PR, I can guide that.
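To make that concrete, here is a minimal sketch (not the actual vector_auto_regression internals) of checking the condition number of a design matrix X and fitting it with an L2 (ridge) penalty:

import numpy as np

def ridge_fit(X, y, reg=1e-3):
    # Solve (X^T X + reg * I) beta = X^T y, i.e. least squares with an L2 penalty
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(n_features), X.T @ y)

# Illustrative rank-deficient design matrix (one column is a combination of the others)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
X[:, -1] = X[:, :-1].mean(axis=1)
y = rng.standard_normal(500)

print(np.linalg.cond(X))   # a very large condition number signals instability
beta = ridge_fit(X, y)     # the penalty keeps the solution well-behaved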

I’m not sure exactly how I would implement that (i.e., I am not familiar with condition numbers or regularization terms), but I will be happy to give it a shot. Can you tell me a little more about what I should do and/or point me to any resources that explain the underlying problem and solution?

Some refs:

Others can probably be found by searching for “linear regression with regularization”. Let me know if that helps!

Setting an average reference creates a “virtual” electrode containing the average signal across all electrodes and re-references the data from the previous physical reference electrode to this virtual one. If the physical reference electrode is not part of the data file (by default, it isn’t, as it would be a data channel containing all zeros), this operation decreases the rank of the data by 1. We can prevent this by first adding the physical electrode to the data and then applying the average reference. The new reference electrode now contains the average signal that was removed from all the channels and indeed contains the signal recorded at the physical reference site. In code:

raw = read_raw_bids(bids_path=bids_path, verbose=False)
raw.load_data()
raw.add_reference_channels('REF')
raw.set_eeg_reference('average')

This seems to work with the tutorial dataset.

However, @adam2392 is right in stating that the vector_auto_regression function should probably have a regularization option to deal with rank-deficient data. There are many ways for the data to drop in rank and we cannot always prevent this.
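If you want to check this on your own data, a quick (illustrative) way to verify whether the epoched data are rank-deficient before fitting the VAR model, using the epochs object from the snippet above, is:

import numpy as np

data = epochs.get_data()                # shape (n_epochs, n_channels, n_times)
n_channels = data.shape[1]
concat = np.concatenate(data, axis=-1)  # stack epochs in time: (n_channels, n_epochs * n_times)
print(np.linalg.matrix_rank(concat), '/', n_channels)
# After a plain common average reference (without adding a REF channel back),
# this should come out as n_channels - 1.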


Hi @wmvanvliet, thanks for adding your input! I tried the fix that you suggested, but I’m not sure it’s working. I ran the code you provided; however, the average signal does not appear to be contained in the REF channel, nor do the data seem to change before and after applying the common average reference. You can replicate this with:

raw = read_raw_bids(bids_path=bids_path, verbose=False)
raw.load_data()

ave = raw.copy().add_reference_channels('REF')
ave.set_eeg_reference('average')

ave_data = ave.get_data()
raw_data = raw.get_data()

If I check ave_data[-1], I see that the last row of the matrix contains all zeros, which is the reference channel without the average signal. Running np.allclose(ave_data[:-1], raw_data) returns True, meaning that the data are identical with and without the average reference. Finally, when I plot raw and ave, they look visually identical, except for the REF channel, which is a flat line. You are correct that this code works when running the VAR, but does it actually apply a common average reference?