Computing regression on sensor data then transforming to source space

Hi MNE listserv,

I have single-trial data that I would like to regress on a predictor (let's
say word frequency) and then compute a source estimate. I'm planning
to use mne-python to do this computation. I was wondering if I could do the
regression over the single-trial sensor data first, get the beta values for
each sensor over time, and then compute the source estimate as if it were
an evoked object.

My presumption is that this should be fine if the source transformation is
linear. The other option would be to source-transform the data and then do the
regression, but the problem with doing that first is that computing the
source estimates is more demanding on memory (say about 1000 trials with
around 5000 sources over 600-800 ms of time). It would be more efficient
to do the regression first, provided doing so is not mathematically unsound.

What are your thoughts?

Best,

Hi Teon,
     Assuming that your regression is a linear model fit that doesn't operate across channels or across time in any way, the beta values are just linear combinations of sensor space data over groups of trials... So it should be fine...
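For concreteness, here is a minimal sketch of what I think you are describing (the names epochs, inv, and word_freq are hypothetical placeholders for an Epochs object, an inverse operator, and the per-trial predictor; an illustration, not a recipe):

import numpy as np
import mne
from mne.minimum_norm import apply_inverse

# Design matrix across trials: intercept + word-frequency predictor
X = np.column_stack([np.ones(len(word_freq)), word_freq])
Y = epochs.get_data()                      # shape (n_trials, n_sensors, n_times)
n_trials, n_sensors, n_times = Y.shape

# Ordinary least squares, solved for all sensors/time points at once
betas = np.linalg.lstsq(X, Y.reshape(n_trials, -1), rcond=None)[0]
beta_freq = betas[1].reshape(n_sensors, n_times)   # slope for word frequency

# Wrap the beta map as an Evoked object and apply the (linear) inverse
beta_evoked = mne.EvokedArray(beta_freq, epochs.info, tmin=epochs.times[0])
stc = apply_inverse(beta_evoked, inv, lambda2=1. / 9., method='MNE')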

One thing to keep in mind, if the conditions (i.e., levels of the word-frequency factor in your example) have different numbers of trials, is to scale the noise covariance by an appropriate number, as discussed in the "Effective number of averages" section of "The current estimate" chapter of the manual...
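As a hedged sketch of that idea (made-up numbers): for a weighted combination of condition means, x = sum_i w_i * x_i, where condition i has n_i trials, the effective number of averages is 1 / sum_i(w_i**2 / n_i), which you can set on the Evoked object used for the inverse:

import numpy as np

w = np.array([1., -1.]) / np.sqrt(2.)     # example orthonormal contrast weights
n = np.array([120, 80])                   # unequal trial counts per condition
n_eff = 1. / np.sum(w ** 2 / n)           # effective number of averages
beta_evoked.nave = int(round(n_eff))      # beta_evoked as in the sketch above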

Hari

Hari Bharadwaj
PhD Candidate, Biomedical Engineering,
Auditory Neuroscience Laboratory
Boston University, Boston, MA 02215

Martinos Center for Biomedical Imaging,
Massachusetts General Hospital
Charlestown, MA 02129

hari at nmr.mgh.harvard.edu
Ph: 734-883-5954

Dear Teon,

You have raised several interesting questions on which I would like to expand.
Hari responded to several technical issues, viz. (1) constraints on what you do to retain the validity of your subsequent projection into source space and (2) weighting the regression to compensate for unequal numbers of trials for different levels of the independent variable.

Here are some points about the meaning of what you are doing and about the technical issues.

(1) If the independent variable is scaled rather than a 0/1 dummy, i.e. has multiple numeric levels, then your regression is asking a specific quantitative question about the amplitude of the magnetic field/source, i.e. is the amplitude a linear function of the independent variable? If, for example, the variable takes values n and 2n, you are asking: "Is the amplitude for the '2n' trials double what it is for the 'n' trials?"

(2) I think it's reasonable to assume that many of the sources contributing to the magnetic field have nothing to do with the task. Although you are working with single-trial data, your regression across the trials collapses the data in a generalized version of averaging. That helps attenuate the contributions to the field from unrelated sources. But if (a) there were a way up front to define regions of interest within the brain which you think are involved in the task, and (b) the linear hypothesis you are testing is true, you should do better by doing the projection first and then doing the regression on the vertices within one ROI at a time. In that way you take advantage of the source-separation capabilities of your projection operation to isolate the sources you think are involved. If you want formal statistics from your regression, you must find a way to adjust the degrees of freedom, since presumably the source estimates from nearby vertices lack independence. A rough sketch of this projection-first idea appears below.
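In mne-python terms, such a projection-first analysis might look roughly like this (assuming epochs, inv, labels, src, and a per-trial word_freq vector already exist; an illustration only, not a prescribed procedure):

import numpy as np
from mne.minimum_norm import apply_inverse_epochs

stcs = apply_inverse_epochs(epochs, inv, lambda2=1. / 9., method='MNE')
X = np.column_stack([np.ones(len(word_freq)), word_freq])
roi_betas = {}
for label in labels:
    # mean time course per trial within this ROI
    tcs = np.array([stc.extract_label_time_course(label, src, mode='mean')[0]
                    for stc in stcs])                    # (n_trials, n_times)
    roi_betas[label.name] = np.linalg.lstsq(X, tcs, rcond=None)[0][1]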

(3) Multidimensional regression: I presume that you are doing your regression for a single time point, tau, or perhaps averaging the amplitude values centered on the peak. In either case you get a single number for each magnetic field sensor for each trial. Instead you could use multiple points around the center of a peak and include a low-order polynomial of tau multiplied by your original independent variable. Note that averaging is equivalent to using a zero-order polynomial. If you use, say, 21 data points centered on the peak, you increase your degrees of freedom by quite a lot. Of course your 21 data points lack independence, but you are still using more information to do the regression.
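For illustration only, here is a small sketch of idea (3) with made-up data (the variable names and the quadratic order are arbitrary choices, not part of my original suggestion):

import numpy as np

n_trials, n_pts = 200, 21
tau = np.linspace(-1., 1., n_pts)               # sample offsets around the peak
freq = np.random.randn(n_trials)                # hypothetical predictor values
amps = np.random.randn(n_trials, n_pts)         # hypothetical amplitudes near the peak

# One row per (trial, sample): intercept, freq, freq*tau, freq*tau^2
F, T = np.meshgrid(freq, tau, indexing='ij')
X = np.column_stack([np.ones(F.size), F.ravel(),
                     (F * T).ravel(), (F * T ** 2).ravel()])
betas = np.linalg.lstsq(X, amps.ravel(), rcond=None)[0]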

(4) The more important additional variable is along the time axis for the sequence of trials. If you use a polynomial function of trial order, any non-zero beta other than the zero-order one represents a nonstationarity in your measurements. This is rarely assessed, but with humans doing a task it is always a concern, and it's interesting too. The attached figure illustrates ideas (3) and (4) with evoked potential data.

I hope I'm understanding you correctly and that this is helpful.

Regards,

Don

Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text

hi Don,

thanks a lot for sharing these insights.

Although I get the idea of what you suggest, I am not sure
I would be able to perfectly replicate this analysis.
Do you happen to have a script you could share?

Also can you give the full ref from which the figure is extracted?

thanks again

Best,
Alex

Hi everyone,

this is really an interesting and productive discussion, which I enjoy
following very much. And it's especially timely and relevant, since we
plan to support high-level functions for single-trial regression in
MNE-Python in the future. Tal Linzen has recently started drafting
examples and first functions, cf.
https://github.com/mne-tools/mne-python/pull/1034.

Since there were a few open questions for Teon's use case, we
decided to postpone adding direct support for projecting beta maps.
However, if we can use the collective knowledge and experience
distributed over this mailing list, it might be possible to clarify
those questions and add support for this use case directly in MNE-Python
rather soon.

My main concern with the beta-map projection approach is that the
noise structure (and the units) as reflected in the noise covariance
computed on raw M/EEG signals doesn't necessarily match the noise
structure present in the beta maps. This would mean a model mismatch.
To tackle this issue, different options might be possible, such as
using an identity matrix as the noise covariance or estimating the noise
covariance on whitened single trials.
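To make one of those options concrete, a hedged sketch (assuming an epochs object and a forward solution fwd are available): mne.make_ad_hoc_cov gives a simple diagonal covariance that could stand in for an identity-like noise model when building the operator used to project beta maps.

import mne
from mne.minimum_norm import make_inverse_operator

ad_hoc_cov = mne.make_ad_hoc_cov(epochs.info)      # diagonal, per channel type
inv_beta = make_inverse_operator(epochs.info, fwd, ad_hoc_cov,
                                 loose=0.2, depth=0.8)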

Input on both the work-in-progress on MNE-Python and the conceptual
issues is highly appreciated.

Cheers,
Denis

Here's the reference to the paper:
Krieger, Onodipe, Charles, Sclabassi. Real time signal processing in the clinical setting. Annals Biomed Eng 26: 462-472, 1998.

I've forwarded PDFs to Alex and Denis and am happy to post it to others.
It's too large to post to the list.
It's up on Researchgate too:
https://www.researchgate.net/profile/Don_Krieger/publication/236300079_RTSignalProcgClinical/file/504635179218e4ab24.pdf?origin=publication_detail

Unfortunately, I do not have a script or other software module which is fit for public consumption.
But I do not think that the time and effort to produce them from scratch will be significant for those who are contributing to mne-python.
The original Fortran routine ran on a DEC PDP-11 CPU in 64 Kbytes of memory in real time, so there shouldn't be a problem with computational cost or memory.

The paper details a set of uses for regression. We routinely used it as a preprocessing step to identify and remove line noise and low-frequency signal components. Examples of both are included in the paper, along with the algorithm. Prophete Charles was able to prove that using regression in this way is equivalent to an ideal digital filtering operation. But it's better because (1) it requires no warm-up or cool-down, and (2) artifacts can be excluded from the regression to avoid ringing.
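For anyone curious, here is a minimal numpy sketch of the general idea (my paraphrase, not the exact algorithm in the paper): regress a low-order polynomial plus sine/cosine terms at the line frequency out of a single-channel signal and keep the residual.

import numpy as np

def regress_out_line_noise(x, sfreq, line_freq=60., poly_order=2):
    """Remove drift and line noise from one channel by linear regression."""
    t = np.arange(len(x)) / sfreq
    cols = [t ** k for k in range(poly_order + 1)]          # slow drift terms
    cols += [np.sin(2 * np.pi * line_freq * t),
             np.cos(2 * np.pi * line_freq * t)]             # line-noise terms
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, x, rcond=None)[0]
    return x - X @ beta                                      # cleaned signal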

Regards,
Don
Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text

Hi Denis et al.,
    It appears to me that there are two separate issues being conflated
here, and perhaps there will be some clarity if we talk through them:

(A) Whether to compute "betas" in sensor-space or source-space:
    This is not really a difficult question within the MNE or other linear
inverse solution frameworks: for a given inverse operator
(let's call it M), computing betas in sensor or source space should
lead to identical results unless the statistical model being fitted to
compute the betas is somehow non-linear.
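A toy numpy check of this point (all names here are illustrative): with a fixed linear operator M, regressing first and projecting second gives the same betas as projecting first and regressing second.

import numpy as np

rng = np.random.RandomState(0)
n_trials, n_sensors, n_sources = 50, 10, 20
M = rng.randn(n_sources, n_sensors)            # stand-in for the inverse operator
X = np.column_stack([np.ones(n_trials), rng.randn(n_trials)])   # design matrix
Y = rng.randn(n_trials, n_sensors)             # single-trial sensor data

beta_sensor = np.linalg.lstsq(X, Y, rcond=None)[0]          # regress, then...
route1 = beta_sensor @ M.T                                   # ...project the betas
route2 = np.linalg.lstsq(X, Y @ M.T, rcond=None)[0]          # project, then regress
assert np.allclose(route1, route2)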

(B) Choice of the noise model used to compute the operator M:
    This issue is more subtle and important and does not depend on the
sequence in which you do things in (A). One has to consider this
question regardless of whether one is doing stats in sensor space
or source space. By projecting things to source space first, one
does not become immune to an inappropriate choice of noise model.
   If the noise covariance employed corresponds to that of single trials
(e.g., if you choose to scale it with nave = 1 because you are
projecting single-trial data), the map would be unnecessarily smooth
even if you later combine hundreds of trials to show your final "map" of
interest. The prior (the minimum-norm part) will be given much more
weight, and the data fit will be given a lot less weight than is
appropriate. Thus, one should employ a noise-covariance scaling *with
the foresight* of what analysis is planned, especially in the context
of regressions.
   I think there is one special case in the context of regressions/ANOVA
that occurs frequently and is easy to handle. This is the case where
the betas are computed with weights that are normalized (these
weights are sometimes called "contrasts" in stats parlance).
That is, if beta = sum_over_i{w_i * x_i}, where x_i is the mean from the
i-th group of trials, and if sum_over_i{w_i**2} == 1 and the x_i's
come from the same number of trials, then the beta values have the same
noise variance as the x_i's. To take the simplest example of this case,
if x_1 and x_2 are evoked responses for conditions 1 and 2 (each with
nave = 100 trials), and you compute your beta values as the
difference divided by sqrt(2), then scaling the noise covariance by nave
= 100 is appropriate when computing M. In practice, if all the conditions
that go into the stats have the same number of trials, and if the stats
employ orthonormal contrasts, then the default MNE scaling of nave =
#trials works well..
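As an illustration of this special case (the names evoked_cond1, evoked_cond2, and inv are placeholders): with an orthonormal contrast of two condition means, each averaged over 100 trials, keeping nave = 100 is the appropriate scaling.

import numpy as np
import mne
from mne.minimum_norm import apply_inverse

w = np.array([1., -1.]) / np.sqrt(2.)          # sum of squared weights == 1
beta = w[0] * evoked_cond1.data + w[1] * evoked_cond2.data
beta_evoked = mne.EvokedArray(beta, evoked_cond1.info,
                              tmin=evoked_cond1.times[0], nave=100)
stc = apply_inverse(beta_evoked, inv, lambda2=1. / 9., method='MNE')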

Hope this helps..

Hari

Hari, I understand your discussion of a categorical independent variable with
several discrete levels (conditions), but how would you extend this logic
to a continuous correlation design, where each trial has its own individual
value of the independent variable? For simplicity, let's assume that we're
trying to model the effect of trial order on neural adaptation, and you want
to use the actual trial number (not just bin your trials into "early" and
"late").

Tal

Hi Tal,
   At least in the simplest case, when you are fitting just one term, as long
as the beta value is still a linear combination of independent trials
with weights that are normalized, it doesn't matter whether the covariate
in that term is continuous or categorical. For instance, in your
example, if you treat trial number as continuous, encode your
trial numbers so that over trials the mean value of the trial number
is zero, and in addition normalize so that the sum of squares is 1, then
the beta values have the same noise distribution as the noise in each
trial.
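In code, a hedged sketch of that encoding (assuming an epochs object; names are hypothetical):

import numpy as np

n_trials = len(epochs)
w = np.arange(1., n_trials + 1.)         # raw trial numbers
w -= w.mean()                            # zero mean over trials
w /= np.sqrt(np.sum(w ** 2))             # unit sum of squares
# With zero-mean, normalized weights, the slope beta is just a weighted sum
# over trials, so its noise variance matches that of a single trial.
beta = np.tensordot(w, epochs.get_data(), axes=(0, 0))   # (n_sensors, n_times)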

Things get more complex when you have multiple terms, and the answer would
depend on the specific design. However, depending on what the model is, if
you write out the model and see what the precise computation is that
estimates the beta value of interest, it may be possible to analytically
calculate what the scaling for the noise covariance should be. Of course,
if the trials going into the computation stop being treatable as independent
(which can happen in many repeated-measures-type cases), then things could
get intractably tricky very quickly...

HTH..

Hari

Hi everyone,

First, with regard to the continuing discussion of regression models which include both scaled and categorical variables, etc.: there are general-purpose open-source statistical packages, e.g. R, which implement ANOVA, ANCOVA, MANCOVA, etc., in the general linear model format, i.e. the one we are discussing. It therefore might be worth evaluating one or more of them for the development of Python wrappers which use them to do the regressions we are discussing.

Second, with regard to both the equivalent-variance discussion and the assertion of the equivalence of computing the betas in sensor or source space: perhaps I am misunderstanding. I had just assumed that the regression on an independent variable was to be done for one sensor at a time. Thus, in the statistical framework, the problem could be handled as a MANOVA with the sensor measures being multiple dependent variables, or as a repeated-measures ANOVA where the sensor measures are repeated measures. In neither case, though, would regression of the independent variable against a source, i.e. against a linear combination of the sensors, be conceptually equivalent or give the same answer.

For the MANOVA approach, the regression against each sensor is handled separately, so I think that at least the differences in variances across the sensors don't matter. What do you think? I don't write with the voice of authority here and would like to understand this.

Don

Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text

Hi Donald,
    Just to clarify, I did not intend to suggest that the betas in sensor and source space would be equal or equivalent... Apologies for any confusion caused. What I meant in (A) below was that the following would give identical results if the regression is linear:
(1) computing beta values on the sensor data (one sensor at a time) and then computing the inverse solution with these beta values to get "source beta" values;
(2) computing the inverse solution of the original MEG data and then computing "source beta" values directly by fitting the regression (one source at a time).
The "source betas" in the two cases are what I think will be identical.

Regarding the noise variance, the discussion I articulated was with the problem of source estimation in mind. Hence it was about specifying the noise covariance matrix for MNE (which was Denis's concern, I think) rather than about statistical inference on the betas. The assertion is that if the regression/ANOVA model is simple enough that the sensor betas are linear combinations across trials (or trial groups), AND if the weights for the linear combination are normalized, then the noise covariance of the betas is the same as the noise covariance of the original trials (or trial groups) that went into their calculation (and hence the originally planned scaling of the noise covariance for MNE purposes would still be appropriate). I, too, only thought about the regression/ANOVA as being done one sensor at a time or one source at a time. Even with this relatively simple "mass univariate" approach, if the model being fit is complex or has many terms, the relationship between the variance of the noise in the beta values and the variance of the noise in the original MEG/EEG data gets complicated, and inverse estimation could suffer.

How to do parametric inference in source or sensor space while correctly accounting for across-sensor or across-source variance inhomogeneities is a whole issue of its own and is likely one with many complexities. If I understand correctly, that is the question you have posed?

Best,
Hari

Hari Bharadwaj
PhD Candidate, Biomedical Engineering,
Auditory Neuroscience Laboratory
Boston University, Boston, MA 02215

Martinos Center for Biomedical Imaging,
Massachusetts General Hospital
Charlestown, MA 02129

hari at nmr.mgh.harvard.edu
Ph: 734-883-5954

Thanks, Hari, for posting back and making it clear.
I agree with your points to the extent that I follow them.
I am pretty hazy about the effects that differing variances across the sensor measurements, or across the betas computed from them, have on the MNE projections into source space.
Because of that, and because the idea of applying regression across single trials on an independent, task-relevant variable is quite interesting and, I think, promising, I had been thinking about applying it in source space. If that works for a simple ANOVA, the means would represent mean neuroelectric currents, and a statistical finding of a difference between two task conditions would imply that that source was differentially activated under the two conditions. That's what fMRI is about for neurovascular activity, but with MEG, differential neuroelectric activation is much closer to what we conceive the brain is doing to accomplish the task.

Our group has had success in generating 3D maps of differential activation using an even simpler chi-square test on a counting measure. Our first results just appeared in an open-access journal: Intl J Advd Comp Sci 4(1): 15-25, 2014. But parametric statistics applied to dipole amplitude estimates have far greater potential statistical power. This discussion has spurred me to consider how to use that approach to amplify the power of those results.

Thanks again.

Regards,
Don
Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text

Hi,

Thank you, Don, Hari, and everyone else for your input and insight into my
question. I think it made for a very productive discussion, and I now have a
better understanding of the things to consider when doing this.

Best,
Teon