[Freesurfer] question about how to calculate COV for MNE

Hi Joe,

I have forwarded your email to the mne_analysis list (please
subscribe). This is the list for MNE questions. Does anyone have the
answers for Joe?

HTH,
D

hi joe,

   I have a question about how to calculate the COV for the MNE software.
As I understand it from Section 4.17 of the 2.7.3 MNE manual, if one is
averaging together COV matrices, one weights them by the number of
observations going into each one. I also see from Section 4.17.2 that these
are technically not covariance matrices so much as sum-of-squares matrices
where the epochs going into each one have been baseline corrected.

   Assuming my understanding so far is correct, my question is about how to
proceed when making linear combinations of COV matrices, as in a difference
wave, as discussed in Section 6.3. Would the following be the correct
procedure?

   Compute the COV separately for the two conditions (say standard and rare
oddballs). Subtract the two COV matrices to obtain C0.

Subtracting two COV matrices is a bad idea: the result may no longer be
positive (semi-)definite.

basically the noise cov is just there to estimate the noise, so the more
samples you use, the better the estimate. There is no condition or contrast
at this stage.

for source reconstruction you should use the same noise cov for the two
conditions to have the same inverse operator.
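
In today's mne-python terms this advice might look like the sketch below;
epochs_standard, epochs_rare, and the forward solution fwd are hypothetical
placeholder names:

    import mne
    from mne.minimum_norm import make_inverse_operator

    # pool the epochs from BOTH conditions: the noise estimate does
    # not care about the experimental contrast at this stage
    epochs = mne.concatenate_epochs([epochs_standard, epochs_rare])

    # estimate the noise cov from the (pre-stimulus) baseline period
    noise_cov = mne.compute_covariance(epochs, tmax=0.0)

    # a single inverse operator, reused for every condition
    inv = make_inverse_operator(epochs.info, fwd, noise_cov)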

you can look at this file to know more about the internals:

HTH
Alex

Fair enough! Just trying to understand what Section 6.3 is meant to communicate. So let me try another possible interpretation:

Compute the COV for the two conditions. When subtracting the two conditions, the accompanying COV matrix (same for both conditions since they were pooled when computing it) is divided by Leff. Likewise the accompanying degrees of freedom (used if it is subsequently averaged together with another COV matrix per Section 4.17) is set equal to Leff.

Not saying this is a procedure that makes sense to me either, just asking what the intent is of the MNE developers. Based on your response, this is currently my best guess, although it doesn't make sense to me either since dividing the COV matrix by a single scalar (however computed) shouldn't affect computations about relative noise levels in the channels. Basically, I'm not sure what is being done with COV so it's hard for me to judge what is a reasonable interpretation of Section 6.3.

I appreciate the link to the code, but since I'm a Matlab programmer, not a C++ programmer, it's very hard for me to parse the code. If someone could just tell me what is to be done according to 6.3 when computing a difference wave, rather than just what not to do, this would be greatly appreciated. I'm trying to add support for the FIFF file format to my Matlab EEG analysis software suite (http://sourceforge.net/projects/erppcatoolkit/) and want to make sure I'm doing it in a manner that would be acceptable to the MNE developers, for the benefit of the users of my package.

Thanks for taking the time to read my question!

Joe

I should say Python, not C++!

Compute the COV for the two conditions. When subtracting the two
conditions, the accompanying COV matrix (same for both conditions since they
were pooled when computing it) is divided by Leff. Likewise the
accompanying degrees of freedom (used if it is subsequently averaged
together with another COV matrix per Section 4.17) is set equal to Leff.

this is correct.
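
For concreteness, a small Python sketch of the Leff bookkeeping; the
general weighted-combination formula is my reading of Section 6.3, so
treat it as an assumption:

    def l_eff(weights, n_trials):
        # effective number of averages for sum_i w_i * evoked_i,
        # where evoked_i was averaged over n_trials[i] epochs:
        # L_eff = 1 / sum_i (w_i ** 2 / L_i)
        return 1.0 / sum(w ** 2 / float(n)
                         for w, n in zip(weights, n_trials))

    # difference wave of two 50-trial conditions:
    print(l_eff([1, -1], [50, 50]))  # -> 25.0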

Not saying this is a procedure that makes sense to me either, just asking
what the intent is of the MNE developers. Based on your response, this is
currently my best guess, although it doesn't make sense to me either since
dividing the COV matrix by a single scalar (however computed) shouldn't
affect computations about relative noise levels in the channels. Basically,
I'm not sure what is being done with COV so it's hard for me to judge what
is a reasonable interpretation of Section 6.3.

if the baseline is pure noise, then when you compare conditions by
subtraction you reduce the variance of the baseline noise relative to the
raw data (though less than a plain average would), hence the scaling.

does it help?

Alex

Hi Joe,
    Thought I'd mention a couple of things that might help in understanding the scaling etc.

(1) The noise cov that is computed and stored in the -cov.fif files corresponds to the covariance of the measurement noise in the *raw* data, and hence when using it to localize multi-trial *averaged* evoked responses, it is scaled down by the number of trials in the average (based on the assumption that noise is independent from trial to trial and hence scales that way). Look for the "nave" input when generating "stc" files.

(2) Whether you add or subtract two *averaged* evoked responses, the resulting noise distribution is of the same form (i.e., still mean zero, with variance depending on the numbers of trials in each condition). More generally, when you take an arbitrary linear combination of two or more averaged responses, the noise mean is still zero and the variance scales depending on the weights for each condition and the numbers of trials that went into the corresponding averages.

(3) Based on (2), consider the common special case of a difference response of two (separately averaged) conditions. If the numbers of trials that went into the two conditions are not the same but are instead L1 and L2 respectively, then the effective scaling of the noise variance (from raw to the difference waveform) is what is calculated as Leff, as the short derivation below spells out.
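
To spell out the arithmetic behind (2) and (3), assuming independent
single-trial noise with variance sigma^2 per channel:

    Var(avg1 - avg2) = sigma^2/L1 + sigma^2/L2
                     = sigma^2 * (L1 + L2) / (L1 * L2)
                     = sigma^2 / Leff,  where Leff = L1*L2 / (L1 + L2)

So, noise-wise, the difference wave behaves like a single average of Leff
trials.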

HTH,
Hari

Hari Bharadwaj
PhD Candidate, Biomedical Engineering,
Auditory Neuroscience Laboratory
Boston University, Boston, MA 02215

Martinos Center for Biomedical Imaging,
Massachusetts General Hospital
Charlestown, MA 02129

hari at nmr.mgh.harvard.edu
Ph: 734-883-5954

Ah, very good! Also, I just noticed references to whitening in the python code you linked me to, so now I understand what is being done with this information. Thanks!

Joe

This is helpful, thanks!

Some further thoughts and questions:

1) Given how whitening is conducted, the reference choice would be critical. I want to make sure that when my software writes out the COV information to the -cov.fif file, that it is doing it correctly. How is referencing being handled? For example, is the COV info always average referenced even if the data accompanying it has not been average referenced yet? Or is it always in the current referencing scheme with the reference info sent along separately so it can be taken into account later? Or what?

2) I also see part of what was confusing me. It sounded as though 4.17 was for when one is averaging data together whereas 6.3 is for when one is subtracting or summing. So when combining subject averages together into a grand average, one would use the 4.17 procedure, which quite rightly made no sense to me. Instead, 4.17 is intended for the situation where ideally only noise is present and the goal is to just weight the estimates of noise according to their quality (the number of observations used to generate them) when combining them, which in the case of two perfect COV estimates of noise will not change when combined together (still perfect).

So I understand 4.17 now. I am still wrestling with how 6.3 is to be applied. My understanding is that when one adds or subtracts two epochs with nothing but random noise, the noise level does not decrease. If anything, the noise level (variance over the epoch) increases, as I just demonstrated to myself with an Excel spreadsheet. The noise level improvement only occurs when one takes the average of the epochs, and is well known in the EEG field to follow the square root of the number of observations involved.
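
A quick numpy equivalent of that spreadsheet check, using hypothetical
unit-variance white-noise epochs:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal(100000)  # epoch 1
    b = rng.standard_normal(100000)  # epoch 2

    print(np.var(a))            # ~1.0: single epoch
    print(np.var(a - b))        # ~2.0: subtraction doubles the variance
    print(np.var((a + b) / 2))  # ~0.5: averaging halves it (sigma^2/N)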

So what situation is 6.3 intended for? It seems to indicate that if one takes a difference wave (say ERP to incongruent word minus congruent word) then one should calculate Leff as described and then obtain the new COV by dividing the original COV (C0) by Leff. Given the previous paragraph, this procedure doesn't make sense to me since the noise level (as measured by COV) should not be decreasing. So I'm clearly still missing something.

From a practical perspective, I'm asking because I want to make sure my software generates the correct COV for difference waves.

Respectfully,

Joe

hi Joe,

1) Given how whitening is conducted, the reference choice would be critical.
I want to make sure that when my software writes out the COV information to
the -cov.fif file, that it is doing it correctly. How is referencing being
handled? For example, is the COV info always average referenced even if the
data accompanying it has not been average referenced yet? Or is it always
in the current referencing scheme with the reference info sent along
separately so it can be taken into account later? Or what?

MNE works only with average-referenced EEG. The average ref can be
computed with a PCA/SSP projection, so it's transparent in the MNE pipeline.
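
For reference, in today's mne-python the average-reference projection can
be added explicitly; a minimal sketch, using the sample raw file mentioned
later in this thread:

    import mne

    raw = mne.io.read_raw_fif('sample_audvis_raw.fif', preload=True)

    # add an average-reference SSP projector instead of rewriting the
    # data in place; it can then be applied later in the pipeline
    raw.set_eeg_reference('average', projection=True)
    print(raw.info['projs'])  # now includes the EEG average-ref item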

I'll let somebody else answer for the rest or try to find some time later.

Alex

Hi Joe,

I don't have the direct answer to your question; however, typically
MNE users do not subtract sensor data from one another (for several
reasons, one of which is that subtraction increases the noise level,
resulting in poorer inverse estimates). The typical analysis procedure
would be to invert each condition and then perform a statistical
comparison (for example, a permutation test) between the two conditions
one wishes to compare.

I hope this helps.
D

Hi Joe,
   I'm venturing some answers inline..
Hope these help.

Hari

This is helpful, thanks!

Some further thoughts and questions:

1) Given how whitening is conducted, the reference choice would be
critical. I want to make sure that when my software writes out the COV
information to the -cov.fif file, that it is doing it correctly. How is
referencing being handled? For example, is the COV info always average
referenced even if the data accompanying it has not been average
referenced yet? Or is it always in the current referencing scheme with
the reference info sent along separately so it can be taken into account
later? Or what?

Like Alex said MNE enforces the use of average reference for source
estimation. Average reference has some advantages (at least in theory)
compared to an alternate reference (like linked ears) when it is likely
that the forward model estimation will be inaccurate at the reference
electrode of choice. With average reference, the forward calculation at
the reference electrode becomes irrelevant.

2) I also see part of what was confusing me. It sounded as though 4.17
was for when one is averaging data together whereas 6.3 is for when one is
subtracting or summing. So when combining subject averages together into
a grand average, one would use the 4.17 procedure, which quite rightly
made no sense to me. Instead, 4.17 is intended for the situation where
ideally only noise is present and the goal is to just weight the estimates
of noise according to their quality (the number of observations used to
generate them) when combining them, which in the case of two perfect COV
estimates of noise will not change when combined together (still perfect).

So I understand 4.17 now. I am still wrestling with how 6.3 is to be
applied. My understanding is that when one adds or subtracts two epochs
with nothing but random noise, the noise level does not decrease. If
anything, the noise level (variance over the epoch) increases, as I just
demonstrated to myself with an Excel spreadsheet. The noise level
improvement only occurs when one takes the average of the epochs, and is
well known in the EEG field to follow the square root of the number of
observations involved.

Section 4.17 is about estimating the covariance matrix of the noise in the
*raw* (unaveraged/single-trial) data. When you combine multiple estimates,
you just get a better estimate, but it still corresponds to the cov of the
noise in the *raw* data. You are not changing the noise levels in the
procedures described in that section; you are just improving the estimate
of what the noise is, with the level still corresponding to the raw data.
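
A small Python sketch of that pooling rule as I read Section 4.17: each
estimate is weighted by its number of observations, which sharpens the
estimate without changing the noise level it describes:

    import numpy as np

    def pool_covs(covs, n_obs):
        # weight each cov estimate by its number of observations
        total = float(sum(n_obs))
        pooled = sum(n * c for n, c in zip(n_obs, covs)) / total
        return pooled, total  # pooled estimate and its combined dof

    # two noisy estimates of the SAME raw noise level pool to a
    # better estimate at that same level, not a smaller one
    c1, c2 = np.eye(2), 1.1 * np.eye(2)
    pooled, dof = pool_covs([c1, c2], [100, 300])
    print(pooled, dof)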

So what situation is 6.3 intended for? It seems to indicate that if one
takes a difference wave (say ERP to incongruent word minus congruent word)
then one should calculate Leff as described and then obtain the new COV by
dividing the original COV (C0) by Leff. Given the previous paragraph,
this procedure doesn't make sense to me since the noise level (as measured
by COV) should not be decreasing. So I'm clearly still missing something.

From a practical perspective, I'm asking because I want to make sure my
software generates the correct COV for difference waves.

The considerations in chapter 6, on the other hand, come into play when
you are using a given noise-cov (which always corresponds to raw noise
levels) with your choice of evoked (multi-trial averaged) data to compute
an inverse. Since the noise variance in the evoked data is smaller than in
the raw data (by a factor of the number of trials), the noise cov matrix is
scaled down before the STC files are generated from it. By default, the
scaling here is the number of trials mentioned in the evoked data file.

Now, to your main question: if you have manipulated your evoked responses
in complex ways that the software cannot keep track of (for instance,
subtracting two evoked responses, corresponding to say congruent and
incongruent words), then it is up to you to specify how to scale the noise
cov from raw data before calculating the inverse solution (typically an STC
file). Otherwise the software will use the default scaling from the nave
field in the header of the evoked fif file that you are supplying to it.
You can specify the scaling by setting the "nave" parameter in mne-python
or, when using the command-line tools, the "--nave" option of
"mne_make_movie". You don't have to scale the noise-cov files by hand;
they should continue to correspond to raw data. Instead you just specify
the scaling (which is chosen based on what evoked data you supply) when you
come to the stage where you use the noise-cov to generate inverse
estimates.

Just to take an example and talk through the whole process: suppose you
have two conditions (corresponding to say congruent and incongruent words),
with 50 trials each, and let's say you are interested in localizing the
difference wave (instead of localizing them separately and doing stats
comparing them in source space). When you take the difference, the
effective scaling of the noise you need is:

L_eff = 50*50/(50 + 50) = 25.

So when using mne_make_movie (or the corresponding mne-python function),
you specify explicitly that nave = 25.

Note that this confirms that when you subtract two ERPs, the noise
actually doubles in variance (or increases by a factor of sqrt(2) in
standard deviation); i.e., instead of dividing by 50, which would have
happened if you localized *either* of those conditions, because you
subtracted them you only divide by 25.
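
In today's mne-python the same bookkeeping can be done for you; a sketch,
where evoked_congruent, evoked_incongruent, and the inverse operator inv
are placeholders:

    import mne
    from mne.minimum_norm import apply_inverse

    # the difference wave; combine_evoked tracks the effective nave
    # (with 50 + 50 trials and weights [1, -1] it comes out as 25)
    diff = mne.combine_evoked([evoked_congruent, evoked_incongruent],
                              weights=[1, -1])
    print(diff.nave)  # L_eff

    # apply_inverse reads nave from the evoked object, so the
    # noise-cov scaling is handled automatically
    stc = apply_inverse(diff, inv, lambda2=1.0 / 9.0, method='MNE')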

Hi Joe,
  I'm venturing some answers inline..
Hope these help.

Hari

This is helpful, thanks!

Some further thoughts and questions:

1) Given how whitening is conducted, the reference choice would be
critical. I want to make sure that when my software writes out the COV
information to the -cov.fif file, that it is doing it correctly. How is
referencing being handled? For example, is the COV info always average
referenced even if the data accompanying it has not been average
referenced yet? Or is it always in the current referencing scheme with
the reference info sent along separately so it can be taken into account
later? Or what?

Like Alex said MNE enforces the use of average reference for source
estimation. Average reference has some advantages (at least in theory)
compared to an alternate reference (like linked ears) when it is likely
that the forward model estimation will be inaccurate at the reference
electrode of choice. With average reference, the forward calculation at
the reference electrode becomes irrelevant.

I definitely agree about the need to use average reference for source modeling, at least for ERPs. I wasn't quite sure how it worked in the software, especially with regards to projections, which is an aspect of the file format that doesn't seem to be documented except at the general conceptual level.

Working backwards from the sample dataset, what I see is that in the chs field of the sample_audvis_raw.fif file, there are 60 EEG channels plus a common implicit reference electrode (implicit in that its coordinates are provided as the reference site for each of the EEG channels but it does not have its own entry nor does the data have a waveform for it, which by definition would be a flat line of zero voltage). Further, I see that the cov matrix obtained when running the audvis.cov script has only entries for the 60 EEG electrodes but again not the reference site. This tells me that the cov matrix is based on the raw data and is not in average reference form, otherwise information would be lost by not including the reference electrode in the cov matrix. Presumably the MNE software whitens the raw data directly using the raw cov matrix and only afterwards converts to average reference using the appropriate projection.

2) I also see part of what was confusing me. It sounded as though 4.17
was for when one is averaging data together whereas 6.3 is for when one is
subtracting or summing. So when combining subject averages together into
a grand average, one would use the 4.17 procedure, which quite rightly
made no sense to me. Instead, 4.17 is intended for the situation where
ideally only noise is present and the goal is to just weight the estimates
of noise according to their quality (the number of observations used to
generate them) when combining them, which in the case of two perfect COV
estimates of noise will not change when combined together (still perfect).

So I understand 4.17 now. I am still wrestling with how 6.3 is to be
applied. My understanding is that when one adds or subtracts two epochs
with nothing but random noise, the noise level does not decrease. If
anything, the noise level (variance over the epoch) increases, as I just
demonstrated to myself with an Excel spreadsheet. The noise level
improvement only occurs when one takes the average of the epochs, and is
well known in the EEG field to follow the square root of the number of
observations involved.

Section 4.17 is about estimating the covariance matrix of the noise in the
*raw* (unaveraged/single-trial) data. When you combine multiple estimates,
you just get a better estimate, but it still corresponds to the cov of the
noise in the *raw* data. You are not changing the noise levels in the
procedures described in that section; you are just improving the estimate
of what the noise is, with the level still corresponding to the raw data.

So what situation is 6.3 intended for? It seems to indicate that if one
takes a difference wave (say ERP to incongruent word minus congruent word)
then one should calculate Leff as described and then obtain the new COV by
dividing the original COV (C0) by Leff. Given the previous paragraph,
this procedure doesn't make sense to me since the noise level (as measured
by COV) should not be decreasing. So I'm clearly still missing something.

From a practical perspective, I'm asking because I want to make sure my
software generates the correct COV for difference waves.

The considerations in chapter 6, on the other hand, come into play when
you are using a given noise-cov (which always corresponds to raw noise
levels) with your choice of evoked (multi-trial averaged) data to compute
an inverse. Since the noise variance in the evoked data is smaller than in
the raw data (by a factor of the number of trials), the noise cov matrix is
scaled down before the STC files are generated from it. By default, the
scaling here is the number of trials mentioned in the evoked data file.

Now, to your main question: if you have manipulated your evoked responses
in complex ways that the software cannot keep track of (for instance,
subtracting two evoked responses, corresponding to say congruent and
incongruent words), then it is up to you to specify how to scale the noise
cov from raw data before calculating the inverse solution (typically an STC
file). Otherwise the software will use the default scaling from the nave
field in the header of the evoked fif file that you are supplying to it.
You can specify the scaling by setting the "nave" parameter in mne-python
or, when using the command-line tools, the "--nave" option of
"mne_make_movie". You don't have to scale the noise-cov files by hand;
they should continue to correspond to raw data. Instead you just specify
the scaling (which is chosen based on what evoked data you supply) when you
come to the stage where you use the noise-cov to generate inverse
estimates.

Just to take an example and talk through the whole process: suppose you
have two conditions (corresponding to say congruent and incongruent words),
with 50 trials each, and let's say you are interested in localizing the
difference wave (instead of localizing them separately and doing stats
comparing them in source space). When you take the difference, the
effective scaling of the noise you need is:

L_eff = 50*50/(50 + 50) = 25.

So when using mne_make_movie (or the corresponding mne-python function),
you specify explicitly that nave = 25.

Note that this confirms that when you subtract two ERPs, the noise
actually doubles in variance (or increases by a factor of sqrt(2) in
standard deviation); i.e., instead of dividing by 50, which would have
happened if you localized *either* of those conditions, because you
subtracted them you only divide by 25.

Thanks for the explanation. It definitely helps me to know how the data files are intended to be used. It sounds like the simplest solution is to set the nave field to L_eff and the MNE software will use it to make the appropriate scaling. I don't need to be modifying the COV matrix itself when generating difference waves. I also don't need to be changing the nfree field of the COV data. This also means I should add a field to my data format that keeps track of the L_eff scores so it will be available to insert into the nave field when the data is saved in FIFF format.

I think I have everything I need now to implement the preliminary file I/O code. Then I'll verify by comparing the results to those obtained using just the MNE software.

I'm still not clear whether I need to make sure that an average reference projection is included with the cov matrix and/or the data file so I'll need to work on that bit some more.

Thanks for taking the time to help me puzzle all this out!

Joe

hi,

I definitely agree about the need to use average reference for source
modeling, at least for ERPs. I wasn't quite sure how it worked in the
software, especially with regards to projections, which is an aspect of the
file format that doesn't seem to be documented except at the general
conceptual level.

Working backwards from the sample dataset, what I see is that in the chs
field of the sample_audvis_raw.fif file, there are 60 EEG channels plus a
common implicit reference electrode (implicit in that its coordinates are
provided as the reference site for each of the EEG channels but it does not
have its own entry nor does the data have a waveform for it, which by
definition would be a flat line of zero voltage).

this channel is meant to be able to undo the referencing.

Further, I see that the
cov matrix obtained when running the audvis.cov script has only entries for
the 60 EEG electrodes but again not the reference site. This tells me that
the cov matrix is based on the raw data and is not in average reference
form, otherwise information would be lost by not including the reference
electrode in the cov matrix. Presumably the MNE software whitens the raw
data directly using the raw cov matrix and only afterwards converts to
average reference using the appropriate projection.

first whitening and then average referencing works since the average ref
is a PCA/SSP projection with a vector of ones. We can therefore apply it
at any stage, even multiple times.
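
A tiny numpy check of the "multiple times" point: the average reference is
the projection P = I - (1/n) * ones * ones', which is idempotent:

    import numpy as np

    n = 60                             # e.g. 60 EEG channels
    ones = np.ones((n, 1))
    P = np.eye(n) - ones @ ones.T / n  # average-reference projector

    # applying the projection twice equals applying it once
    print(np.allclose(P @ P, P))       # True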

Thanks for the explanation. It definitely helps me to know how the data
files are intended to be used. It sounds like the simplest solution is to
set the nave field to L_eff and the MNE software will use it to make the
appropriate scaling. I don't need to be modifying the COV matrix itself
when generating difference waves. I also don't need to be changing the nfree
field of the COV data. This also means I should add a field to my data
format that keeps track of the L_eff scores so it will be available to
insert into the nave field when the data is saved in FIFF format.

I think I have everything I need now to implement the preliminary file I/O
code. Then I'll verify by comparing the results to those obtained using
just the MNE software.

I'm still not clear whether I need to make sure that an average reference
projection is included with the cov matrix and/or the data file so I'll need
to work on that bit some more.

see the projs in the measurement info.

Thanks for taking the time to help me puzzle all this out!

no pb

Alex