My apologies if the documentation exists somewhere and I am just unable to find it, but what exactly do the start/step parameters control when using TFCE? I couldn't find much in the manual or the archives.
Hi Cody, it might help to read this paper on TFCE:
Smith, S. M., & Nichols, T. E. (2009). Threshold-free cluster enhancement:
Addressing problems of smoothing, threshold dependence and localisation in
cluster inference. *NeuroImage*, *44*(1), 83–98. http://doi.org/10.1016/j.neuroimage.2008.03.061
The definition of TFCE in equation (1) is an integral, with the range running
from "start" up to the statistic height at a given point; in software this is
simply approximated as a sum, so you choose the "start" (typically 0 makes
sense) and the step size, which determines how many discrete increments are
used to approximate the integral. I think the paper is very helpful for
getting the intuition.
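To make that concrete, here is a minimal 1-D sketch of the TFCE sum (the E=0.5, H=2 defaults follow Smith & Nichols, 2009; the function name is made up for illustration):

```python
import numpy as np

def tfce_1d(x, start=0.0, step=0.1, E=0.5, H=2.0):
    """Approximate the TFCE integral (eq. 1 of Smith & Nichols, 2009)
    as a discrete sum over thresholds h = start, start+step, ...
    At each threshold, every supra-threshold point accumulates
    extent(h)**E * h**H * step, where extent(h) is the length of the
    contiguous run of supra-threshold values containing that point."""
    x = np.asarray(x, dtype=float)
    scores = np.zeros_like(x)
    h = start
    while h <= x.max() + 1e-12:
        idx = np.flatnonzero(x >= h)
        if idx.size:
            # split supra-threshold indices into contiguous runs (1-D clusters)
            runs = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
            for run in runs:
                scores[run] += (run.size ** E) * (h ** H) * step
        h += step
    return scores
```

Shrinking `step` (and keeping `start` at 0) makes the sum a better approximation of the integral, at the cost of one clustering pass per threshold.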
Aha! That's perfect. Thank you. I had actually been trying to figure it out from that very paper, I just had failed to make the connection between start/step and the integral. Thanks so much.
Just to add on to this, one of the old emails from the archive I found on the topic stated that a good default for start & step is (start=2.0, step=0.2). Is there any motivation behind start > 0 other than trying to reduce computation time? I've noticed that altering it changes the resultant p-values rather drastically in some cases. The source paper lists the start as typically being 0 without stating any reason to go above that. So I'm curious what the motivation behind this is. Why are these the two parameters we control for TFCE? What's to be gained/lost?
Is there any motivation behind start>0 other than trying to reduce
computation time?
Not as far as I know.
Why are these the 2 parameters we have control for TFCE? What's to be
gained/lost?
Think of it as a way to approximate an integral where each function value
to put into the summation takes a long time to compute (the clustering
step). Ideally we would start at zero and go in infinitesimal steps to the
largest statistic value, but numerically that's infeasible and practically
it would take forever. So the idea is to compute as few as possible
(highest start and biggest step) without affecting the result. If using a
smaller start and/or step affects the output, then you should use the
smaller start and/or step because it should provide a better approximation
to the integral. Without looking back, I would assume a start of 2 was
suggested because it usually doesn't affect the result (at least in the
suggester's experience).
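Eric's point can be seen with a toy integrand standing in for the expensive clustering term (the function and numbers here are illustrative only):

```python
import numpy as np

def left_riemann_sum(f, start, stop, step):
    """Approximate the integral of f from start to stop as
    sum(f(h) * step) over h = start, start + step, ..."""
    hs = np.arange(start, stop, step)
    return float(np.sum(f(hs)) * step)

f = lambda h: h ** 2        # stand-in for the costly per-threshold cluster term
exact = 8.0 / 3.0           # integral of h^2 over [0, 2]

coarse = left_riemann_sum(f, 0.0, 2.0, 0.5)   # few, cheap steps
fine = left_riemann_sum(f, 0.0, 2.0, 0.01)    # many steps, better approximation
late = left_riemann_sum(f, 1.0, 2.0, 0.01)    # start > 0 drops the low-threshold mass
```

A larger step only coarsens the approximation, while a larger start discards part of the integral outright, which is consistent with the observation elsewhere in this thread that changing the start value tends to have the more drastic effect.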
Ah, alright, beautiful. Thanks for the explanation, Eric. I imagine using a start of 2 on a whole-brain analysis would not make much of a difference, which is probably where that number came from.
Thanks for the discussion, I am using TFCE myself and it is helpful to see
what others think about it. It seems safest to me to always start at 0 and
increase the step size to ease computational burden, as changing the start
value seems much more likely to have drastic effects on the overall result
than changing the step size.
Also, sorry to be a bit pedantic, but while I completely understand wanting
to play with the method to see how the parameters might affect the results,
please keep in mind that going back and changing the start/step values to
get different (or possibly 'more significant') p-values is problematic. To
me, one of the main benefits of using TFCE is to avoid having to choose a
single threshold ahead of time, as trying several thresholds and picking
the 'best one' results in a new multiple comparisons issue that is
typically not accounted for.
Oh, definitely. It seems like unless you have serious computational constraints, using 0 is the best way to go (and certainly the only place to start if you want to know your results are legit). My confusion stemmed from not understanding the parameters from the scarce documentation: I took that "good default" of start=2 and step=0.2 that I stumbled upon in the archives, then stepped back to 0 once I learned what these parameters were actually controlling. I was exploring in the context of single-source phase-locking plots, so it made quite a big difference there. In the specific context of mne-python's use of TFCE, the smaller each of those parameters, the more accurate the result, so any adjustment upward in pursuit of significance would be hugely unsound, since these numbers are very non-arbitrary. I didn't mean to imply I would be tweaking those values to find what p-value I liked best; I just wanted to understand what the tweaking was doing.
You have hit on a huge problem in our current multiple-comparisons-problem (MCP)-concerned landscape, in which all these solutions have parameters that are arbitrary, and we simply trust the experimenter to use them responsibly (I have yet to see a paper that Bonferroni-corrected its parameter tweaks). Everybody picks exactly the right settings the first time, right? I would say an important step in learning these things is to play with self-constructed fake data. That's the only way to truly know where the signal is (since you put it there), and then you can learn how to extract the signal you are looking for using whatever new method Nichols releases on the neuroimaging world.
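A minimal sketch of that fake-data idea (all numbers here are arbitrary): inject a known effect into pure noise, then check that a simple per-timepoint t statistic recovers it where you put it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_times = 40, 100

data = rng.standard_normal((n_trials, n_times))  # pure noise
data[:, 40:60] += 0.8                            # known injected "signal" window

# per-timepoint one-sample t statistic against zero
t = data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_trials))
```

If a method (TFCE settings included) fails to highlight the 40–60 window here, you know the problem is the analysis, not the data.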
I have polysomnographic data and try to correct for eye movement
artifacts. I have two EOG channels, two electrodes were placed at the
outer edges of the eyes, one slightly above and the other slightly
below the eye, respectively. Also, I have 6 EEG electrodes (F3, F4,
C3, C4, O1, O2).
Is there a simple approach or a command to correct for saccadic and
slow eye movements in continuous EEG data using MNE?
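MNE does ship tooling for ocular artifacts: SSP projectors via `mne.preprocessing.compute_proj_eog` and ICA via `mne.preprocessing.ICA` are the usual routes. As a bare illustration of the regression alternative (all shapes and names here are hypothetical, not an MNE API), one can least-squares-regress the EOG channels out of the EEG channels:

```python
import numpy as np

def regress_out_eog(eeg, eog):
    """Remove the least-squares projection of the EOG channels
    from each EEG channel.
    eeg: (n_eeg_channels, n_times), eog: (n_eog_channels, n_times)."""
    # b solves eeg ~= b @ eog in the least-squares sense
    b, *_ = np.linalg.lstsq(eog.T, eeg.T, rcond=None)
    return eeg - b.T @ eog
```

One caveat: plain regression also removes any brain signal that happens to correlate with the EOG, which is one reason SSP/ICA are generally preferred in practice.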
Just wanted to say I did not at all mean to accuse you of doing anything
nefarious; I just thought it was worth mentioning in case others are reading
this, now or later on. Agreed that fake data (or even other real data not
relevant to the analysis, or a data split that won't be included in the final
analysis) is a good way to go.
I am using MNE to analyse some MEG mismatch negativity (MMNm) data. I have ~170 trials of a "standard" and ~170 trials of a "deviant"; the MMN comes from the average of the deviant minus the average of the standard for each participant.
I have some data that looks insensitive to the deviant response, but would like an objective way of identifying and removing the data from my analyses. I'm sure it's out there, but is there a function (or functions) to test whether a section of the averaged deviant time series from one channel (MEG1621 from 100ms to 200ms) is significantly greater than zero? The data is baseline corrected, so I presume zero would be the correct comparison.
Also, I just wanted to let you know that subtracting one average() from another results in the average of the two rather than one minus the other. For example: dev_ave - stand_ave = (dev_ave - stand_ave)/2. I'm not sure if this is intentional, or an error in the code for average()?
is there a function (or functions) to test whether a section of the averaged
deviant time series from one channel (MEG1621 from 100ms to 200ms) is
significantly greater than zero? The data is baseline corrected, so I presume
against zero would be the correct comparison.
Can you take a look at the statistics examples and tutorials on the website
to see if any of them cover your use case? If not, we should probably add
one.
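In case it helps, the window-vs-zero question can be sketched without any special function: average the 100–200 ms window per trial, then run a one-sample test of those values against zero. A sign-flip permutation version in plain numpy (`mne.stats.permutation_t_test` implements this idea properly; the helper name here is illustrative):

```python
import numpy as np

def sign_flip_test(trial_values, n_perm=1000, seed=0):
    """One-sample sign-flip permutation test of mean(trial_values) vs 0.
    trial_values: one number per trial, e.g. the 100-200 ms window mean.
    Returns the observed mean and a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    obs = trial_values.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_perm, trial_values.size))
    null = (flips * trial_values).mean(axis=1)  # under H0, each trial's sign is arbitrary
    p = (np.sum(np.abs(null) >= abs(obs)) + 1) / (n_perm + 1)
    return obs, p
```

With ~170 trials per condition, this should have decent power for deciding whether a channel shows any window-mean response above baseline.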
Also, I just wanted to let you know that subtracting one average() from
another results in the average of the two rather than one minus the other.
For example: dev_ave - stand_ave = (dev_ave - stand_ave)/2. I'm not sure if
this is intentional, or an error in the code for average()?
This has come up multiple times before, so we are actually planning to
change this behavior in the next MNE-Python release (0.13).
For now, you can use `mne.combine_evoked` (with weights=[1, -1]) to get the
plain subtraction you want.
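A quick numeric illustration with plain arrays standing in for Evoked data (the `combine` helper is illustrative; in MNE itself the call is `mne.combine_evoked(all_evoked, weights)`):

```python
import numpy as np

dev_ave = np.array([2.0, 4.0, 6.0])
stand_ave = np.array([1.0, 1.0, 1.0])

def combine(data_list, weights):
    """Weighted sum of the data arrays, like combine_evoked with explicit weights."""
    return sum(w * d for w, d in zip(weights, data_list))

halved = combine([dev_ave, stand_ave], [0.5, -0.5])  # the averaging behavior described above
plain = combine([dev_ave, stand_ave], [1.0, -1.0])   # weights=[1, -1]: plain subtraction
```

So `weights=[1, -1]` recovers dev_ave - stand_ave, while the behavior reported above corresponds to weights of [0.5, -0.5].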
Hi Eric,
Thanks, I and my colleagues have had a good look through them and none seem to do what we need. The closest is http://mne-tools.github.io/stable/auto_tutorials/plot_stats_cluster_1samp_test_time_frequency.html, though it works in the time-frequency domain. I have calculated the t statistic for each time point from the evoked data (using the standard deviations of the averaged evoked data) and plotted it on the time series, which gives me a pretty good idea. Perhaps the request is a little specific to my data, although I would be happy to share what I have come up with.
Thank you anyway!