Question 2: Is mne.time_frequency.csd_multitaper a energy spectral density or a power spectral density? Looking through the code, I don’t see a normalization by the time duration which makes me think it’s an energy spectral density, but I also could be missing it. (Or am just generally confused… very possible!)
Question 3: The density bit of cross-spectral density doesn’t imply there’s a normalization by dividing by total energy or something like that, right? Does it mean anything in particular here? (Sorry, kind of a general signal processing question rather than MNE specific).
For context, I’m helping implement some multitaper coherence and CSD code in the open source library DSP.jl, written in Julia (specifically in https://github.com/JuliaDSP/DSP.jl/pull/401), partially based on mne.time_frequency.csd_multitaper.
I think the answer is it is a power spectral density, and the normalization is actually coming from the tapering. (@palday said this when I asked him but I didn’t really get it ). Let me write it out as I understand it.
The tapers have 2-norm equal to 1, i.e. \sum_t |h_t|^2 = 1 if h_t is the t th component of the taper. If our sequence is of length N and we chose h_t = \frac{1}{\sqrt{N}}, then again we would have 2-norm equal to one. Let Y_t = X_t h_t denote the tapered sequence. Then it’s FFT has \mathcal{F}(Y)(f) = \frac{1}{\sqrt{N}} \mathcal{F}(X)(f) for each frequency f since the Fourier transfrom is linear and our h_t is a constant so we can pull it out. Then when we take the absolute value squared to get the power, we’d have |\mathcal{F}(Y)(f)|^2 = \frac{1}{N} |\mathcal{F}(X)(f)|^2 and we have our normalization. So we see a particular case of tapering by a sequence with 2-norm equal to 1 recovers the normalization constant when we take the power. The idea then is in general, tapering by such a sequence is effectively performing such a normalization for us.
The CSD object computed with a multitaper is not normalized. However, it does have an n_fft field in order to do normalization at some later stage. So, a CSD computed on 100 epochs has smaller values than a CSD computed on 200 epochs. Any function that operates on an CSD will in practrice normalize by dividing the values by n_fft.