Hello,
I’m attempting to create a classifier on hourly EEG data to predict the neurological outcome of coma patients, using only a single channel of data.
The EEG data is preprocessed as follows:
- A notch filter is applied if the utility frequency falls within the bandpass range ([0.1 Hz, 30 Hz])
- Data is passed through the bandpass filter
- Data is resampled using scipy.signal.resample_poly()
- Data is rescaled to the interval [-1, 1]
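For reference, here is a minimal sketch of the pipeline. The filter types, orders, Q factor, and sampling rates are illustrative stand-ins, not my exact values:

```python
import numpy as np
from scipy import signal

def preprocess(eeg, fs=500, fs_target=100, band=(0.1, 30.0), utility=50.0):
    """Illustrative version of the preprocessing steps (parameters are examples)."""
    # 1) Notch filter, only if the utility frequency lies inside the bandpass range
    if band[0] < utility < band[1]:
        b, a = signal.iirnotch(utility, Q=30.0, fs=fs)
        eeg = signal.filtfilt(b, a, eeg)
    # 2) Bandpass filter (example: 4th-order Butterworth, zero-phase)
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    eeg = signal.sosfiltfilt(sos, eeg)
    # 3) Resample to the target rate
    eeg = signal.resample_poly(eeg, up=fs_target, down=fs)
    # 4) Min-max rescale to [-1, 1]
    eeg = 2.0 * (eeg - eeg.min()) / (eeg.max() - eeg.min()) - 1.0
    return eeg
```

With the [0.1 Hz, 30 Hz] band shown here, the 50 Hz utility frequency is outside the range, so the notch step is skipped (as in the example graphs below).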
I noticed that the DCT, when applied after step 4, produces a very large first coefficient as soon as I raise the bandpass lower cutoff (the high-pass edge) above 0 Hz. The resulting DC coefficient lands anywhere in the range [-100, 300], depending on which hour the data is taken from.
When I apply the DCT after steps 1, 2, or 3 instead, this abnormally large first coefficient does not appear. I have included graphs of the preprocessing below (in this example, the notch filter step was not necessary).
The first graph is when the bandpass filter range is 0 Hz to 30 Hz.
The second graph is when the range is 0.1 Hz to 30 Hz.
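To make the effect concrete, here is a small self-contained reproduction on synthetic data (random noise with a few artefact-like spikes as a stand-in for EEG; the numbers are not from my recordings). It uses the fact that for scipy's default DCT (type II, `norm=None`), the first coefficient is just 2 × sum(x), i.e. proportional to the mean:

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
n = 3600 * 100                # stand-in for one hour at 100 Hz
x = rng.standard_normal(n)
x[::5000] += 8.0              # a few artefact-like positive spikes
x -= x.mean()                 # high-passed data is (near) zero-mean

# With zero mean, dct(x)[0] == 2 * x.sum() is essentially 0
print(dct(x)[0])

# Min-max rescaling to [-1, 1] centres on the *range*, not the mean,
# so an asymmetric signal ends up with a nonzero mean after rescaling:
y = 2 * (x - x.min()) / (x.max() - x.min()) - 1
print(dct(y)[0])              # far from 0, since y.mean() != 0
```

So the large first coefficient seems tied to the min-max step re-introducing a DC offset, which is what I'd like to understand better.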
Does anyone understand why this happens, what it means, and whether rescaling the data to [-1, 1] (or applying the DCT to data rescaled this way) is problematic in these circumstances for some reason?
EDIT: I also haven’t seen EEG data normalised to this interval in the scientific literature before, at least not in ML-related EEG research. Would it be beneficial to omit the normalisation step altogether?