Preprocessing long EEG recordings

Hello MNE Forum! :smile:

FAIR WARNING
VERY green in this world, I apologize in advance for any dumb questions or reasoning.
However, VERY humble and eager to learn. I don’t have resources to take paid classes but I do have time to read, watch, learn whatever you throw at me.
If the answers are right under my nose I’m sorry to have bothered you all and grateful for any pointers in the right direction! :pray:

(background)
My name is Maria and I’m a medical student from Sweden spending the summer and autumn doing research about EEG-patterns during sleep. I have no prior knowledge about coding, python etc.

I’ve started out by:

  1. Creating an account on GitHub, installing Git and anaconda navigator according to some instructions from neuraldatascience.org
  2. Learning some absolute basic Python language by trial and error and Youtube
  3. Digging through the MNE Tools tutorials and experimenting with my data in Spyder

(data)
Now, the data I want to analyse is in sets of

  • 19 EEG channels in a classic 10-20 montage
  • 4 EOG channels
  • 1 ECG channel
  • 4 EMG channels

With a sample frequency of 256 Hz.

Each recording represents a ~20-40 min nap, with a few minutes awake at the beginning of every recording. While awake, the subject is instructed to do an LRLR (left-right-left-right) eye movement, which they are further instructed to repeat during sleep if a lucid dream occurs.

(idea)
I want to study the patterns of different wake & sleep stages, probably focusing extra on REM sleep and whether REM-sleep patterns differ during a lucid dream. I’ve found the Multitaper Toolbox from Prerau lab, which seems like a good way to visualize the data in an easily comprehensible way, and I’ve managed to plot these spectra from some of my data.

(problem/question)
Now, so far I have done very little preprocessing. Setting up a bandpass filter, rereferencing and removing a bad channel or two is no problem, but as soon as I start with the ICA I feel like I mess it up. I can’t really grasp what I’m reading, interpret the plots and figures, or choose wisely which components to reject. I think I can identify artifacts from the eyes, but if I remove them, is that enough? Are all the other ICs also “disturbance” of different kinds, so I could “kill all the noise/artifacts” by removing all the ICs, or would that just remove the whole signal because the ICs represent all the components of the signal, brain activity included?

Most of the methods do the work for me, which obviously is really nice and handy, but I wanna make sure I understand what I’m doing so I don’t just end up corrupting the data and then analysing things that aren’t there or interpreting it wrongly in other ways. So I turn to the tutorials and the forum, but I find it hard to sort out what is relevant in my specific case, since a lot of the information I find seems to focus on, and give examples with, MEG and epoched data, while mine is EEG and continuous.

So I guess my questions are something like:

  1. Have I explained my project enough for you to get an idea of what I want to do, or what other information would you like me to add?

  2. Which steps of preprocessing should I include?

  3. In terms of ICA I gather there are 3 different algorithms (fastICA, Picard and Infomax?), so far I’ve just run it with default parameters, but I think I read somewhere that Picard might be preferable to fastICA when working with EEG? Is that correct? If so - in what way?

  4. Should I epoch the data at some point during preprocessing, i.e. to be able to reject epochs with super-high-voltage spikes or periods with lots of movement in the beginning of the recording, for example? And if so, should I… convert it back(?) to continuous before the analysis? And if so… how?

  5. Does it seem like a reasonable way to do this analysis all together or is there some big obvious flaw that just isn’t obvious to me based on my very minimal background knowledge in this kind of work? In that case, do you have any advice for me?

I think that’s it for a start!
If you read all the way here - thank you!!

I appreciate any advice, feedback or input of any kind :pray:

Output of python -c "import mne; mne.sys_info()":

Platform             Windows-11-10.0.22631-SP0
Python               3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:04:44) [MSC v.1940 64 bit (AMD64)]
Executable           C:\Users\chokl\anaconda3\envs\mne\python.exe
CPU                  AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD (16 cores)
Memory               13.9 GB

Core
├☑ mne               1.7.1 (latest release)
├☑ numpy             1.26.4 (OpenBLAS 0.3.27 with 16 threads)
├☑ scipy             1.14.0
└☑ matplotlib        3.9.0 (backend=qtagg)

Numerical (optional)
├☑ sklearn           1.5.0
├☑ numba             0.60.0
├☑ nibabel           5.2.1
├☑ nilearn           0.10.4
├☑ dipy              1.9.0
├☑ openmeeg          2.5.11
├☑ pandas            2.2.2
├☑ h5io              0.2.3
├☑ h5py              3.11.0
└☐ unavailable       cupy

Visualization (optional)
├☑ pyvista           0.43.10 (OpenGL 4.6.0 Compatibility Profile Context 22.20.44.47.230323 via AMD Radeon(TM) Graphics)
├☑ pyvistaqt         0.11.1
├☑ vtk               9.3.0
├☑ qtpy              2.4.1 (PyQt5=5.15.8)
├☑ pyqtgraph         0.13.7
├☑ mne-qt-browser    0.6.3
├☑ ipywidgets        8.1.3
├☑ trame_client      3.2.0
├☑ trame_server      3.0.2
├☑ trame_vtk         2.8.9
├☑ trame_vuetify     2.6.0
└☐ unavailable       ipympl

Ecosystem (optional)
├☑ eeglabio          0.0.2-4
├☑ edfio             0.4.3
├☑ mffpy             0.9.0
├☑ pybv              0.7.5
└☐ unavailable       mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline, neo

Hi,

It’s great to see people from all sorts of backgrounds eager to experiment and learn how to hack the brain! Good luck!

I am no expert in sleep studies, but I can try to give you some feedback.

  1. Have I explained my project enough for you to get an idea of what I want to do, or what other information would you like me to add?

I think your goal is clear, but maybe the means are not as much. I mean, you are referring to “patterns of different wake & sleep stages”, but what exactly are these patterns? Signal power, I presume? There are many metrics you can derive from your data, and it is up to you (and your hypothesis) to decide which ones matter. What methods are usually described in articles on the topic? Try to start with something simple and then build upon it.

  2. Which steps of preprocessing should I include?

From a high-level point of view, band-pass filtering, re-referencing and bad channel removal sound OK (I am not a fan of aggressive pre-processing). Still, consider sharing more details and/or code, so that people with experience in similar analyses can give you more specific feedback.
On top of this, consider whether you want to use all your channels or only some of them.

  3. In terms of ICA I gather there are 3 different algorithms (fastICA, Picard and Infomax?), so far I’ve just run it with default parameters, but I think I read somewhere that Picard might be preferable to fastICA when working with EEG? Is that correct? If so - in what way?

I am not an expert by any means, but start with the defaults, gain some insight and then try other options. You can’t go wrong like this.

I think I can identify artifacts from the eyes, but if I remove them, is that enough? Are all the other ICs also “disturbance” of different kinds, so I could “kill all the noise/artifacts” by removing all the ICs, or would that just remove the whole signal because the ICs represent all the components of the signal, brain activity included?

Yep, keeping all ICs will reconstruct your original signal, and removing them all will leave you without any signal. You should remove only those that correspond to expected noise like eye movements, muscle artifacts, and (maybe) heartbeat. Tutorials on ICA like this one even show you how to use your non-scalp channels (EOG, ECG) to automate the process.
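The logic can be checked with a toy example in plain NumPy. This is not an ICA fit: it assumes the unmixing is already known perfectly and only illustrates what happens when components are zeroed before projecting back to the channels. The three sources and their names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
t = np.arange(n_samples) / 256.0

# Three made-up sources: a 10 Hz "brain" rhythm, sparse "blinks", "muscle" noise.
brain = np.sin(2 * np.pi * 10 * t)
blink = (rng.random(n_samples) > 0.99).astype(float) * 5.0
muscle = rng.laplace(size=n_samples)
sources = np.vstack([brain, blink, muscle])  # shape (3, n_samples)

# Each channel records a different linear mix of the sources.
mixing = rng.standard_normal((3, 3))         # channels x components
channels = mixing @ sources

# Assume a perfect decomposition: the unmixing is the inverse of the mixing.
components = np.linalg.inv(mixing) @ channels


def back_project(components, mixing, exclude):
    """Zero the excluded components, then project back to channel space."""
    kept = components.copy()
    for idx in exclude:
        kept[idx] = 0.0
    return mixing @ kept


all_kept = back_project(components, mixing, exclude=[])
none_kept = back_project(components, mixing, exclude=[0, 1, 2])
no_blink = back_project(components, mixing, exclude=[1])

print(np.allclose(all_kept, channels))  # keeping everything restores the data
print(np.allclose(none_kept, 0.0))      # removing everything leaves nothing
```

In MNE the equivalent step is listing the unwanted components in `ica.exclude` and calling `ica.apply(raw)`. Methods such as `ica.find_bads_eog(raw)` and `ica.find_bads_ecg(raw)` can suggest candidates based on the EOG and ECG channels.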

  4. Should I epoch the data at some point during the preprocessing-process, i.e. to be able to reject epochs with super-high-voltage-spikes or periods with lots of movement in the beginning of the recording for example? And if so, should I… convert it back(?) to continuous before the analysis? And if so… how?

I think opinions vary on whether to pre-process continuous or epoched data, and you will get different results doing either. For your analysis, I think working with epochs would be easier, assuming you have the right annotations. I would suggest epochs like “awake”, “non-REM” and “REM” or “not Lucid” and “Lucid” (because I am not sure how you define periods of lucid dreams and how your data is organized). Then, extract your metrics for each kind of Epoch and compare across conditions.

  5. Does it seem like a reasonable way to do this analysis all together or is there some big obvious flaw that just isn’t obvious to me based on my very minimal background knowledge in this kind of work? In that case, do you have any advice for me?

I think it’s absolutely fine. Still, while you are prototyping, split it into steps: do one thing at a time, and when you are confident it’s working, move on to the next step.

Hope this helps a bit.

Cheers,

Hi Maria,

Part of your question ( 2. Which steps of preprocessing should I include?) resonates with me, as I have been grappling with the same issue and happened to come across your question while thinking it over. I specialize in sports psychology, and for over ten years, I have been conducting research under the supervision of a professor with advanced expertise in EEG within the field of sports science. Although I don’t have specific knowledge about sleep studies, I do handle very noisy EEG data collected during physical activities, so I am quite interested in preprocessing.

There is intriguing evidence suggesting that preprocessing may not always be necessary (see “EEG is better left alone”, Scientific Reports). That said, I don’t think preprocessing can be completely ignored in all cases. It would be appropriate to adjust the preprocessing according to the characteristics of the collected data and the type of analysis to be performed.

I recommend starting with the default settings and then tuning the parameters and preprocessing steps by referring to prior studies with similar experimental designs to yours. After that, I believe it is important to experiment with different approaches, including the option of not applying unnecessary preprocessing steps, to identify the methods that make well-known phenomena or components in your data more prominent (i.e., achieving a high signal-to-noise ratio).