Preprocessing EEG data for mental imagery offline analyses

  • MNE version: 1.1.0
  • operating system: Windows 10

Hi everybody!

First of all I want to deeply thank everybody who contributes to this forum and the different tutorials available on the website :grin:
I started from scratch a few days ago and, little by little, I am getting an overview of all the possibilities MNE gives us! :star_struck:

I’m working on EEG data recorded from one subject performing hand motor imagery (MI) (wrist extension) and physical practice (PP) of the same movement.

My first objective is to visualize and compare the PSD in some frequency bands over different areas of the brain, mainly over C3, Cz, and C4, in the PP condition vs. the MI condition.

To begin, I load my raw data from a GDF file: a 59-minute recording (!) with 32 EEG channels.

First problem: I probably have some kind of unit conversion problem somewhere between recording and data loading! When I simply plot the raw data (see below), the signal amplitude is really low. I have to zoom in before seeing a signal, or redefine the value of “eeg” in dict_plot (usually set to 20e-6; I have to set it to 20e-11 to see a signal without zooming).

Raw data visualization

For some reason I have 50 channels, but channels 33 to 50 show a horrible signal that doesn’t look like anything (see below).
So I added them to bad channels.

Bad channels visualization

My GDF file does not contain any information about EEG electrode locations, so I used a generic (built-in) standard montage:
mne.channels.make_standard_montage('biosemi32')
I renamed the channels using raw.rename_channels and was then able to apply this standard montage to my raw data using raw.set_montage.

These first steps allowed me to plot the PSD of the raw signal for each channel.

PSD raw data

Then I filtered my signal with a low-pass filter (40 Hz) and a high-pass filter (1 Hz).
Second question: I am well below the power-line frequency (50 Hz), so I don’t need any notch filter, right?

The next step was to create an Epochs object based on this raw file.
I don’t have a stim channel, but I do have 3 types of annotations corresponding to: “Beginning of mental imagery”, “Beginning of physical practice”, “stop”.
Using mne.events_from_annotations, I was able to “extract” the events of interest, which gave me 162 events:

  • 12 tasks of PP,
  • 69 tasks of MI,
  • and 81 “stop” annotations

From these events, I was able to “epoch” my data (5-second epochs, from -1 s to +4 s around the event of interest), and I used a rejection criterion to reject any epoch where the peak-to-peak signal amplitude is beyond a reasonable limit.
I saved the 69 epochs corresponding to MI in a new object “mi_epochs” and the 12 epochs corresponding to PP in an object “pp_epochs”.

To clean up my signal, I’m kind of confused about the next steps.

I do not have EOG or ECG channels, which makes it difficult to remove these artifacts.
Is it always necessary to set a reference electrode? If no such electrode was set during recording, should I use an average reference?
Can I apply ICA to remove blink artifacts with ica.find_bads_eog, using the channel Fp1?
Is there any other way to improve the signal-to-noise ratio in my case?

Then, I’m going to estimate the evoked responses for the 2 conditions (MI and PP) by averaging the epochs with epochs.average().
I will then be able to plot data from these epochs / evoked objects, such as PSD, PSD topomap, evoked difference, etc.
It might be a good start for comparing both conditions…

But before that, I’m really not sure which preprocessing steps I have to go through to prepare my data properly…

Some help would be greatly appreciated :slight_smile:

Maybe, but not necessarily. The scale at which the signals are visualized can be changed on the plot using the + and - keys. The real test is to compare the signal to the (pinkish-colored) scalebar at the top left of the window (it can be easier to judge that when there are fewer channels displayed, which can be adjusted with the PageUp and PageDown keys). I can’t read the scalebar label on your screenshot so I can’t really say if it seems OK or not.

Good.

OK. I trust that (1) your data actually came from a cap with that layout, and (2) you renamed the channels based on some knowledge of how their name / order in the file relates to the cap layout (if not, how do you know which one corresponds to Cz, C3, C4, etc.?).

yes.

Fine, though note that this is not strictly necessary; by providing an event_id dictionary when creating the epochs, it is possible for one object to store both kinds of events, and to recover them at will using, e.g., epochs['MI'] and epochs['PP'].

Removing them may or may not be necessary, depending on the magnitude of the artifacts and the particular research questions you have.

There is always a reference electrode during recording; it is the nature of how EEG works (measured signals are voltages, i.e. differences in potential between each measurement electrode and the reference). Thus it is not always necessary to set a reference, as long as you know and are OK with the referencing scheme used during data acquisition (if not, you can re-reference as you like).

it is certainly possible to substitute a frontal/forehead electrode in place of a proper EOG. It risks removing some brain-related signal in frontal areas, so if activity in that part of the brain is important, be careful.

Opinions vary widely as to what is best practice. Some prefer regression-based analyses, some prefer ICA, some prefer SSP. We have tutorials on all of these (as it seems you’ve discovered). General advice would be “see what is typical in published work in your area (motor imagery)” and, if it’s heterogeneous, read the methods sections to see how each group justified their preprocessing choices, and pick the one that seems most applicable to your situation.

Hello @Geoffroy and welcome to the forum!

Considering @Geoffroy has to change the scaling by about 5 orders of magnitude (relative to our defaults)… to me this seems to suggest that there might be a scaling issue upon import as well. So this is something we / they should look into again :slight_smile:

Thank you so much for your very quick and detailed answer!

Well, I have to zoom in so much that the scalebar at the top left of the window indicates 0.0 mV…
My signal is about 10^5 to 10^6 times lower than the usual EEG signal amplitude, which made me think the signal had been “converted twice” to µV.
I think I have to go back to the source and see what happens when the signal is being recorded.
I just wanted to mention it here in case someone had already faced this situation.

OK. I trust that (1) your data actually came from a cap with that layout, and (2) you renamed the channels based on some knowledge of how their name / order in the file relates to the cap layout (if not, how do you know which one corresponds to Cz, C3, C4, etc.?).

Exactly, I didn’t detail that in my previous post, but the data were recorded with that layout (32 EEG scalp electrodes according to the international 10-20 system), and I do know which channel number corresponds to which electrode (Channel 1 in my raw data = Fp1, etc.).

There is always a reference electrode during recording; it is the nature of how EEG works (measured signals are voltages, i.e. differences in potential between each measurement electrode and the reference). Thus it is not always necessary to set a reference, as long as you know and are OK with the referencing scheme used during data acquisition (if not, you can re-reference as you like).

True :sweat_smile:! And I actually think the reference channel was my “Channel 49”, since this signal appears to be totally flat when I plot it… But I don’t know which electrode corresponds to this channel (maybe a mastoid electrode was set during recording?)…
Again, for this point I have to go back to the source, and I need more information about the recording conditions. I’ll discuss that with the person concerned :grinning:
It’s important to start by making sure that this reference is coherent…

it is certainly possible to substitute a frontal/forehead electrode in place of a proper EOG. It risks removing some brain-related signal in frontal areas, so if activity in that part of the brain is important, be careful.

I’ll try to do that. I should not be impacted too much by the loss of signal in frontal areas since I will focus on C3-C4-Cz. But I’ll definitely keep that in mind if I have to do some data interpretation on more frontal areas.

Opinions vary widely as to what is best practice. Some prefer regression-based analyses, some prefer ICA, some prefer SSP. We have tutorials on all of these (as it seems you’ve discovered). General advice would be “see what is typical in published work in your area (motor imagery)” and, if it’s heterogeneous, read the methods sections to see how each group justified their preprocessing choices, and pick the one that seems most applicable to your situation.

That’s exactly what I was thinking and what I’ve noticed in the few papers I’ve read on this subject: it seems like there is no perfect / universal way to do it.
I guess I just wanted an easy answer to facilitate my work :sweat_smile: but I’ll go through more of the literature and see what is most commonly used in this domain.

Again, thank you very much.
I’ll continue my literature review for the preprocessing steps, and look for more information about the recording conditions.
I’ll probably be back with more questions very soon :laughing:

Indeed, this is suggestive of a data scaling problem, as @richard also suggested. Can you share the file? That way we can check if, for example, the file is storing data in volts and we’re failing to detect it / assuming it’s in microvolts (or some similar bug). You could also try opening the file in other software (I don’t think EDF Browser handles GDF files, but if you have access to MATLAB you could try EEGLAB).

I’m really sorry for the late answer!
This is one example of the files I have.
This raw file is a little over 100 MB.

https://drive.google.com/file/d/1jrOnRC2AIRBX04-3a__3Qp4biH2Azjgp/view?usp=sharing

As you can see in the raw file, there are 3 types of annotation, corresponding to
“Start mental imagery”, “Start physical practice”, and “Stop”.
The majority of the data (i.e. after a “stop” and before a “start” annotation) corresponds to “breaks” and doesn’t really interest me right now.
I followed the tutorial “marking breaks and bad spans” to add some annotations on these breaks.
I eventually realized that this was not necessary, though: I can just epoch the raw data around the “start” annotations to focus on the part of the signal I want… Right?

My next question is:
The data in the “break” parts of the raw signal are much noisier than the data in the segments of interest.
Should I apply ICA to remove eye artifacts on the raw data, or should I apply it after epoching?
Or in other words: should I apply ICA to the whole recording or just to the segments of interest?

I’ve seen that fitting it on the full raw data might give ICA a better chance of finding eye-blink artifacts.
But in most of my raw file, the data recorded in “break” segments have a lot of movement artifacts, whereas during the segments of interest the participant was more “focused” on the task to perform, and the data are more “readable”. I feel like these big movement artifacts can be a problem when ICA tracks eye-blink artifacts…

I hope I’m making myself understandable :sweat_smile:

Hello,

Yes, that’s usually how you would handle event-related data.

This is to be expected, as during breaks, participants are usually allowed to move a little, swallow, etc.

Two thoughts pop up in my mind here:

  1. more data is better, but
  2. garbage in, garbage out.

If you do want to use the continuous data for training ICA, be sure to mark the break segments as “bad” by assigning them an annotation that starts with the term bad_. This can even be automated to some extent via our annotate_break() function.

That said, I believe it’s easiest to simply try using ICA with the epoched data first. Not only will ICA converge in less time, but you can also be sure to only include task-relevant data in the process. Only if you believe that you’ve got too few epochs, or the ICA results are not satisfactory, you may reconsider the decision to use epoched data.

I only ever used ICA on epoched data and never had an issue with it. It’s also the approach we take in the MNE-BIDS-Pipeline.

I hope this helps a bit!

Richard

Indeed, it did help !
Thank you Richard :slight_smile:

If you do want to use the continuous data for training ICA, be sure to mark the break segments as “bad” by assigning them an annotation that starts with the term bad_. This can even be automated to some extent via our annotate_break() function.

That’s how I finally did it! I applied ICA to the raw signal, after annotating the break segments with the term “bad_”.
I also tried, just as a comparison, applying ICA to the whole raw signal, and I can confirm that the result was better when I annotated the bad segments!

For next time I might do that too :wink:

Coming back to my very first question about my voltage problem / probable unit error when loading the raw file: by any chance, did you have a look at the raw data I sent?
I haven’t been able to talk to the person who recorded the signal yet.

I haven’t had time, but have you tried opening the file with @cbrnr’s SigViewer?

I just did, and same problem.
I’m around 0.002 µV.
There is definitely something wrong with my data…

OK I had a look at the file.

Channels 33-48 have strong sinusoidal interference. Channel 49 is flat. You can see the sinusoids quite clearly with raw.plot(scalings=dict(eeg=5e-8)). To get the other channels to look normal-ish I had to do raw.plot(scalings=dict(eeg=1e-10)) (default is 2e-5). Scaling by 2e-11 looks plausible too, so it seems possible that the file is in volts and MNE-Python thinks it’s in microvolts.

These (shell) commands show that there are some parts of the file that say “uV”:

strings LYCA05_session6_part3-\[2019.06.21-16.27.33\].gdf | less
hexdump -C -n 5500 LYCA05_session6_part3-\[2019.06.21-16.27.33\].gdf

and some strategically-placed raise RuntimeErrors in the MNE-Python codebase helped me confirm that the read_raw_gdf() function is indeed recognizing that and setting edf_info['units'] = 1e-6 (in other words, it believes that the signals are in uV). I also confirmed that the helper function _read_segment_file() is indeed using those units to scale the signal values (see here and here). My best guess then is that the problem is in the data, not in MNE-Python; maybe somebody scaled the data and re-saved as GDF without updating the units (?)

Thank you so much for taking the time to look at the file!

maybe somebody scaled the data and re-saved as GDF without updating the units (?)

The EEG was initially recorded for neurofeedback of a mental imagery task.
Now I’m working on the file that was recorded for offline analysis.
I’m meeting this Tuesday with the person who ran the MI session and recorded the data; maybe he has the answer!

I have another question regarding the preprocessing steps, and more specifically postural maintenance muscle artifacts.
They are usually found at high frequencies, am I right?
Therefore I was wondering whether applying a low-pass filter (I’m going to focus on alpha and beta waves, so I will not really need frequencies above 30 Hz) can be enough to get rid of them?

Please create a new topic for new questions, thank you!

Have a look at mne.preprocessing.annotate_muscle_zscore

It will filter for you, detect artifacts, then annotate the unfiltered file. You can also search the forum, there may be other posts that ask/answer about muscle artifacts