EEG preprocessing and analysis

  • MNE-Python version: 3.6.9
  • operating system: Windows

My name is Ilda Alushaj and I am a master’s student at MIPT (Moscow Institute of Physics and Technology). My research topic is “Development of methods for predicting speech disorders during brain surgery”. In short, I have 19 EEG recordings in EDF format (some with 8 channels, some with 16), and I need to do preprocessing, feature extraction, and classification for these patients, who have been diagnosed with brain tumors, in order to predict what will happen to their speech after the operation.

I have done some EEG preprocessing, but I am stuck at feature extraction and classification. I don’t understand how they work or how I can implement them in my code.
I saw that this forum has helped a lot of people.

I thought I might get some help with my work here, if possible.

I am happy to share my notebook with what I have done so far, if needed.

Thank you!

From my experience, first you need to decide what kind of features to extract. Once you have extracted them, you will have the feature information, for example as a NumPy array. Then you can use traditional machine learning methods for the classification, such as SVM, KNN, or the simplest one, logistic regression.
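A minimal sketch of that last step, assuming the features are already in a NumPy array `X` of shape (n_recordings, n_features) with one label per recording in `y`. The random data and the 0/1 labels below are placeholders, not real results, and the classifier settings are illustrative rather than tuned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 19 recordings x 10 features, hypothetical binary labels
# (e.g. 0 = speech preserved, 1 = speech impaired after surgery).
rng = np.random.default_rng(0)
X = rng.normal(size=(19, 10))
y = np.array([0] * 10 + [1] * 9)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

# Stratified folds keep the class ratio similar in every split,
# which matters when there are only a few recordings.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

scores = {}
for name, clf in classifiers.items():
    # The scaler lives inside the pipeline, so it is fit on training folds only.
    model = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(model, X, y, cv=cv).mean()
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```

Cross-validation matters here because with 19 recordings a single train/test split gives a very noisy accuracy estimate.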

As for the features, if you don’t know which ones you need, read papers related to your work and find out what kinds of features people use for this task. For EEG, the CSP (Common Spatial Patterns) algorithm is a classic feature extraction method.
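To show what CSP computes, here is a minimal two-class sketch in NumPy/SciPy, assuming the data is already cut into labeled epochs of shape (n_epochs, n_channels, n_times). MNE ships a ready-made implementation, `mne.decoding.CSP`, which is the better choice in practice; the functions `fit_csp` and `csp_features` and the random epochs below are hypothetical illustrations. Note that CSP is supervised: it estimates one covariance matrix per class from labeled epochs.

```python
import numpy as np
from scipy.linalg import eigh


def fit_csp(epochs, labels, n_components=4):
    """Two-class CSP: return spatial filters of shape (n_components, n_channels)."""
    classes = np.unique(labels)
    if len(classes) != 2 or n_components % 2:
        raise ValueError("this sketch needs two classes and an even n_components")
    # Remove each channel's mean within every epoch before computing covariances.
    centered = epochs - epochs.mean(axis=-1, keepdims=True)
    covs = []
    for c in classes:
        # Trace-normalized spatial covariance, averaged over the epochs of class c.
        class_covs = [e @ e.T / np.trace(e @ e.T) for e in centered[labels == c]]
        covs.append(np.mean(class_covs, axis=0))
    # Generalized eigenproblem cov_a w = lambda (cov_a + cov_b) w; eigh sorts ascending.
    eigvals, eigvecs = eigh(covs[0], covs[0] + covs[1])
    # Filters at both ends of the spectrum maximize variance for one class
    # while minimizing it for the other.
    half = n_components // 2
    n = len(eigvals)
    picks = np.concatenate([np.arange(half), np.arange(n - half, n)])
    return eigvecs[:, picks].T


def csp_features(epochs, filters):
    """Log-variance of spatially filtered epochs, shape (n_epochs, n_components)."""
    projected = np.einsum("kc,ect->ekt", filters, epochs)
    return np.log(projected.var(axis=-1))


rng = np.random.default_rng(0)
epochs = rng.normal(size=(40, 8, 256))  # placeholder: 40 epochs, 8 channels, 256 samples
labels = np.repeat([0, 1], 20)
filters = fit_csp(epochs, labels)
features = csp_features(epochs, filters)
print(features.shape)  # (40, 4)
```

The log-variance of each filtered signal is the usual CSP feature; the resulting (n_epochs, n_components) matrix can go straight into a classifier.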
Or, if your dataset is big enough, you can try deep learning, which performs feature extraction and classification automatically. Maybe this will help: braindecode/braindecode on GitHub (deep learning software to decode EEG or MEG signals).
Reading your questions again, I think you may not have studied machine learning or deep learning yet, so I suggest learning machine learning first; then the things you need to do will become clear.

Thank you so much! In fact, I have tried these steps. First, I did preprocessing with high-pass filtering. To extract features, I used the DWT (Discrete Wavelet Transform), a time-frequency method, and also the mean, for all my EDF files. After that I saw that I still had a lot of features, so I used the L2 norm to get one single value instead of a matrix. Then I did vectorization. I am just not sure whether I have done the right steps. I want to understand what I am doing and why.
Here is the link to my notebook: https://colab.research.google.com/drive/1uTb55nuB8Hp96Zdz2Auo2q3fsZjS7LoN?usp=sharing
From your experience, do you think these are the right steps?
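The steps described above can be sketched as one function per recording, assuming each recording is a NumPy array of shape (n_channels, n_times) sampled at `sfreq` Hz. The 1 Hz cutoff, the 'db4' wavelet, 4 decomposition levels, and the two statistics per sub-band (mean absolute value and L2 norm) are illustrative assumptions, and `recording_features` is a hypothetical name. In MNE you would normally high-pass with `raw.filter(l_freq=1.0, h_freq=None)` rather than SciPy.

```python
import numpy as np
import pywt
from scipy.signal import butter, sosfiltfilt


def recording_features(data, sfreq, wavelet="db4", level=4):
    """Turn one (n_channels, n_times) recording into a 1-D feature vector."""
    # Zero-phase 4th-order Butterworth high-pass at 1 Hz to remove slow drifts.
    sos = butter(4, 1.0, btype="highpass", fs=sfreq, output="sos")
    filtered = sosfiltfilt(sos, data, axis=-1)
    # Multilevel DWT along time: [cA_level, cD_level, ..., cD_1], one array per sub-band.
    coeffs = pywt.wavedec(filtered, wavelet, level=level, axis=-1)
    features = []
    for band in coeffs:
        # Collapse each sub-band over time: one value per channel per statistic.
        features.append(np.mean(np.abs(band), axis=-1))
        features.append(np.linalg.norm(band, axis=-1))
    # "Vectorize": length n_channels * (level + 1) * 2
    return np.concatenate(features)


rng = np.random.default_rng(0)
fake_recording = rng.normal(size=(8, 10 * 256))  # placeholder: 8 channels, 10 s at 256 Hz
vec = recording_features(fake_recording, sfreq=256.0)
print(vec.shape)  # (80,)
```

Because the vector length scales with the number of channels, 8-channel and 16-channel files produce vectors of different lengths; they have to be made comparable (for example by keeping only the channels all files share) before they can be stacked into one feature matrix.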

I haven’t finished reading your protocol yet, but I have one suggestion. If you have data for the same disease from different people, you could store it in a list or a dictionary. For a list, you could do something like this:

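import mne
data = list()
for fpath in edf_paths:  # edf_paths: hypothetical list of your EDF file paths
    raw = mne.io.read_raw_edf(fpath, preload=True)  # EDF files load as Raw, not Epochs
    data.append(raw)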

Or use a dictionary, where the key can be the number or the name of a patient. This keeps your code clean, because you don’t need to write things like:

data1 = ...
data2 = ...

This way, if you apply the same processing to all of these recordings, you only need a loop.
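A minimal sketch of the dictionary version, assuming each patient’s signal has already been loaded as a NumPy array (with MNE, `mne.io.read_raw_edf(path, preload=True).get_data()` returns an (n_channels, n_times) array). The patient IDs, the random data, and `extract_features` are hypothetical placeholders.

```python
import numpy as np


def extract_features(signal):
    """Hypothetical feature function: per-channel mean and standard deviation."""
    return np.concatenate([signal.mean(axis=-1), signal.std(axis=-1)])


rng = np.random.default_rng(0)
# Placeholder recordings keyed by patient ID; real ones would come from the EDF files.
recordings = {
    "patient_01": rng.normal(size=(8, 2560)),
    "patient_02": rng.normal(size=(8, 2560)),
    "patient_03": rng.normal(size=(8, 2560)),
}

# One loop applies the same processing to every patient; no data1, data2, ... variables.
features = {pid: extract_features(sig) for pid, sig in recordings.items()}
for pid, vec in features.items():
    print(pid, vec.shape)  # each vector holds 2 * n_channels = 16 values
```

Keeping the patient ID as the key also makes it easy to line each feature vector up with that patient’s outcome label later.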
Also, a function’s parameter name is only there to make the code easy to understand, so you don’t need to write a separate copy of the same function for each variable.
If I had a list named eeg_data containing the data from every patient, I would do this:

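import pywt
# eeg_data: list of NumPy arrays, one (n_channels, n_times) array per patient
result = list()
for data in eeg_data:
    # single-level DWT along time; each entry is an (approximation, detail) tuple
    result.append(pywt.dwt(data, 'db4', axis=-1))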

Also, when you use np.mean, it is clearer to set the axis parameter explicitly. I guess you want to average the data across time?
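For a single recording stored as an (n_channels, n_times) array, the axis decides what gets averaged. A small illustration with made-up numbers:

```python
import numpy as np

# Made-up recording: 2 channels x 4 time samples.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, 40.0]])

per_channel = np.mean(data, axis=-1)  # average over time: one value per channel (2.5 and 25.0)
per_sample = np.mean(data, axis=0)    # average over channels: one value per time point
overall = np.mean(data)               # no axis: everything collapses to one number (13.75)
```

Averaging over time (axis=-1) keeps one value per channel, which is usually what you want for per-channel features.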

I don’t know why you compute the norm, but if you are following a related paper, or your teacher told you to do it, then it’s fine. Finally, you used logistic regression for the classification. This part requires some technical study, because it is a little complex.
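One common reason to take an L2 norm of wavelet coefficients is that, for an orthogonal wavelet such as 'db4' with `mode="periodization"`, the transform preserves energy: the squared norms of the sub-bands add up to the signal’s total energy, so each sub-band norm measures how much energy falls in that frequency band. A small check on a random placeholder signal, assuming its length is divisible by 2**level so the periodized transform stays orthogonal:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = rng.normal(size=1024)  # placeholder single-channel signal

coeffs = pywt.wavedec(x, "db4", mode="periodization", level=3)
# One L2 norm per sub-band: [cA3, cD3, cD2, cD1]
band_norms = np.array([np.linalg.norm(c) for c in coeffs])

signal_energy = np.sum(x ** 2)
band_energy = np.sum(band_norms ** 2)
print(np.isclose(signal_energy, band_energy))  # True (equal up to floating-point error)

# Relative energy per sub-band: a compact 4-value feature for this channel.
relative_energy = band_norms ** 2 / band_energy
```

Taking one norm per sub-band and channel keeps this frequency information; taking a single norm over a whole multi-channel coefficient matrix collapses everything into one number per file and discards it.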

Well, I want to do the logistic regression, but as you can see, I don’t know how to define X. I mean, I don’t know which values to use or how to build a vector from all my L2-norm values. I used the norm because the DWT gave me a lot of values for each EDF file, so I decided to use it to get one single value. I just don’t know how to implement this now to try logistic regression.
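A minimal sketch of how `X` and `y` are usually laid out for scikit-learn’s `LogisticRegression`: one row per recording (sample), one column per feature, and one label per row. It assumes one fixed-length feature vector per EDF file; here the hypothetical `file_features` averages per-channel sub-band L2 norms over channels, which is just one illustrative way to make 8- and 16-channel files produce vectors of the same length. The recordings and labels are random placeholders.

```python
import numpy as np
import pywt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def file_features(data, wavelet="db4", level=4):
    """L2 norm of each DWT sub-band per channel, averaged over channels,
    so 8- and 16-channel recordings give vectors of the same length."""
    coeffs = pywt.wavedec(data, wavelet, level=level, axis=-1)
    return np.array([np.linalg.norm(band, axis=-1).mean() for band in coeffs])


rng = np.random.default_rng(0)
# Placeholder recordings standing in for the 19 EDF files (8 or 16 channels each).
recordings = [rng.normal(size=(rng.choice([8, 16]), 2560)) for _ in range(19)]
# Hypothetical 0/1 outcome labels, one per recording.
y = np.array([0] * 10 + [1] * 9)

# X: one row per recording, one column per feature; shape is (19, 5) here.
X = np.vstack([file_features(r) for r in recordings])

model = make_pipeline(StandardScaler(), LogisticRegression())
# Leave-one-out: train on 18 recordings and test on the held-out one, 19 times.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(X.shape, scores.mean())
```

The key point is that `X` is a 2-D array with one row per file; a single scalar per file would give a one-column `X`, which leaves the classifier very little information to work with.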

You should find a related paper and follow its protocol to make sure that every step makes sense.

Thank you so much for contributing.