Best way to Store EEG. Pandas, sqlite or other option?

Dear Colleagues,

After cleaning my EEGs I’d like to append them to a sort of databank or list that I could call for plotting or statistics with pandas and numpy.
I don’t want something huge and specific like SQL Server, Oracle, etc. (too time-consuming). I’ve tried SQLite, but I’m still confused about the best option. Is it possible to use SQLite? If I append the data to a list as pandas DataFrames, what happens to the MNE/NumPy objects? The main question is: which is the smoothest way to do it? I couldn’t find any related tutorial on Google.
Sorry if the question is too basic; my background is in neurology. Thanks in advance.

  • MNE version: e.g. 0.24.0
  • operating system: e.g. Windows 10

Can you tell us more about why you want the data in a database (or, perhaps equivalently, why loading/saving in FIF format won’t work)?

Ok. For instance, I have resting awake EEG, CDR, MMSE, and psychological evaluations of patients with Alzheimer’s disease (AD).

(Kanda, Paulo Afonso Medeiros, et al. “Clinician’s road map to wavelet EEG as an Alzheimer’s disease biomarker.” Clinical EEG and Neuroscience 45.2 (2014): 104–112.)

With WEKA and a support vector machine we achieved 92.72% accuracy separating AD patients from normals. Unfortunately, the technique I used was very laborious and specific, and I want to bring it to clinicians. For example, with a databank of EEGs from AD patients, a clinician could (by clicking a few tkinter buttons) compare the EEG of one patient with possible memory loss against parameters from the databank (associated with the clinical evaluation) and estimate the risk of dementia. By the way, resting EEG is powerful in the evaluation of such patients. If it works, I could try it in vascular disease, attention deficit disorder, and so on.

In such cases it would not be feasible if, each time I evaluate a new person, I had to load, say, 250 EEG .fif records, append them all, and re-apply the support vector machine algorithm to compare against the proband’s EEG record. Besides, as “n” in the databank increases (normals, children with learning disabilities, temporal lobe epilepsy databanks), parametric and nonparametric statistics become stronger. To sum up, I need strong, consolidated means from homogeneous and specific groups of patients, for comparison with one new individual at risk of an event. Thanks for any hint.

It sounds like you want some sort of trained model / classifier that you could easily apply to new data through a GUI. If that’s right, then once the model is trained, do you even need the training data available anymore (e.g. in a database) for the GUI to work? In other words:

I wonder why you need to re-load the training data for each new patient, if the model has already been trained?
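To make that point concrete, here is a minimal sketch of "train once, save, re-use". All names, shapes, and the file path are made up; it assumes a scikit-learn SVM and joblib for persistence, with random numbers standing in for real feature vectors extracted from epochs:

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 64))   # 100 epochs x 64 features (toy data)
y_train = rng.integers(0, 2, 100)          # toy labels: 0 = control, 1 = AD

model = SVC().fit(X_train, y_train)

# persist the fitted model once; the GUI only ever loads this file
model_path = Path(tempfile.mkdtemp()) / "eeg_svm.joblib"
dump(model, model_path)

# later, for each new patient: load the model and classify,
# without re-loading any of the original training records
clf = load(model_path)
new_patient = rng.standard_normal((1, 64))
risk = clf.predict(new_patient)
```

With this layout, the 250 training EEGs are only needed when you (re)train; the clinician-facing tool ships with the small `.joblib` file alone.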

Of course you still need to get the training data into the correct form to create the model in the first place. For that we have a few examples in our docs (two tutorials and twelve examples), most of which will have some variation on the following:

model.fit(X=epochs['Left'].get_data(),
          y=epochs['Left'].events[:, 2] > 2)

in other words, training data is a NumPy array, in this case acquired using epochs.get_data(), and (for supervised learning) the labels are provided by the epochs events array (though one could just as easily use epochs metadata).
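Since `epochs.get_data()` returns a 3-D array of shape `(n_epochs, n_channels, n_times)` while most scikit-learn estimators expect 2-D input, a common step is to flatten each epoch into one feature vector. This sketch simulates the array (all shapes and names here are illustrative, not real data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# epochs.get_data() returns shape (n_epochs, n_channels, n_times);
# simulate one such array so the example is self-contained
n_epochs, n_channels, n_times = 40, 19, 100
rng = np.random.default_rng(1)
X = rng.standard_normal((n_epochs, n_channels, n_times))
y = (np.arange(n_epochs) % 2).astype(int)  # stand-in for event-derived labels

# flatten each epoch to a single row before fitting
X2d = X.reshape(n_epochs, -1)
clf = LogisticRegression(max_iter=1000).fit(X2d, y)
pred = clf.predict(X2d)
```

If I remember correctly, `mne.decoding.Vectorizer` can do this flattening for you inside a scikit-learn pipeline, which keeps the reshaping out of your own code.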

Hopefully this helps… if not, maybe @agramfort is more familiar with the kinds of things you might want to do here, and can offer better suggestions. If I’ve missed the point, please try again to rephrase what it is you want to be able to do @PauloKanda.


I will study the tutorials you suggested and see if I can work with them. Thank you again.

" I’d like to append them to a sort of databank or list that I could call for plotting or statistics with pandas and numpy."
I had a similar requirement: I had to read each subject’s .fdt/.set (EEGLAB) data, do the epoching, and apply feature extraction / classification. As I read each subject’s file and epoched it, I converted the epochs into a pandas DataFrame using to_data_frame, concatenated each subject’s data with pandas.concat, and saved the result as a .csv (adding a ‘subject’ column to the pre-generated columns). Now I can visualize an individual subject’s data (or all subjects’ data) and train models on this dataset.

" If I append the data to a list as pandas DataFrames, what happens to…" - I didn’t need the original files once the dataset was created, but if you will still need them, you can save them as separate text files alongside the csv (either for one subject or for all subjects).

And as for your requirements, you could extend this approach by converting the incoming patient’s data to a DataFrame and then either (a) appending it to the existing dataset, or (b) validating the patient’s data with the pre-trained model.

‘Besides, as “n” in the databank increases (normals, children with learning disabilities, temporal lobe epilepsy databanks), parametric and nonparametric statistics become stronger.’ - Similar to adding a patient’s data to the existing dataset (rows), these additional criteria / conditions can be appended as columns, filled in for the subjects where the criterion is known and set to NaN otherwise.
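One way to sketch that column-wise pattern: keep the clinical scores in their own small table and left-merge it onto the EEG rows, so subjects without a known score automatically get NaN. The column names and values below are made up for illustration:

```python
import pandas as pd

# EEG rows for three subjects (toy values)
eeg = pd.DataFrame({"subject": [1, 1, 2, 2, 3, 3],
                    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

# a clinical score known only for subjects 1 and 2
scores = pd.DataFrame({"subject": [1, 2], "MMSE": [28, 21]})

# a left merge keeps every EEG row; subject 3 gets NaN in the new column
merged = eeg.merge(scores, on="subject", how="left")
```

Adding another criterion later (e.g. CDR) is then just another small table and another left merge, without touching the existing rows.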

Something like this:

import glob

import mne
import pandas as pd

t_min, t_max = -0.4, 1.0  # epoch limits in seconds; adjust to your paradigm

fnames = glob.glob("*.set")
for fname_count, fname in enumerate(fnames):
    raw = mne.io.read_raw_eeglab(fname, preload=False)
    raw_events, raw_event_id = mne.events_from_annotations(raw)
    raw_epochs = mne.Epochs(raw, events=raw_events, event_id=raw_event_id,
                            tmin=t_min, tmax=t_max, baseline=(-0.4, 0),
                            preload=False)
    df = raw_epochs.to_data_frame()
    df['subject'] = fname_count + 1
    if fname_count == 0:
        df_all = df
    else:
        df_all = pd.concat([df_all, df])
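Once the loop has built the combined DataFrame, saving and reloading it is straightforward. A small self-contained sketch (the tiny `df_all` and the `Fz` column here are placeholders for the real concatenated data):

```python
import tempfile
from pathlib import Path

import pandas as pd

# stand-in for the concatenated df_all built by the loop above
df_all = pd.DataFrame({"subject": [1, 1, 2, 2],
                       "Fz": [0.1, 0.2, 0.3, 0.4]})

csv_path = Path(tempfile.mkdtemp()) / "all_subjects.csv"
df_all.to_csv(csv_path, index=False)

# later sessions just reload the csv and slice out one subject
df = pd.read_csv(csv_path)
one_subject = df[df["subject"] == 2]
```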


Dear Mr Kumar

I was wondering how to do something like that. Accustomed to working with Excel, I needed something more tangible, like you suggest. I really appreciate your help with this subject. Thank you very much.