Facilitate access to open EEG/MEG databases, discussion


Hello everyone, I am Nitish Gupta, a B.Tech. undergraduate from India. I was exploring the project when I came across the idea "Facilitate access to open EEG/MEG databases".
I see that we currently support quite a few datasets, mainly hosted on osf.io.
The URL mapping for these datasets is done inside utils.py, together with their names and hashes, after which they are passed to the downloader.
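Roughly something along these lines (a simplified, hypothetical sketch with placeholder names and values, not the actual contents of utils.py):

```python
# Hypothetical illustration of the name -> URL/hash mapping described above;
# the real entries live in mne/datasets/utils.py and look different.
DATASET_URLS = {
    "some_dataset": {
        "url": "https://osf.io/download/<file-id>/",  # placeholder URL
        "hash": "md5:<checksum>",                     # placeholder checksum
    },
}

def _fetch(name, downloader):
    """Look up a dataset entry and hand its URL and hash to the downloader."""
    entry = DATASET_URLS[name]
    return downloader(entry["url"], known_hash=entry["hash"])
```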
I have also been exploring some open EEG data sources listed here. Since every dataset I come across has a permanent download link, some approaches I could think of are:

Manually finding and mapping more dataset download links, the same way it is currently done inside mne/datasets/utils.py
Creating separate groupings of related datasets and serving them to users in a new form; for instance, the Brainstorm datasets could be grouped together and served through a single API call such as mne.datasets.brainstorm('resting') instead of the bst_resting.data_path() call that is used currently (see the sketch after this list).
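
To make the second idea concrete, here is a rough sketch (hypothetical, not existing MNE API; I am assuming the other Brainstorm sub-datasets expose data_path() the same way bst_resting does) of a single entry point that dispatches to the per-dataset download functions:

```python
# Hypothetical grouping wrapper -- not part of MNE's current API.
from mne.datasets import brainstorm as bst

# Assumed sub-dataset modules; each already provides its own data_path().
_BRAINSTORM_PARTS = {
    "resting": bst.bst_resting.data_path,
    "auditory": bst.bst_auditory.data_path,
    "raw": bst.bst_raw.data_path,
}

def brainstorm(name, **kwargs):
    """Download one Brainstorm sub-dataset by name, e.g. brainstorm('resting')."""
    try:
        return _BRAINSTORM_PARTS[name](**kwargs)
    except KeyError:
        raise ValueError(f"unknown Brainstorm dataset: {name!r}") from None
```

Users would then call mne.datasets.brainstorm('resting') and get back the same path that bst_resting.data_path() returns today.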
I was wondering what other approaches or ideas the community might have on this. Would love to hear some input.

Hi @imnitishng,

There is work in progress to convert our dataset downloading to use the external library pooch. This will change things on the back end (e.g., checksum hashes will be stored in a separate text file instead of in the code) but won't involve any API changes like the one you suggest regarding Brainstorm. In general we try to be conservative about changing the API, and the dataset downloading API is particularly tricky because changing it would involve lots of changes (for example, almost every tutorial in our documentation uses some sample dataset). That said, I would be happy to hear ideas about how we could improve the datasets API. There is some inconsistency between the data_path() and load_data() functions depending on whether a dataset has multiple subjects that can be downloaded separately; that might be one place we could make the API more uniform.
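
For reference, this is roughly how pooch handles its registry (a minimal sketch with placeholder file names, URL, and checksum, not our actual configuration):

```python
import pooch

# Minimal sketch of a pooch fetcher; names, URL, and checksum are placeholders.
fetcher = pooch.create(
    path=pooch.os_cache("mne-demo"),      # local cache directory
    base_url="https://osf.io/download/",  # placeholder base URL
    registry={
        # filename -> known checksum; pooch verifies this after download
        "some_file.fif": "sha256:<checksum>",
    },
)

# The planned change is to keep these checksums in a plain-text file
# that is loaded at import time rather than hard-coding them:
# fetcher.load_registry("registry.txt")

# fetch() downloads the file on first use, caches it, and checks the hash:
# local_path = fetcher.fetch("some_file.fif")
```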