read_raw_ctf().load_data() overflow error

Hi everyone!

I’m working with MEG CTF data and I’m having trouble loading it.

Here’s an example:

file_path = r'C:/Users/joaco/OneDrive - The University of Nottingham/MEGEYEHS/DATA/CTF_DATA/15909001/15909001_MatiasIson_20220530_02.ds'
raw = mne.io.read_raw_ctf(file_path)

OUTPUT:
ds directory : C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds
    res4 data read.
    hc data read.
    Separate EEG position data file read.
    Quaternion matching (desired vs. transformed):
       0.42   68.31    0.00 mm <->    0.42   68.31    0.00 mm (orig :  -53.00   45.90 -248.52 mm) diff =    0.000 mm
      -0.42  -68.31    0.00 mm <->   -0.42  -68.31    0.00 mm (orig :   45.10  -49.18 -246.57 mm) diff =    0.000 mm
      83.73    0.00    0.00 mm <->   83.73   -0.00    0.00 mm (orig :   53.82   58.80 -242.95 mm) diff =    0.000 mm
    Coordinate transformations established.
    Polhemus data for 3 HPI coils added
    Device coordinate locations for 3 HPI coils added
    2 extra points added to Polhemus data.
    Measurement info composed.
Finding samples for C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds\15909001_MatiasIson_20220530_02.meg4: 
    System clock channel is available, checking which samples are valid.
C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py:230: RuntimeWarning: overflow encountered in long_scalars
  offset = CTF.HEADER_SIZE + (samp_offset * res4['nchan'] +
Traceback (most recent call last):
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-208af92d7217>", line 1, in <cell line: 1>
    raw = mne.io.read_raw_ctf(file_path)
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py", line 66, in read_raw_ctf
    return RawCTF(directory, system_clock, preload=preload,
  File "<decorator-gen-223>", line 12, in __init__
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py", line 131, in __init__
    sample_info = _get_sample_info(meg4_name, res4, system_clock)
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py", line 232, in _get_sample_info
    fid.seek(offset, 0)
OSError: [Errno 22] Invalid argument

It encounters a long scalar overflow. I could get past this with:

raw = mne.io.read_raw_ctf(file_path, system_clock='ignore')

OUTPUT:
ds directory : C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds
    res4 data read.
    hc data read.
    Separate EEG position data file read.
    Quaternion matching (desired vs. transformed):
       0.42   68.31    0.00 mm <->    0.42   68.31    0.00 mm (orig :  -53.00   45.90 -248.52 mm) diff =    0.000 mm
      -0.42  -68.31    0.00 mm <->   -0.42  -68.31    0.00 mm (orig :   45.10  -49.18 -246.57 mm) diff =    0.000 mm
      83.73    0.00    0.00 mm <->   83.73   -0.00    0.00 mm (orig :   53.82   58.80 -242.95 mm) diff =    0.000 mm
    Coordinate transformations established.
    Polhemus data for 3 HPI coils added
    Device coordinate locations for 3 HPI coils added
    2 extra points added to Polhemus data.
    Measurement info composed.
Finding samples for C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds\15909001_MatiasIson_20220530_02.meg4: 
    System clock channel is available, but ignored.
    290 x 6000 = 1740000 samples from 422 chs
Current compensation grade : 3

At this point I can plot the data with raw.plot(), but I cannot apply any filters or similar operations.

When I run:

raw.load_data()

OUTPUT:
ds directory : C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds
    res4 data read.
    hc data read.
    Separate EEG position data file read.
    Quaternion matching (desired vs. transformed):
       0.42   68.31    0.00 mm <->    0.42   68.31    0.00 mm (orig :  -53.00   45.90 -248.52 mm) diff =    0.000 mm
      -0.42  -68.31    0.00 mm <->   -0.42  -68.31    0.00 mm (orig :   45.10  -49.18 -246.57 mm) diff =    0.000 mm
      83.73    0.00    0.00 mm <->   83.73   -0.00    0.00 mm (orig :   53.82   58.80 -242.95 mm) diff =    0.000 mm
    Coordinate transformations established.
    Polhemus data for 3 HPI coils added
    Device coordinate locations for 3 HPI coils added
    2 extra points added to Polhemus data.
    Measurement info composed.
Finding samples for C:\Users\joaco\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds\15909001_MatiasIson_20220530_02.meg4: 
    System clock channel is available, but ignored.
    290 x 6000 = 1740000 samples from 422 chs
Current compensation grade : 3
Reading 0 ... 1739999  =      0.000 ...  1449.999 secs...
C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py:176: RuntimeWarning: overflow encountered in long_scalars
  pos += samp_offset * si['n_chan'] * 4
Traceback (most recent call last):
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-23-72a37d28be67>", line 1, in <cell line: 1>
    raw.load_data()
  File "<decorator-gen-206>", line 12, in load_data
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\base.py", line 557, in load_data
    self._preload_data(True)
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\base.py", line 567, in _preload_data
    self._data = self._read_segment(
  File "<decorator-gen-205>", line 12, in _read_segment
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\base.py", line 452, in _read_segment
    _ReadSegmentFileProtector(self)._read_segment_file(
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\base.py", line 2158, in _read_segment_file
    return self.__raw.__class__._read_segment_file(
  File "C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\lib\site-packages\mne\io\ctf\ctf.py", line 177, in _read_segment_file
    fid.seek(pos, 0)
OSError: [Errno 22] Invalid argument

I get a similar error. This also happens if I run read_raw_ctf(preload=True), so I’m guessing the issue arises when actually loading the data.

I tried this on two different computers and with 3 different datasets (one of them from another experiment), and the error persisted.
I also tried to load the data in MATLAB and it worked… So I’m not sure what the problem could be.

Here are my specs:

mne.sys_info()

OUTPUT:
Platform:         Windows-10-10.0.19044-SP0
Python:           3.9.12 (main, Apr  4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]
Executable:       C:\Users\joaco\anaconda3\envs\MEGEYEHS_Python\python.exe
CPU:              AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD: 16 cores
Memory:           15.4 GB
mne:              1.0.3
numpy:            1.21.5 {blas=mkl_rt, lapack=mkl_rt}
scipy:            1.7.3
matplotlib:       3.5.1 {backend=QtAgg}
sklearn:          1.0.2
numba:            0.55.1
nibabel:          3.2.2
nilearn:          0.9.1
dipy:             1.5.0
cupy:             Not found
pandas:           1.4.2
pyvista:          0.34.0 {OpenGL 4.5.13596 Core Profile Context 20.10.20.14 27.20.11020.14001 via AMD Radeon(TM) Graphics}
pyvistaqt:        0.9.0
ipyvtklink:       0.2.2
vtk:              9.0.3
PyQt5:            5.12.3
ipympl:           Not found
pooch:            v1.6.0
mne_bids:         Not found
mne_nirs:         Not found
mne_features:     Not found
mne_qt_browser:   0.3.1
mne_connectivity: Not found

Thanks for your help!

@Joac - we have a CTF scanner and are able to read in the data (although I haven’t tried it on a Windows machine).

One thing is that your data says it is 1740000 samples × 422 channels - the 24 minutes of data might be too much for the amount of RAM that you have, 16 GB. (That’s just a guess, though.)

You can try doing the following to see if this is the case:

raw = mne.io.read_raw_ctf(fname, system_clock='ignore')
raw.crop(0, 10)  # just select the first 10 seconds
raw.load_data()

You can also check your system monitor and see what your RAM usage looks like while loading the full dataset. Anyway - just a guess, but worth a try.

-Jeff


Hi, thanks for your answer!

I checked the RAM usage while loading and it was getting pretty high. I tried cropping the data and I could load it.

But now I’m trying on the other PC, with the following specs:

mne.sys_info()

OUT:
Platform:         Windows-10-10.0.19042-SP0
Python:           3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)]
Executable:       C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\Scripts\python.exe
CPU:              Intel64 Family 6 Model 141 Stepping 1, GenuineIntel: 16 cores
Memory:           31.2 GB
mne:              1.0.3
numpy:            1.22.4 {}
scipy:            1.8.1
matplotlib:       3.5.2 {backend=TkAgg}
sklearn:          Not found
numba:            Not found
nibabel:          Not found
nilearn:          Not found
dipy:             Not found
cupy:             Not found
pandas:           1.4.2
pyvista:          Not found
pyvistaqt:        Not found
ipyvtklink:       Not found
vtk:              Not found
PyQt5:            Not found
ipympl:           Not found
pooch:            v1.6.0
mne_bids:         Not found
mne_nirs:         Not found
mne_features:     Not found
mne_qt_browser:   Not found
mne_connectivity: Not found

On this PC, the RAM usage gets to ~61% while loading, and it crashes with the same error.

raw.load_data()

OUT:
Reading 0 ... 1739999  =      0.000 ...  1449.999 secs...
Traceback (most recent call last):
  File "C:\Users\lpajg1\AppData\Local\Programs\Python\Python310\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "<decorator-gen-206>", line 12, in load_data
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 557, in load_data
    self._preload_data(True)
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 567, in _preload_data
    self._data = self._read_segment(
  File "<decorator-gen-205>", line 12, in _read_segment
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 452, in _read_segment
    _ReadSegmentFileProtector(self)._read_segment_file(
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 2158, in _read_segment_file
    return self.__raw.__class__._read_segment_file(
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\ctf\ctf.py", line 177, in _read_segment_file
    fid.seek(pos, 0)
OSError: [Errno 22] Invalid argument

32 GB should be enough to load this data, right?
Is there a way to downsample the data prior to loading it? Or could I load smaller chunks of the data, downsample those, and concatenate them into a new downsampled raw object? Something like the sketch below.
I know it’s not ideal, but still worth a try.
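An untested sketch of that chunk-wise idea (the path and chunk size are made up, and resampling chunks separately can create edge artifacts at the seams):

import mne

file_path = r'C:\path\to\recording.ds'  # hypothetical path
raw = mne.io.read_raw_ctf(file_path, system_clock='ignore')

chunk_len = 60.  # seconds per chunk (arbitrary)
chunks = []
t = 0.
while t < raw.times[-1]:
    tmax = min(t + chunk_len, raw.times[-1])
    # crop a copy, load only that chunk into memory, then downsample it
    chunk = raw.copy().crop(tmin=t, tmax=tmax, include_tmax=False)
    chunks.append(chunk.load_data().resample(600))  # e.g. 1200 Hz -> 600 Hz
    t = tmax

raw_small = mne.concatenate_raws(chunks)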

Thanks!

That is not enough. If my math is right, you need 43.7 GB of RAM to load this one.


@jstout211 @mscheltienne -

If my math is right, you need 43.7 GB of RAM to load this one.

Right.

So are MEG files usually much smaller? Or can you work with them without loading them into memory (apply filters, select channels, etc.)?
How does one usually manage ~20-minute-long MEG data? (Apart from MATLAB, which seems to load it just fine.)

Do you think that downsampling them before loading could be an option?

Sorry, I’m new to MEG data. Thanks again!

@larsoner Why is this crashing and not simply starting to swap to disk? By default, Windows uses swapping. My Mac is swapping all the time during my ordinary workday, and even the next version of iPadOS will have swapping support. There must be something off here if things simply start crashing on @Joac’s computer.

Apparently your sampling rate is 6000 Hz (or am I mistaken here?). I’ve never seen one this high on an MEG recording, honestly. Typically 1 kHz, max 2 kHz.

Also, the blocks/runs I’ve dealt with in the past are usually much shorter (about 8 mins each, i.e. only a third of the duration of what you have here).

So we’re talking about a factor of 3 × 6 = 18, i.e. the data you’re trying to process here seems to be almost 20 times the size of what I (in my, granted, limited experience) have so far dealt with and seen in the wild. But this is exciting :slight_smile: It shouldn’t just crash! If anything, it should swap your computer to death :wink: :skull_and_crossbones:

I would preload with memmapping, e.g., with read_raw_ctf(..., preload=r'path\to\some\filename\that\will\become\50GB'). Then you could downsample, for example to 1000 Hz, or just live with the processing being slow when you do stuff like raw.filter.
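For example, a minimal sketch (the paths are hypothetical; the preload target is a plain NumPy memory-mapped scratch file, so no special format is needed):

import mne

file_path = r'C:\path\to\recording.ds'     # hypothetical path
memmap_path = r'D:\scratch\raw_data.dat'   # scratch file; grows to the full data size

# preload into a memory-mapped file on disk instead of into RAM
raw = mne.io.read_raw_ctf(file_path, system_clock='ignore',
                          preload=memmap_path)
raw.resample(1000)  # e.g. downsample to 1000 Hz; note that some
                    # operations may still allocate regular RAM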

AFAIK all OSes have some swap space limit. According to StackOverflow, on macOS it’s likely between 64 and 128 GB.

In any case, eventually you’ll be limited by your hard drive space. But hopefully you never come close to this…


Why would that be necessary if you have enough virtual memory (which I assume the OP does have)? And why isn’t the error message they’re getting something like “killed because out of memory”?

@richard

Apparently your sampling rate is 6000 Hz

The sampling rate is actually 1200 Hz.

@larsoner

I would preload with memmapping, e.g., with read_raw_ctf(..., preload=r'path\to\some\filename\that\will\become\50GB')

OK! So I did that:

raw = mne.io.read_raw_ctf(file_path, system_clock='ignore', preload=str(subj_path) + r'\tmp')

OUT:
ds directory : C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds
    res4 data read.
    hc data read.
    Separate EEG position data file read.
    Quaternion matching (desired vs. transformed):
       0.42   68.31    0.00 mm <->    0.42   68.31    0.00 mm (orig :  -53.00   45.90 -248.52 mm) diff =    0.000 mm
      -0.42  -68.31    0.00 mm <->   -0.42  -68.31   -0.00 mm (orig :   45.10  -49.18 -246.57 mm) diff =    0.000 mm
      83.73    0.00    0.00 mm <->   83.73   -0.00    0.00 mm (orig :   53.82   58.80 -242.95 mm) diff =    0.000 mm
    Coordinate transformations established.
    Polhemus data for 3 HPI coils added
    Device coordinate locations for 3 HPI coils added
    2 extra points added to Polhemus data.
    Measurement info composed.
Finding samples for C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\DATA\CTF_DATA\15909001\15909001_MatiasIson_20220530_02.ds\15909001_MatiasIson_20220530_02.meg4: 
    System clock channel is available, but ignored.
    290 x 6000 = 1740000 samples from 422 chs
Current compensation grade : 3
Reading 0 ... 1739999  =      0.000 ...  1449.999 secs...
Traceback (most recent call last):
  File "C:\Users\lpajg1\AppData\Local\Programs\Python\Python310\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\ctf\ctf.py", line 66, in read_raw_ctf
    return RawCTF(directory, system_clock, preload=preload,
  File "<decorator-gen-223>", line 12, in __init__
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\ctf\ctf.py", line 142, in __init__
    super(RawCTF, self).__init__(
  File "<decorator-gen-203>", line 12, in __init__
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 287, in __init__
    self._preload_data(preload)
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 567, in _preload_data
    self._data = self._read_segment(
  File "<decorator-gen-205>", line 12, in _read_segment
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 452, in _read_segment
    _ReadSegmentFileProtector(self)._read_segment_file(
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\base.py", line 2158, in _read_segment_file
    return self.__raw.__class__._read_segment_file(
  File "C:\Users\lpajg1\OneDrive - The University of Nottingham\MEGEYEHS\MEGEYEHS_Python\venv\lib\site-packages\mne\io\ctf\ctf.py", line 177, in _read_segment_file
    fid.seek(pos, 0)
OSError: [Errno 22] Invalid argument

I get a different error now, and I can see the ~5 GB tmp file. Is there any specific format for that file?
Also, I’m running this from the 32 GB RAM PC, and I can see the RAM usage never getting above 65% while it’s loading…

They probably don’t. Last I knew, Windows defaulted to something around the same size as your actual physical memory. @joac you could check to see what you have it set to using something like
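wmic pagefile list /format:list

(one way to check it, from a Command Prompt; it prints the paging file’s allocated size and current usage in MB)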

Increasing the size might allow you to work as if this were all in memory. But it really seems like a bit of a hack, because if we’re off by a factor of 2 (e.g., float32 vs float64 in our hypothetical calculations) or you start making Epochs etc., you’re going to hit the problem again :frowning:

I have a bad feeling that this has to do with fid.seek wanting a 32-bit pointer because of Windows DWORD junk (even on a 64-bit OS + 64-bit Python, which I think you’re running, right?). I might need to try to replicate locally and fix it. Can you share the file somewhere? If not, I can try to reproduce with a RawArray(...).save(...) then load, but I’m not 100% sure it will show up with FIF because we might use different loading mechanics there…
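For reference, a minimal sketch of that hypothesis using the numbers from the log above (1740000 samples × 422 channels × 4 bytes lands just past the 32-bit limit):

import numpy as np

# On Windows, NumPy's default integer ("long") is 32 bits, so the byte
# offset computed in mne/io/ctf/ctf.py can wrap around to a negative value:
samp_offset = np.int32(1740000)   # total samples, from the log above
n_chan = np.int32(422)
pos = samp_offset * n_chan * 4    # 2937120000 bytes > 2**31 - 1
print(pos)                        # negative after the wrap-around;
                                  # fid.seek(pos) then raises
                                  # OSError: [Errno 22] Invalid argument

pos_ok = int(samp_offset) * int(n_chan) * 4  # plain Python ints don't overflow
print(pos_ok)                                # 2937120000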

Their sys_info output said AMD64 for the Python binary, so yeah, it seems to be 64-bit.

@larsoner

you could check to see what you have it set to using something like

I can send you that once I get to my personal PC, because I don’t have admin privileges to check it on this one…

Can you share the file somewhere?

Here’s a link to the .ds:
https://drive.google.com/drive/folders/14wE-jQ2T06HHzyDYhUgP53DIiFu6riUw?usp=sharing

Let me know if that works! Or if you need me to share it in a different way.

Thanks!


@larsoner

They probably don’t. Last I knew, Windows defaulted to something around the same size as your actual physical memory. @joac you could check to see what you have it set to using something like

I checked that, here it is:
[screenshot: Windows virtual memory settings, showing a paging file of about 8 GB]

This is quite small; I’m surprised it’s that small. You can try to increase this significantly, but only if your hard drive is very fast (a modern SSD); otherwise it might be a bad idea.

I think it’s set to 8 GB because my PC has 8 GB of soldered RAM and another 8 GB in a RAM slot. I could increase that, because I have a fast SSD.

However, this is the virtual memory from the other computer I couldn’t access yesterday:

[screenshot: virtual memory settings on the other PC, showing a much larger paging file]

This is way bigger and it threw the same error, so I don’t think that would help.

@richard I think Windows has a very small swap file by default, especially compared to macOS or Linux. For instance, I have a 2-year-old Windows PC with 32 GB of RAM… and only 4,864 MB of swap set by default. I don’t really know how Windows determines this default value.


You could try to run your script via a memory profiler to get a better impression of how much RAM is being consumed over time.

To do that, install memory-profiler:

pip install -U memory-profiler

Then run your script through the profiler; it will try to keep track of the consumed memory:

mprof run your_analysis_script.py

Once you’ve done this (and your script has potentially crashed), you can visualize the results:

mprof plot
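For reference, the script you run through mprof can be as small as this (hypothetical path; the point is just that the failing load happens inside it):

# your_analysis_script.py
import mne

file_path = r'C:\path\to\recording.ds'  # hypothetical path
raw = mne.io.read_raw_ctf(file_path, system_clock='ignore')
raw.load_data()  # the step whose memory footprint we want to track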

@richard

Here’s the mprof plot and also the Task Manager performance tab at that time. I was tracking it and it got to 80% when it crashed.

[screenshots: mprof memory-usage plot and the Task Manager performance tab]

Super weird. What did the swap usage look like?

You can usually get more comprehensive information from the Resource Monitor in Windows.