Dealing with NaNs and missing data in pupil and eyetracking data

Hello dear Forum,

this post has probably been a while coming, so apologies for it being so long.

I saw in a 2023 post by @scott-huberty that the issue of NaNs in eyetracking data had been raised, but couldn’t find any further discussions or solutions on how to deal with it. I have been working with MNE on binocular pupil data for nearly two years now (I was originally introduced to it by @sappelhoff and also met some other developers at a workshop in 2023). I have since run into a few issues, mostly related to dealing with NaNs, that I kept fixing on my own. By now it feels like I have opened Pandora’s box, and I would appreciate someone else having a look at what I did and discussing my approach.

Info:

MNE version: 1.9.0

Operating system (on remote server connected to via ssh): Ubuntu 20.04.6

Annotating blinks - the origin story:

I had originally preprocessed all my data using MNE’s functions, but realised that with every zero-valued datapoint automatically annotated as “BAD_blink”, interpolating over these periods can in some instances produce implausible values for long stretches of time, potentially distorting any downstream analyses.

I therefore opted for a more elaborate algorithm that identifies and annotates blinks not by pupil size alone, but by finding periods in the pupil trace defined by a high-velocity pupil constriction, necessarily followed by a high-velocity pupil dilation. I also capped these identified blinks at 0.5 seconds so that only eye movements with a typical blink trajectory are included. This is an approach I took from Mathôt (2013) and Mathôt and Vilotijevic (2022) and adapted for MNE.

I additionally merged and renamed all blinks that were 0.1 s or less apart, to avoid artefacts from having too few datapoints available for interpolation. Since I wanted to retain the annotations of the blinks that went into a merged blink, I only extended and renamed existing annotations rather than replacing them. After identifying all blinks, I set all periods where pupil size was zero to NaN and then tried to interpolate over the annotated blinks.
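For illustration, the core of the velocity-based detection looks roughly like this. This is a simplified NumPy sketch, not my actual code: the function name, the threshold heuristic, and the single-trace interface are placeholders (the real version works on Raw objects and writes annotations):

```python
import numpy as np


def detect_blinks(pupil, sfreq, vel_thresh=None, max_dur=0.5):
    """Sketch of velocity-based blink detection (after Mathôt, 2013).

    A blink is a fast constriction (strongly negative velocity)
    followed by a fast re-dilation (strongly positive velocity)
    within ``max_dur`` seconds. Returns (onset, offset) sample pairs.
    """
    vel = np.gradient(pupil) * sfreq  # pupil change per second
    if vel_thresh is None:
        # placeholder heuristic; in practice the threshold needs tuning
        vel_thresh = 5 * np.nanstd(vel)
    max_samp = int(max_dur * sfreq)
    blinks = []
    i = 0
    while i < len(vel):
        if vel[i] < -vel_thresh:  # fast constriction: candidate onset
            # look for the matching fast re-dilation within the time cap
            window = vel[i:i + max_samp]
            rebound = np.where(window > vel_thresh)[0]
            if rebound.size:
                # end of the re-dilation phase
                # (sketch: assumes at most one blink per window)
                j = i + rebound[-1]
                blinks.append((i, j))
                i = j + 1
                continue
        i += 1
    return blinks
```

Periods without the full constriction-then-dilation trajectory are left alone, which is exactly what distinguishes this from annotating every zero-valued stretch.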

This is where I ran into the first issue which appears to be a bug:

I tried to interpolate the data by passing two descriptions via match, but the interpolation still only worked for “BAD_blink” and not for my newly created annotations.

The bug seems to be that the helper function _interpolate_blinks hard-codes the description when defining starts and ends:

starts, ends = _annotations_starts_stops(raw, "BAD_blink")

changing that to

starts, ends = _annotations_starts_stops(raw, match)

fixed the issue easily.

While we were at it, a colleague and I also added further interpolation methods (including a monotonic cubic spline) that ignore NaNs.
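As a sketch, the NaN-ignoring monotonic spline interpolation boils down to something like this (simplified; interpolate_span is a placeholder name, and the real version operates per annotated blink span on the Raw data):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def interpolate_span(data, start, end):
    """Fill data[start:end] with a monotonic cubic (PCHIP) fit.

    The spline is fitted only on finite samples outside the bad span,
    so NaN periods elsewhere in the trace are simply ignored instead of
    crashing or propagating into the fit.
    """
    out = data.copy()
    x = np.arange(len(data))
    # support points: everything outside the bad span that is not NaN
    mask = np.ones(len(data), dtype=bool)
    mask[start:end] = False
    mask &= np.isfinite(data)
    interp = PchipInterpolator(x[mask], data[mask])
    out[start:end] = interp(x[start:end])
    return out
```

The advantage of PCHIP over an unconstrained cubic spline is that it cannot overshoot between support points, which matters when the samples bracketing a blink are noisy.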

I then used mne.preprocessing.annotate_nan(raw) to annotate all remaining NaN periods.

I wanted to retain these NaN periods, since zeros also seemed to introduce distortions into my data. In most cases where data was lost for reasons other than blinks, it was lost in only one eye, often due to technical issues. Participants falling asleep also occurred; in that case both pupil channels are NaN, and a nanmean returning NaN there seems appropriate to me. I was therefore keen on using nanmean to retain the pupil trace from the remaining eye where possible. My custom annotations also let me compute statistics on missing data and on the number of blinks, which I found useful.

So far this has been, and still is, a well-working pipeline for me: it identifies many blinks that the original blink annotation missed and avoids arbitrary distortions from long interpolations or zero data. If useful, I am happy to share my code for this.

I have since had to realise, though, that most functions within MNE are not equipped to handle data with NaNs.

These are my current workarounds, which are rather straightforward:

(These do lead to warnings, which are to be expected: “RuntimeWarning: Mean of empty slice”)

import numpy as np

# combine the two pupil channels with nanmean, so a sample survives
# as long as at least one eye has valid data
combine_nanmean = lambda data: np.nanmean(data, axis=1)
pupil_epochs_combined = mne.channels.combine_channels(
    pupil_epoch, roi_dict, method=combine_nanmean
)

# average epochs with nanmean so NaN samples don't propagate into the evoked
evoked_nan_mean = lambda x: np.nanmean(x, axis=0)
epochs_evoked = pupil_epochs_combined.average(method=evoked_nan_mean)

But now I have also realised that the baseline correction I am running on my data is probably corrupted as well, since, to my knowledge, it doesn’t allow a custom method to be specified. I have yet to find a workaround for that, so I thought it might be time to raise this issue and share my solutions up to this point.

My questions then are:

  1. Has there been any further discussion of how to handle NaNs in eyetracking or pupil data?

  2. Any thoughts on my adaptations and whether they seem appropriate?

  3. Any tips on how to use or adapt baseline correction (apply_baseline or mne.baseline.rescale) to work on my data before I write new functions? I also want to z-score my data; while looking into mne.baseline.rescale I wondered whether there is maybe even a way to do both in one step.

  4. Are my solutions potentially useful to anyone?

Thank you for any input and best wishes,

Anouk Bielefeldt