Problem with float arithmetic -- creating epochs of the same length

  • MNE-Python version: 0.23.0
  • operating system: Windows 10

As briefly as possible, here is my problem:
Issue:
I cannot concatenate or combine_evoked my data due to differences in the time instants (?)

Explanation of work:

  • EEG data from multiple subjects. Each recording has a marker, and I want to create epochs for the pre-marker and post-marker periods.
  • The pre-marker epochs are fine: tmin = -0.2, tmax = a constant (the same for all subjects). The post-marker epochs are not fine: tmin = a positive, subject-specific time, tmax = tmin + a constant.

In both cases, the epoch duration is designed to be the same within a group. The problem I've noticed is that when I compute the post-marker tmax from the variable tmin plus the constant, the float arithmetic produces values like 3.99999999999995 alongside values like 4.0.

This wasn't a problem for me until I tried to concatenate the data to do some initial group analyses. I tried to fix it by converting the values from floats to decimals (see: https://www.pythontutorial.net/advanced-python/python-decimal/), but MNE throws an error unless the value is a float.

A simple illustrative code:

import mne

tmin = 0.575
tmax = 4.055

raw_data = mne.io.read_raw_brainvision(vhdr_fname=eeg_file)  # eeg_file: path to the .vhdr file

events_from_annot, event_dict = mne.events_from_annotations(raw_data)

epoch = mne.Epochs(raw_data,
                   events_from_annot,
                   event_dict['marker1'],
                   tmin=tmin,
                   tmax=tmax,
                   baseline=(None, None),
                   preload=True)

print(epoch.tmax - epoch.tmin)
# 3.4799999999999995
# if you use tmin=0.650 and tmax=4.13 instead, it returns 3.48
# the tmax in both of these examples is just tmin + a constant
# for almost 1/5th of all my data samples, it returns a value like 3.47999999999995 instead of 3.48
# therefore, when trying to concatenate or combine_evoked these epochs, it throws an error saying the times are not the same
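The same mismatch can be reproduced in plain Python, without MNE: binary floats cannot represent most decimal fractions exactly. A minimal demonstration, mirroring the outputs above:

print(4.055 - 0.575)  # 3.4799999999999995
print(4.13 - 0.65)    # 3.48
print((4.055 - 0.575) == (4.13 - 0.65))  # False: the "same" durations differ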

So, is there some way I can easily continue with my analyses even though these values do not match exactly? Any help would be greatly appreciated.

Thanks!

Can you show the error when concatenating subjects?

The error message when concatenating:

A check of the tmax-tmin for the offline epoch groups (which throw the errors):

Concatenation works for the online epoch groups, and they all have exactly the same tmax-tmin value.

Thanks. I wonder if the Epochs objects actually consist of the same number of data points. Can you show the output of epochs.get_data().shape for both “groups” (as defined by their durations in seconds)?

There appear to be various differences in the shapes of the epochs.

Output of epochs.get_data().shape for one set of the 'offline' group:

(72, 63, 3476)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(72, 63, 3481)
(60, 63, 3481)
(65, 63, 3481)
(64, 63, 3481)
(64, 63, 3481)
(64, 63, 3481)
(64, 63, 3481)
(64, 63, 3481)
(65, 63, 3481)
(64, 63, 3481)
(64, 63, 3481)
(65, 63, 3481)
(64, 63, 3481)

It’s unlikely then that the problem is caused by different durations (since floats are in general not represented exactly and they seem to be identical up to machine precision). Instead, all Epochs objects must have matching numbers of channels and time points. Clearly, the first object in your list has fewer time points, so I suggest you try to find out what’s going on with this particular data set. How do you create these Epochs objects?


I tried excluding the first epoch from the group, but it still failed to concatenate. It gave the same error message as before:

The epoching process is a little complicated:
(1) I manually investigate the raw files to find the "starting location" that does NOT include the marker. The marker has a fixed starting point shared across all the raw files, but a varying ending point in each file.
(2) I then manually investigated the raw files again and determined the minimum duration across all of them, so that the marker will not be present in any of the epochs that are made (I use the minimum duration so that all of the epochs have the same duration: tmin + minimum_length).
(3) I then generate the epochs using a loop which reads the tmin values from a txt file, as well as the minimum_length (a constant) that is stored in the same txt file.

I used this procedure for all four of my groups. Something could have gone wrong, but I would have expected that to break all of my groups, not just some. Anyway, I can check out that first epoch, remake it, and see if it helps.

If you have any more info or advice in the meantime, please let me know.

Thanks!

Some of the code used to generate the offline epochs:

tmin = float(txt_values[x][1])  # location where stimulation signals end
tmax = tmin + minimum_length    # all epochs have the same total length

...

epoch = mne.Epochs(raw_data,
                   events_from_annot,
                   event_dict['marker'],
                   tmin=tmin,
                   tmax=tmax,
                   # baseline = the last 0.1 s of the epoch (can't use -0.1 to 0.0
                   # because that interval contains the stimulation event here)
                   baseline=(tmax - 0.1, tmax),
                   preload=True)


A snippet of the txt with the data that is grabbed:
(only the Starting_Point value and the min-length value are used in the snippet below)

Subject_Number, Starting_Point, Ending_Point, Total_Length
['01', '0.650', '4.225', 3.575]
['02', '0.575', '4.225', 3.65]



min length: 3.480 | max length: 3.600 | maximum length loss: 0.120
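One possible way to sidestep the rounding, sketched here under the assumption that all recordings share a single sampling frequency, would be to snap tmin to the sample grid and derive tmax from a fixed sample count, so that float arithmetic cannot change the number of samples per epoch:

sfreq = raw_data.info['sfreq']                  # sampling frequency, e.g. 1000.0 Hz
n_samples = int(round(minimum_length * sfreq))  # fixed epoch length in samples

tmin = round(tmin * sfreq) / sfreq  # align the epoch start to a sample
tmax = tmin + n_samples / sfreq     # derive tmax from the fixed sample count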

Let us know if you find anything. I would include assert statements before concatenating to make sure all Epochs objects are exactly the same length.
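For example, something along these lines (a sketch, assuming the Epochs objects are collected in a hypothetical list epochs_list):

import numpy as np

n_times = len(epochs_list[0].times)
for i, ep in enumerate(epochs_list):
    # every object must have the same number of samples ...
    assert len(ep.times) == n_times, f"epoch {i}: {len(ep.times)} != {n_times}"
    # ... and the same time points associated with those samples
    assert np.allclose(ep.times, epochs_list[0].times), f"epoch {i}: times differ"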

I've done a little investigating and obtained two separate error messages that appear to relate to the times of the epochs.

I'm not sure if it helps, but at least I know that when I purposely make large modifications to the times variable, the check catches it.

So, I thought maybe there was a better way to solve this problem.

Instead of asking "how do I fix this?", perhaps I can ask "how would someone go about doing what I originally wanted to do?" That way, I can try the proposed method, and if I still have the same issue, there may be a better chance that we can isolate what is going on and fix it.

With that in mind, could you please explain your method to do the following:

  • Data Constraints: You have n raw EEG data files, each with a marker for the starting point of a stimulation event, but each file has a unique ending point for that stimulation event (for example, the stimulation events may last between 500 ms and 800 ms).

  • Proposed Question: Which MNE functions would you use to split those n raw EEG data sets into 2 separate groups of Epochs objects (n per group), such that the epochs in Group 1 have length A and those in Group 2 have length B, so that each group can be concatenated for average analyses?

  • Output Constraints: Group 1 epochs must include the stimulation event, from stimulation start to stimulation stop; a baseline period is acceptable, as is extra EEG data after the stimulation event. Group 2 epochs must begin at the end of the stimulation event and continue for, let's say, three seconds.

My method was to manually find the time at which the stimulation event ended (stimulation_end) by checking each raw plot individually and recording the corresponding time for each EEG data file. From those values, I also found the max_stimulation_time. I then referenced that information in a loop containing two mne.Epochs() calls per raw data file, which used differing tmin and tmax values based on the values I observed above: namely, for Group 1 I used tmin=-0.2 and tmax=max_stimulation_time, and for Group 2 I used tmin=stimulation_end and tmax=stimulation_end+3.0. I did not use crop() or any other MNE functions related to generating epochs aside from mne.Epochs(). Please note that my original concern was that the stimulation_end+constant calculation was what caused the error when attempting to concatenate, due to float arithmetic.

Also, I think it is important to add here that, using my code and method, the concatenation function works as intended for Group 1 (which is based on a static value of -0.2 for tmin and a static value of max_stimulation_time for tmax) without generating any errors.

What are the contents of epochs.times, and do they differ across epochs? You can compare them via np.allclose(epochs_a.times, epochs_b.times).

It’s not possible to concatenate epochs with different numbers of sample points, so it’s no use trying.

  • Data Constraints: You have n raw EEG data files, each with a marker for the starting point of a stimulation event, but each file has a unique ending point for that stimulation event (for example, the stimulation events may last between 500 ms and 800 ms).

So you have several files that contain continuous EEG data, each of which contain just one event type (stimulation onset)?

  • Proposed Question: Which MNE functions would you use to split those n raw EEG data sets into 2 separate groups of Epochs objects (n per group), such that the epochs in Group 1 have length A and those in Group 2 have length B, so that each group can be concatenated for average analyses?

Sounds like mne.Epochs() is the way to go.

  • Output Constraints: Group 1 epochs must include the stimulation event, from stimulation start to stimulation stop; a baseline period is acceptable, as is extra EEG data after the stimulation event. Group 2 epochs must begin at the end of the stimulation event and continue for, let's say, three seconds.

I probably didn’t fully understand your data layout, but this sounds like you could generate epochs for both groups. Both groups each have constant epoch lengths, right? So for group 1 you can specify tmin and tmax relative to the stimulation onset events, e.g. tmin=-0.2 and tmax=3. If group 2 epochs should start when group 1 epochs end, you could use tmin=3 and tmax=6.
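A minimal sketch of that suggestion (raw, events, and event_id are placeholders):

epochs_group1 = mne.Epochs(raw, events, event_id, tmin=-0.2, tmax=3.0,
                           preload=True)  # default baseline (None, 0) is valid here
epochs_group2 = mne.Epochs(raw, events, event_id, tmin=3.0, tmax=6.0,
                           baseline=None,  # t=0 lies outside this epoch
                           preload=True)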

But again, I am probably missing something.

Richard, thank you for your input.

What are the contents of epochs.times, and do they differ across epochs? You can compare them via np.allclose(epochs_a.times, epochs_b.times).

I just checked np.allclose(a, b), and some comparisons are True and some are False. I printed a comparison of all 30 epochs against each other (and themselves). Since they are not ALL False, it again makes me think of how some of my durations had values like 3.0 versus 2.999999999999994.

Format: 'epoch, compared epoch, value_of_np.allclose()' (only printed when the value == True)

[['0,0,True', '0,18,True'],
['1,1,True', '1,10,True', '1,17,True', '1,24,True'],
['2,2,True', '2,8,True', '2,11,True', '2,14,True', '2,25,True', '2,28,True'],
['3,3,True', '3,13,True', '3,15,True', '3,20,True', '3,22,True', '3,26,True'],
['4,4,True'],
['5,5,True', '5,7,True', '5,16,True', '5,21,True'],
['6,6,True', '6,23,True'],
['7,5,True', '7,7,True', '7,16,True', '7,21,True'],
['8,2,True', '8,8,True', '8,11,True', '8,14,True', '8,25,True', '8,28,True'],
['9,9,True'],
['10,1,True', '10,10,True', '10,17,True', '10,24,True'],
['11,2,True', '11,8,True', '11,11,True', '11,14,True', '11,25,True', '11,28,True'],
['12,12,True'],
['13,3,True', '13,13,True', '13,15,True', '13,20,True', '13,22,True', '13,26,True'],
['14,2,True', '14,8,True', '14,11,True', '14,14,True', '14,25,True', '14,28,True'],
['15,3,True', '15,13,True', '15,15,True', '15,20,True', '15,22,True', '15,26,True'],
['16,5,True', '16,7,True', '16,16,True', '16,21,True'],
['17,1,True', '17,10,True', '17,17,True', '17,24,True'],
['18,0,True', '18,18,True'],
['19,19,True'],
['20,3,True', '20,13,True', '20,15,True', '20,20,True', '20,22,True', '20,26,True'],
['21,5,True', '21,7,True', '21,16,True', '21,21,True'],
['22,3,True', '22,13,True', '22,15,True', '22,20,True', '22,22,True', '22,26,True'],
['23,6,True', '23,23,True'],
['24,1,True', '24,10,True', '24,17,True', '24,24,True'],
['25,2,True', '25,8,True', '25,11,True', '25,14,True', '25,25,True', '25,28,True'],
['26,3,True', '26,13,True', '26,15,True', '26,20,True', '26,22,True', '26,26,True'],
['27,27,True'],
['28,2,True', '28,8,True', '28,11,True', '28,14,True', '28,25,True', '28,28,True'],
['29,29,True']]
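For reference, a pairwise check like the one above can be generated with something along these lines (epochs_list is again a hypothetical list holding the 30 Epochs objects; the length guard avoids np.allclose raising on differently sized arrays):

import numpy as np

matches = []
for i, ep_a in enumerate(epochs_list):
    row = [f'{i},{j},True' for j, ep_b in enumerate(epochs_list)
           if len(ep_a.times) == len(ep_b.times)
           and np.allclose(ep_a.times, ep_b.times)]
    matches.append(row)
print(matches)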

So you have several files that contain continuous EEG data, each of which contain just one event type (stimulation onset)?

Correct.

Sounds like mne.Epochs() is the way to go.

I was wondering if crop() would have been the preferred method instead. I didn't test it; I just went with mne.Epochs() because I didn't expect to have any issues with it.

I probably didn't fully understand your data layout, but this sounds like you could generate epochs for both groups. Both groups each have constant epoch lengths, right? So for group 1 you can specify tmin and tmax relative to the stimulation onset events, e.g. tmin=-0.2 and tmax=3. If group 2 epochs should start when group 1 epochs end, you could use tmin=3 and tmax=6.

Almost.

The epoch length of Group 1 does not need to equal the epoch length of Group 2; in fact, by design they should not be equal. Group 1 should have a length that captures the varying stimulation event, i.e. a length not less than the maximum stimulation event length. This makes all Group 1 epochs the same length and ensures that every epoch captures the full duration of the stimulation event.

For Group 2, the main requirement is that the epochs start at the end of the stimulation event. Since the stimulation event length varies, each epoch in Group 2 needs its own starting point (tmin_varying). To make the Group 2 epochs comparable to each other, they need to have the same length (length_duration). So I proposed using tmin = tmin_varying and tmax = tmin_varying + a_constant_value for each epoch in Group 2. Thus, if Group2/Epoch1 had a tmin_varying of 4 and Group2/Epoch2 had a tmin_varying of 5, both would still have the same length_duration, since tmax just adds a constant to tmin_varying.

As per the above example, if constant_value is 3:
Group2/Epoch1: tmin = 4, tmax = 4+3 (tmin+constant_value); <- length_duration: 3 (tmax-tmin)
Group2/Epoch2: tmin = 5, tmax = 5+3 (tmin+constant_value); <- length_duration: 3 (tmax-tmin)
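A sketch of that scheme in code (raws, stimulation_ends, and the 'marker' event name are placeholders):

import mne

constant_value = 3.0  # fixed Group 2 epoch duration in seconds
group2_epochs = []
for raw_data, tmin_varying in zip(raws, stimulation_ends):
    events, event_dict = mne.events_from_annotations(raw_data)
    group2_epochs.append(
        mne.Epochs(raw_data, events, event_dict['marker'],
                   tmin=tmin_varying,
                   tmax=tmin_varying + constant_value,
                   baseline=None,  # t=0 lies outside these epochs
                   preload=True))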

You need to fix that (i.e., ensure all epochs have the exact same length in terms of sample points AND have the exact same time points associated with the sample points), otherwise you won’t be able to concatenate those epochs.

OK. I think I might have a solution:
https://mne.tools/stable/generated/mne.Epochs.html?highlight=shift_time#mne.Epochs.shift_time

After epoching, I'll shift the starting time to 0.0 for all epochs and see if that works. I'll update with the results. Thanks @richard and @cbrnr for your help so far.

Nice. It worked!

So, here’s the solution:
(1) Epoch however you want, as long as the total length of all of the epochs is the same
(2) Shift the times of all of the epochs: epochs.shift_time(tshift=0.0, relative=False)
(3) Apply baseline correction to the time-shifted epochs: epochs.apply_baseline((None, None))
(4) Concatenate: mne.concatenate_epochs()

The baseline needs to lie within the epoch, and it needs to cover the same time points for all epochs. (None, None) uses the full duration of the epoch, and performing the baseline correction after the time shift makes the baseline interval equal across all epochs.
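Sketched as code (epochs_list is a hypothetical list of Epochs objects that all have the same duration but different tmin values):

import mne

for ep in epochs_list:
    ep.shift_time(tshift=0.0, relative=False)  # every epoch now starts at t=0.0
    ep.apply_baseline((None, None))            # baseline over the full epoch duration

combined = mne.concatenate_epochs(epochs_list)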


But you didn’t use tshift=0.0? I still don’t quite understand it, but it’s good you found a solution!

But you didn't use tshift=0.0? I still don't quite understand it, but it's good you found a solution!

No, originally I did not perform any time shifts. The solution was to time shift and to make sure that the baseline correction was applied after the time shift. Without the time shift, the concatenation would fail. If the epochs were time-shifted but the baselines did not match, the concatenation would also fail. I hope this helps anyone else who runs into similar issues.