Dealing with Packet Loss in Emotiv EEG Data

Accounting for Signal Packet Loss in Wireless EEG Data

When working with wireless EEG devices (e.g. Emotiv headsets) in MNE-Python, one issue that may arise is signal packet loss, a problem specific to wireless devices. Packet loss can introduce invisible timing errors that disrupt the accuracy of event marking and compromise the integrity of downstream analyses.

Ideally, you should remove any environmental interference that can cause packet loss before data collection. In practice, however, you may still need to analyze data where packet loss could not be avoided entirely. This post briefly describes how to check for packet loss and how to adjust event-marker timestamps for analysis with MNE-Python.

1. Why Packet Loss Happens (and Why MNE Doesn’t Automatically Handle It)

Wireless EEG systems transmit data over Bluetooth or WiFi. Unlike wired connections, wireless links are prone to data packet loss due to latency spikes, interference, or buffer overflows. Some systems (like Emotiv) attempt to maintain a continuous data stream by duplicating previous packets or silently dropping them.

MNE-Python, by design, assumes a perfect time series sampled at a fixed rate (determined by the sampling rate specified in the metadata of the .edf file). It has no built-in mechanism to detect or correct dropped or duplicated packets, since packet loss is not typical of wired EEG data, is device-specific, and is often hidden from view in the final .edf/.fif files.
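To see why this matters, here is a small numpy-only sketch (the sampling rate and drop positions are made up for illustration) showing how dropped packets make a fixed-rate time base, reconstructed the way MNE does it, drift away from the device's real timestamps:

```python
import numpy as np

sfreq = 128          # assumed sampling rate (Hz)
n = 5 * sfreq        # 5 seconds of ideal data

# Device timestamps with two dropped packets (samples 200 and 300 never arrive)
ideal = np.arange(n) / sfreq
kept = np.delete(np.arange(n), [200, 300])
device_times = ideal[kept]  # real wall-clock times of the samples that arrived

# A fixed-rate reader reconstructs time purely from sample count and sfreq
mne_times = np.arange(len(device_times)) / sfreq

# After the drops, the reconstructed time base lags the real timestamps
drift = device_times - mne_times
print(drift[-1])  # 2 lost samples -> 2/128 s = 0.015625 s of drift
```

Every event marked after a drop is shifted by the accumulated drift, which is why the marker latencies need correcting.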

The most direct solution, replacing the raw.times array with the real latencies that account for packet loss, is not possible in MNE (raw.times is read-only). So it is important to check whether packet loss is present in your data before analyzing it with a library like MNE-Python.

2. How to Detect Packet Loss in Emotiv Data

When working with Emotiv EEG data, packet loss can be detected using multiple indicators.

:small_blue_diamond: A. COUNTER Channel

Emotiv devices include a COUNTER channel that increments with each EEG packet and wraps around at the sampling rate of the data (e.g. 128 or 256). You can use it to detect missed or duplicated samples.
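A minimal sketch of a wraparound-safe COUNTER check (the column name and the wrap value are assumptions; verify both against your own export):

```python
import numpy as np

def counter_gaps(counter, sfreq):
    """Count missed and duplicated samples from a wrapping COUNTER channel."""
    diffs = np.diff(np.asarray(counter)) % sfreq   # wraparound-safe step size
    n_dropped = int(np.sum(diffs[diffs > 1] - 1))  # steps > 1 => missed samples
    n_duplicated = int(np.sum(diffs == 0))         # steps of 0 => repeated packets
    return n_dropped, n_duplicated

# Tiny example at a hypothetical wrap of 8: value 1 is missing, value 3 repeats
demo = [5, 6, 7, 0, 2, 3, 3, 4]
print(counter_gaps(demo, sfreq=8))  # (1, 1)
```

On real data you would pass the COUNTER column of your dataframe (e.g. `eeg_df['EEG.Counter']`; the exact name may differ between export versions).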

:small_blue_diamond: B. Interpolations and SamplerateQuality Column

If your EEG data file includes EEG.Interpolated and EQ.SampleRateQuality columns, sum their values: any value greater than 0 indicates that the system filled in missing data (i.e., packet loss occurred).
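For example, a small helper (the column names are assumed from typical Emotiv .csv exports) that counts how many samples each column flags:

```python
import pandas as pd

def interpolation_report(eeg_df):
    """Count samples flagged as interpolated / low-quality, per column present."""
    flagged = {}
    for col in ("EEG.Interpolated", "EQ.SampleRateQuality"):
        if col in eeg_df.columns:
            flagged[col] = int((eeg_df[col] > 0).sum())
    return flagged

# Tiny synthetic example: two samples were interpolated
demo = pd.DataFrame({
    "EEG.Interpolated": [0, 0, 1, 1, 0],
    "EQ.SampleRateQuality": [0, 0, 0, 0, 0],
})
print(interpolation_report(demo))  # {'EEG.Interpolated': 2, 'EQ.SampleRateQuality': 0}
```

Any nonzero count means the headset filled in missing packets somewhere in the recording.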

:small_blue_diamond: C. Timestamp Regularity

Alternatively, you can check whether the timestamps are evenly spaced given the sampling rate. Wireless packet loss usually shows up as small jumps or irregular gaps in time.
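A simple check for this (the tolerance is illustrative, not a standard) flags inter-sample intervals that deviate from the expected 1/sfreq:

```python
import numpy as np

def timestamp_gaps(timestamps, sfreq, tol=0.5):
    """Return indices and sizes of inter-sample intervals deviating from
    1/sfreq by more than tol * (1/sfreq)."""
    dt = np.diff(np.asarray(timestamps, dtype=float))
    expected = 1.0 / sfreq
    irregular = np.flatnonzero(np.abs(dt - expected) > tol * expected)
    return irregular, dt[irregular]

# Example at a hypothetical 4 Hz: one packet (0.25 s) is missing after 0.5 s
ts = [0.0, 0.25, 0.5, 1.0, 1.25]
idx, gaps = timestamp_gaps(ts, sfreq=4)
print(idx, gaps)  # [2] [0.5]
```

On real data, pass the Timestamp column (in seconds) and your recording's sampling rate.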

3. How to Account for Packet Loss in an MNE-Python Analysis

Our recommended way to account for packet loss during analysis is to modify the event-marker latency column that MNE-Python uses to mark events; that is, to predict where each event marker should occur in raw.times.

a) Compute corrected timestamps for each sample as MNE would: starting from the first timestamp in the data, construct a time series at the data's sampling rate.

b) Update the event onsets in raw.annotations or the events array to align with the corrected latencies.

Below is example code that corrects the latencies in an event-marker .csv file. To use it:

  1. Export the Emotiv EEG and interval-marker files as .csv into two folders.
  2. In the Python script, specify the subject_id found in your filename.
  3. Check that the sampling rate (sfreq) matches your data.
  4. The script outputs a corrected event-marker .csv file that can be used for data analysis.
import os
import pandas as pd
import numpy as np

# Sampling frequency (Hz)
sfreq = 128  # MNE sampling rate (Check the sampling rate of your data!!)

# List of subject IDs
subject_ids = ["DRS-46"]

# List of subject IDs that do NOT have metadata rows
no_metadata_subjects = ["DRS-55", "DRS-534"]  # Add the IDs of subjects without metadata rows

# Data folders
data_folder = "data/csv"  # Replace with your EEG data folder path
marker_folder = "data/marker_raw"  # Replace with your marker folder path

# Process each subject
for subject_id in subject_ids:
    # Locate the EEG file and marker file dynamically
    eeg_file = next((f for f in os.listdir(data_folder) if subject_id in f and f.endswith(".csv")), None)
    marker_file = next((f for f in os.listdir(marker_folder) if subject_id in f and f.endswith(".csv")), None)

    if not eeg_file or not marker_file:
        print(f"Files not found for subject {subject_id}")
        continue

    eeg_file_path = os.path.join(data_folder, eeg_file)
    marker_file_path = os.path.join(marker_folder, marker_file)

    print(f"Processing subject {subject_id}:")
    print(f"  EEG file: {eeg_file_path}")
    print(f"  Marker file: {marker_file_path}")

    # Step 1: Load EEG file and calculate latencies
    if subject_id in no_metadata_subjects:
        eeg_df = pd.read_csv(eeg_file_path)  # Load without skipping any rows
    else:
        eeg_df = pd.read_csv(eeg_file_path, skiprows=1)  # Skip the first row

    # Calculate latency (relative time in seconds)
    eeg_df['Latency'] = (eeg_df['Timestamp'] - eeg_df['Timestamp'].iloc[0])

    # Calculate mne_latency based purely on sampling frequency
    eeg_df['mne_latency'] = np.arange(len(eeg_df)) / sfreq

    # Step 2: Load marker file
    marker_df = pd.read_csv(marker_file_path)

    # Step 3: Compute the adjusted latency for packet loss data
    def find_adjusted_latency(latency, latency_col, mne_latency_col, max_difference=2):
        idx = (np.abs(latency_col - latency)).idxmin()  # Find closest row index
        closest_latency = latency_col[idx]  # Get the actual closest latency value
        if np.abs(closest_latency - latency) > max_difference:  # Reject matches more than max_difference seconds away
            return np.nan  # Assign NaN if the difference is too large
        return mne_latency_col[idx]  # Otherwise, return the corresponding mne_latency

    marker_df['starttime_adjusted_latency'] = marker_df['starttime_latency'].apply(
        lambda x: find_adjusted_latency(x, eeg_df['Latency'], eeg_df['mne_latency'])
    )

    # Step 4: Add endtime_adjusted_latency
    marker_df['endtime_adjusted_latency'] = marker_df['endtime_latency'].apply(
        lambda x: find_adjusted_latency(x, eeg_df['Latency'], eeg_df['mne_latency'])
    )

    # Step 5: Calculate the number of samples and adjusted_duration
    def count_samples_from_latencies(start_latency, end_latency, mne_latency_col):
        # Handle NaN values for start or end latencies
        if pd.isna(start_latency) or pd.isna(end_latency):
            return np.nan
        # Find the indices corresponding to start and end latencies
        start_idx = (np.abs(mne_latency_col - start_latency)).idxmin()
        end_idx = (np.abs(mne_latency_col - end_latency)).idxmin()
        # Count samples as the difference between indices
        return end_idx - start_idx

    marker_df['num_samples'] = marker_df.apply(
        lambda row: count_samples_from_latencies(
            row['starttime_adjusted_latency'],
            row['endtime_adjusted_latency'],
            eeg_df['mne_latency']
        ),
        axis=1
    )

    # Duration in seconds; kept as float so NaN rows (unmatched markers) survive
    marker_df['adjusted_duration'] = marker_df['num_samples'] / sfreq

    # Step 6: Output as subjectID_adjusted_marker.csv
    adjusted_marker_file = os.path.join(marker_folder, f"{subject_id}_adjusted_intervalMarker.csv")
    marker_df.to_csv(adjusted_marker_file, index=False)

    print(f"  Adjusted marker file saved to {adjusted_marker_file}")

print("Processing complete.")

Alternative solutions and suggestions are welcome!