Cropping data related to specific events

I am working on sleep staging with some EDF files, and I want to keep only the parts of the recording related to sleep events. I got partway there with the following code.

import mne
import numpy as np

SLEEP_STAGE_NAMES = ['Sleep stage W', 'Sleep stage N1', 'Sleep stage N2', 'Sleep stage N3', 'Sleep stage R']

raw = mne.io.read_raw_edf(edf_file, preload=True)
annotations = raw.annotations

# Select only sleep-related segments
sleep_segments = [ann for ann in annotations if ann['description'] in SLEEP_STAGE_NAMES]


# Extract start and stop times for sleep stages
start_times = np.array([float(ann['onset']) for ann in sleep_segments])
durations = np.array([float(ann['duration']) for ann in sleep_segments])
stop_times = start_times + durations

# Ensure stop_times does not exceed max recording time
max_time = raw.times[-1]
stop_times = np.clip(stop_times, None, max_time)

# Crop the raw data to the sleep period
sleep_raw = raw.copy().crop(tmin=start_times[0], tmax=stop_times[-1])
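As a sanity check on the crop, here is a minimal sketch that computes how much of the span between the first stage onset and the last stage offset is actually covered by stage annotations. It assumes the onsets and durations are plain arrays of seconds (like start_times and durations above) and that there is at least one annotation; stage_coverage is a hypothetical helper name, not an MNE function. If the result is well below 1.0, there are gaps between the stage annotations inside the cropped recording.

```python
import numpy as np


def stage_coverage(onsets, durations):
    """Fraction of [first onset, last offset] covered by the annotations.

    Hypothetical helper: onsets and durations are in seconds and non-empty.
    """
    onsets = np.asarray(onsets, dtype=float)
    durations = np.asarray(durations, dtype=float)
    order = np.argsort(onsets)
    starts = onsets[order]
    stops = starts + durations[order]
    span = stops.max() - starts[0]

    # Merge touching or overlapping intervals and sum their lengths
    covered = 0.0
    cur_start, cur_stop = starts[0], stops[0]
    for s, e in zip(starts[1:], stops[1:]):
        if s > cur_stop:  # gap found: close the current merged interval
            covered += cur_stop - cur_start
            cur_start, cur_stop = s, e
        else:
            cur_stop = max(cur_stop, e)
    covered += cur_stop - cur_start
    return covered / span


# Made-up example: three 30 s stages with one 30 s gap between them
print(stage_coverage([0, 30, 90], [30, 30, 30]))
```

With the variables above, this would be called as stage_coverage(start_times, durations).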

This gives me the part of the recording related to sleep. Next, I want to manually split the data into 30 s windows and get the annotated sleep stage for each window, so that I can compare the labels with the YASA predictions. I wrote the following code for this:

# Convert data to NumPy array
data = sleep_raw.get_data()
sfreq = sleep_raw.info['sfreq']  # Sampling frequency
total_duration = data.shape[1] / sfreq  # Total duration in seconds

# Divide into 30-second epochs
epoch_length = 30  # seconds
num_epochs = int(total_duration // epoch_length)
epoch_times = np.arange(0, num_epochs * epoch_length, epoch_length)

# Assign sleep stages to epochs
epoch_stages = []
for start in epoch_times:
    matching_ann = [
        ann['description'] for ann in sleep_segments 
        if (start < ann['onset'] + ann['duration']) and (start + epoch_length > ann['onset'])
    ]
    if matching_ann:
        epoch_stages.append(matching_ann[0])  # Use the first matched stage
    else:
        epoch_stages.append("Unknown")  # No overlapping annotation found

In theory this should give me what I want, but at least 25% of the epochs end up labeled "Unknown".
What could be causing this, and how can I fix it? Or should I change my approach?
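One thing I suspect, though this is an assumption and not a confirmed diagnosis: the epoch start times are counted from the beginning of the cropped data (starting at 0), while the annotation onsets in sleep_segments still refer to the original recording. If so, the two would be off by start_times[0] seconds. Below is a minimal NumPy-only sketch of the comparison with both on the same clock. label_epochs and crop_start are hypothetical names, and the sketch picks the annotation with the largest overlap, which differs from the "first match" rule in my code above.

```python
import numpy as np


def label_epochs(onsets, durations, descriptions, crop_start, n_epochs,
                 epoch_length=30.0):
    """Label fixed-length epochs of a cropped recording.

    Hypothetical helper. onsets/durations are in seconds relative to the
    ORIGINAL recording; crop_start is the time (in the original recording)
    at which the cropped data begins.
    """
    onsets = np.asarray(onsets, dtype=float)
    stops = onsets + np.asarray(durations, dtype=float)
    labels = []
    for i in range(n_epochs):
        # Shift the epoch into original-recording time before comparing
        start = crop_start + i * epoch_length
        stop = start + epoch_length
        # Overlap (in seconds) between this epoch and every annotation
        overlap = np.minimum(stops, stop) - np.maximum(onsets, start)
        if overlap.size and overlap.max() > 0:
            labels.append(descriptions[int(np.argmax(overlap))])
        else:
            labels.append("Unknown")
    return labels


# Made-up annotations starting 600 s into the original recording
onsets = [600, 630, 660]
durations = [30, 30, 30]
names = ['Sleep stage W', 'Sleep stage N1', 'Sleep stage N2']

print(label_epochs(onsets, durations, names, crop_start=600, n_epochs=3))
print(label_epochs(onsets, durations, names, crop_start=0, n_epochs=3))
```

In the made-up example, passing crop_start=0 mimics comparing cropped-data times directly against the original onsets, and every epoch comes out as "Unknown". With my variables, crop_start would be start_times[0] and n_epochs would be num_epochs.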

  • MNE version: 1.9.0
  • operating system: Windows 11