Suppress output from mne.compute_covariance

Dear MNE forums,

I’m running a MNE (v0.22) script on a compute cluster (don’t know if that matters) and I get log files with millions of lines that consist basically only of output/logging from mne.compute_covariance() (which is called by mne.decoding.CSP()). I’m setting old_log_level = mne.set_log_level('WARNING', return_old_level=True) at the very top of my script and I do not hand any verbose arguments to my function calls. Any ideas what’s going wrong or how I could stop that?

Advice is very much appreciated. Thanks!

please share a script to replicate

Alex

To make it so you don’t have to share code, if you start from this example and trim it down to the minimum set of lines then anyone can run it:

https://mne.tools/stable/auto_examples/decoding/plot_decoding_csp_eeg.html

Thanks @agramfort and @larsoner for the quick replies.

I tried to reproduce it with code from the example. I do not get (also with my own code) the output when I run it (together with mne.set_log_level('WARNING')) from a jupyter NB. But when I call my code via a bash script on the cluster, it does write it to the logfile.
Unfortunately, I do not have access to the cluster right now, but will try to check in the next days whether also the code from the example produces unwanted output there. I’ll keep you posted.

I thought maybe I’m overlooking something obvious that differs between running it from a NB or having the code called from a bash script. But yea, without code and/or more info it was a bit pointless. sry.

Yeah, Jupyter seems to swallow a bunch of log messages for me too, so this part is kind of expected.

update:
cluster is back after maintenance. and after some back and forth, I was able to reproduce the issue also with the script from the tutorial. However, it seems to be a question regarding parallel processing as the settings of the bash script to distribute it to the cluster make all the difference.

here’s the .py script I use (it’s basically just a c&p from the tutorial combined with the set_log_level('WARNING') command:

csp_demo.py
import numpy as np
import matplotlib.pyplot as plt

import mne
from mne import Epochs, create_info, events_from_annotations
from mne.io import concatenate_raws, read_raw_edf
from mne.datasets import eegbci
from mne.decoding import CSP
from mne.time_frequency import AverageTFR

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder


old_log_level = mne.set_log_level('WARNING', return_old_level=True)


event_id = dict(hands=2, feet=3)  # motor imagery: hands vs feet
subject = 1
runs = [6, 10, 14]
raw_fnames = eegbci.load_data(subject, runs)
raw = concatenate_raws([read_raw_edf(f) for f in raw_fnames])

# Extract information from the raw file
sfreq = raw.info['sfreq']
events, _ = events_from_annotations(raw, event_id=dict(T1=2, T2=3))
raw.pick_types(meg=False, eeg=True, stim=False, eog=False, exclude='bads')
raw.load_data()

# Assemble the classifier using scikit-learn pipeline
clf = make_pipeline(CSP(n_components=4, reg=None, log=True, norm_trace=False),
					LinearDiscriminantAnalysis())
n_splits = 5  # how many folds to use for cross-validation
cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Classification & time-frequency parameters
tmin, tmax = -.200, 2.000
n_cycles = 10.  # how many complete cycles: used to define window size
min_freq = 5.
max_freq = 25.
n_freqs = 8  # how many frequency bins to use

# Assemble list of frequency range tuples
freqs = np.linspace(min_freq, max_freq, n_freqs)  # assemble frequencies
freq_ranges = list(zip(freqs[:-1], freqs[1:]))  # make freqs list of tuples

# Infer window spacing from the max freq and number of cycles to avoid gaps
window_spacing = (n_cycles / np.max(freqs) / 2.)
centered_w_times = np.arange(tmin, tmax, window_spacing)[1:]
n_windows = len(centered_w_times)

# Instantiate label encoder
le = LabelEncoder()

# init scores
freq_scores = np.zeros((n_freqs - 1,))

# Loop through each frequency range of interest
for freq, (fmin, fmax) in enumerate(freq_ranges):

	# Infer window size based on the frequency being used
	w_size = n_cycles / ((fmax + fmin) / 2.)  # in seconds

	# Apply band-pass filter to isolate the specified frequencies
	raw_filter = raw.copy().filter(fmin, fmax, n_jobs=1, fir_design='firwin',
								   skip_by_annotation='edge')

	# Extract epochs from filtered data, padded by window size
	epochs = Epochs(raw_filter, events, event_id, tmin - w_size, tmax + w_size,
					proj=False, baseline=None, preload=True)
	epochs.drop_bad()
	y = le.fit_transform(epochs.events[:, 2])

	X = epochs.get_data()

	# Save mean scores over folds for each frequency and time window
	freq_scores[freq] = np.mean(cross_val_score(estimator=clf, X=X, y=y,
												scoring='roc_auc', cv=cv,
												n_jobs=-2), axis=0)

And here is the bash script that I use to have SLURM distribute it to the cluster:

csp_demo.sh
#!/bin/bash -l 

# specify the indexes of the job array elements
#SBATCH --array=0-1
# Standard output and error: 
#SBATCH -o ./job.out.%j        # Standard output, %A = job ID, %a = job array index 
#SBATCH -e ./job.err.%j        # Standard error, %A = job ID, %a = job array index 
# Initial working directory: 
#SBATCH -D /ptmp/fklotzsche/Experiments/vMemEcc/
# Job Name: 
#SBATCH -J hansi_ftw
# Queue (Partition): 
#SBATCH --partition=general
# Number of nodes and MPI tasks per node: 
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
# |

# |
#SBATCH --time=00:05:00

module load anaconda/3
conda activate mne

# Run the program: 
srun python ./vme_analysis/Analyses_Py/csp_demo.py $SLURM_ARRAY_TASK_ID

I found out that if I set --cpus-per-task to something larger than 1, I would get the issue (ie., unwanted output from compute_covariance()). If I just have it run on a single CPU, the output is suppressed.

So I guess it’s related to the multi-core processing. I’m quite a newbie to SLURM and parallelization in general, so I’d be happy about input from someone with more experience if there’s something I can change at my end (apart from using only 1 CPU, which probably makes it quite a bit slower ).
Thanks!