I think I can answer at least some of your questions:
With that function clustering is always done on the last dimension (for 2-dimensional X) or the last two dimensions (for 3-dimensional X). So in your case (3-dimensional X, n_subjs x n_freqs x n_sensors) it will cluster spectro-spatially. The exception is if your adjacency or max_step arguments constrain the clusters from ever forming in one of those dimensions. For example, maybe you don’t want to consider adjacent frequency bins to be “connected” for the purpose of clustering (i.e., if bin spacing were 5 Hz maybe it does not make sense to consider the 5 Hz and 10 Hz bins to potentially form a cluster). In that case you could pass max_step=0 and pass to adjacency an n_chan x n_chan matrix defining spatial adjacency of the sensor positions. If you do that, then your resulting clusters will always be spatial clusters and never spatio-spectral clusters because each cluster will never be allowed to span across multiple frequency bins. In that case any potential spatial clusters will be evaluated separately at each frequency bin.
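To make the difference concrete, here is a toy sketch in plain Python. It is not the library's actual implementation: find_clusters is a made-up helper, and the three-sensor layout and threshold mask are invented for illustration. It only assumes that max_step means "how many bins apart two points at the same sensor can be and still count as connected", which is how I described it above.

```python
from collections import deque


def find_clusters(mask, sensor_adjacency, max_step):
    """Group supra-threshold (freq, sensor) points into clusters.

    Hypothetical helper for illustration only.
    mask[f][s] is True where the statistic exceeds the threshold.
    sensor_adjacency is an n_sensors x n_sensors 0/1 neighbour matrix.
    max_step is the largest frequency-bin distance (same sensor) that
    still counts as connected; 0 means no connection across bins.
    """
    n_freqs, n_sensors = len(mask), len(mask[0])
    seen = set()
    clusters = []
    for f0 in range(n_freqs):
        for s0 in range(n_sensors):
            if not mask[f0][s0] or (f0, s0) in seen:
                continue
            # Breadth-first flood fill starting from this point.
            queue = deque([(f0, s0)])
            seen.add((f0, s0))
            cluster = []
            while queue:
                f, s = queue.popleft()
                cluster.append((f, s))
                # Spatial neighbours within the same frequency bin.
                neighbours = [
                    (f, t) for t in range(n_sensors)
                    if t != s and sensor_adjacency[s][t]
                ]
                # Same sensor, nearby frequency bins (none if max_step == 0).
                neighbours += [
                    (f + d, s) for d in range(-max_step, max_step + 1)
                    if d != 0 and 0 <= f + d < n_freqs
                ]
                for nf, ns in neighbours:
                    if mask[nf][ns] and (nf, ns) not in seen:
                        seen.add((nf, ns))
                        queue.append((nf, ns))
            clusters.append(sorted(cluster))
    return clusters


# Invented example: 3 sensors in a row (0-1 and 1-2 are neighbours), 2 frequency bins.
sensor_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
mask = [
    [True, True, False],   # bin 0: sensors 0 and 1 exceed the threshold
    [False, True, True],   # bin 1: sensors 1 and 2 exceed the threshold
]

clusters_spectro_spatial = find_clusters(mask, sensor_adjacency, max_step=1)
clusters_spatial_only = find_clusters(mask, sensor_adjacency, max_step=0)
```

With max_step=1 the two bins are joined through sensor 1, so all four points end up in a single cluster. With max_step=0 the same data gives two separate spatial clusters, one per frequency bin.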
Hopefully @agramfort or someone else will chime in with answers to the other parts of your question.