Hi good people of MNE discourse! I’m running some decoding models, which return a series of AUCs, one for each time point, reflecting the classifier’s ability to discriminate between trial types at that time point. But I need to know, for each participant and within each condition, where the significant clusters of high AUCs start and end. To do this I wrote a cluster-based permutation test that works on the probabilities the classifier assigns to each trial/time point (pasted below). It takes an n_trials x n_timepoints array and returns the significant clusters. It first selects the time points that can enter a cluster by checking which ones have a p-value below .05, then runs the permutation test on the summed t-values within each cluster, and finally returns the AUCs for the significant clusters.
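(For context, the decoding step looks roughly like the standard MNE sliding-estimator recipe; the classifier and cv below are placeholders, not my exact pipeline:)

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from mne.decoding import SlidingEstimator, cross_val_multiscore

# X: epochs data array, shape (n_epochs, n_channels, n_times)
# fit one classifier per time point; scoring='roc_auc' gives one AUC
# per time point, which I then average over folds
clf = make_pipeline(StandardScaler(), LogisticRegression())
time_decoder = SlidingEstimator(clf, scoring='roc_auc')
scores = cross_val_multiscore(time_decoder, X, y, cv=5)  # (n_folds, n_times)
mean_aucs = scores.mean(axis=0)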
My question has two parts. First, there’s probably a pre-built MNE function that does this, right? My brain just isn’t powerful enough to figure out how to apply it.
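I suspect mne.stats.permutation_cluster_test is the closest candidate; here’s a minimal sketch of how I’d guess it maps onto my data (the keyword values are placeholders):

from mne.stats import permutation_cluster_test

# one (n_trials, n_times) array per condition: split the classifier
# probabilities by trial type
X = [predictions[y == 1001, :], predictions[y == 1002, :]]

# returns the observed statistic per time point, the clusters (indices into
# the time axis), one p-value per cluster, and the permutation null maxima
stat_obs, clusters, cluster_pvals, H0 = permutation_cluster_test(
    X, n_permutations=1000, seed=42)

sig_clusters = [c for c, p in zip(clusters, cluster_pvals) if p < .05]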
And second, how do you compare the CBPT results for subsets of trials with differing sample sizes? The reason this is not clear to me is that the initial step, which decides whether a time point enters the permutation procedure at all, is based on a p-value, which is of course affected by sample size. So if I run the same procedure on a subset of samples within a participant, time points become less likely to pass the threshold, and this introduces a bias in my analysis in favour of conditions with a larger N. Is there an approach to this that doesn’t involve p-values which I can implement in MNE? (I’ve put a sketch of my best guess after the function below.) Many thanks! Below is my function, but I don’t really expect anyone to look at it:
import numpy as np
import pandas as pd
from scipy import stats


def clusterBasedPermutationTest(predictions, y, mean_aucs):
    # calculate t and p values for each time point
    res = pd.DataFrame()
    for j in range(predictions.shape[1]):
        tval, pval = stats.ttest_ind(predictions[y == 1001, j],
                                     predictions[y == 1002, j])
        res.loc[j, 'sample_point'] = j
        res.loc[j, 'time_point'] = (j * 4) - 100  # ms, 4 ms sampling period, -100 ms onset
        res.loc[j, 'tval'] = tval
        res.loc[j, 'pval'] = pval
    # find significant time points and build a grouping variable so that each
    # run of contiguous significant points gets its own cluster label
    res['significant'] = res['pval'] <= .05
    m = res['significant'].ne(res['significant'].shift()).cumsum()
    res['cluster_number'] = (m[res['significant']].groupby(m).ngroup().add(1)
                             .reindex(res.index, fill_value=0))
    # keep only clusters that span more than 10 ms (three samples at 4 ms)
    res['cluster_size'] = res.groupby('cluster_number')['cluster_number'].transform('count')
    res = res[(res['cluster_size'] >= 3) & res['significant']]
    # tidy
    res['sample_point'] = res['sample_point'].astype(int)
    res['time_point'] = res['time_point'].astype(int)
    res = res.drop(['significant', 'cluster_size'], axis=1)
    # observed cluster statistic: sum of absolute t values within each cluster
    # (the same statistic is computed under permutation below)
    res['t_sum'] = res.groupby('cluster_number')['tval'].transform(lambda t: t.abs().sum())
    cluster_labels = list(res['cluster_number'].unique())
    # build a null distribution for each cluster by shuffling the trial labels
    cluster_perm_results = []
    for k in cluster_labels:
        cluster = res[res['cluster_number'] == k]
        cluster_preds = predictions[:, cluster['sample_point'].to_numpy()]
        t_sums = []
        for _ in range(1000):
            y_perm = y.copy()
            np.random.shuffle(y_perm)
            # ttest_ind is vectorised over columns, so one call covers all
            # sample points within the cluster
            tvals, _ = stats.ttest_ind(cluster_preds[y_perm == 1001, :],
                                       cluster_preds[y_perm == 1002, :])
            t_sums.append(np.abs(tvals).sum())
        cluster_perm_results.append(t_sums)
    # compare each observed cluster statistic against its null distribution
    actual_t_sums = res.groupby('cluster_number')['t_sum'].first()
    sig_clusters = [k for k, null_dist in zip(cluster_labels, cluster_perm_results)
                    if actual_t_sums.loc[k] > np.percentile(null_dist, 97.5)]
    sig_timepoints = res.loc[res['cluster_number'].isin(sig_clusters),
                             'sample_point'].to_numpy()
    # return the mean AUC at significant sample points, False everywhere else
    final_clusters = []
    for k in range(mean_aucs.shape[0]):
        if np.any(sig_timepoints == k):
            final_clusters.append(mean_aucs[k])
        else:
            final_clusters.append(False)
    return final_clusters
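On the second question, this is the kind of thing I had in mind: as far as I can tell from the docs, MNE’s cluster functions switch to TFCE (threshold-free cluster enhancement) when threshold is a dict with 'start' and 'step' keys, which would drop the per-time-point p < .05 inclusion step entirely. A sketch of my guess (the start/step values are placeholders, not a recommendation):

from mne.stats import permutation_cluster_test

# a dict threshold requests TFCE: no fixed cluster-forming threshold, so no
# p-value cutoff whose leniency depends on N; 'start' and 'step' set the
# grid of thresholds that get integrated over
tfce = dict(start=0, step=0.2)

# with TFCE the function returns one p-value per time point rather than
# one per cluster
stat_obs, clusters, point_pvals, H0 = permutation_cluster_test(
    [predictions[y == 1001, :], predictions[y == 1002, :]],
    threshold=tfce, n_permutations=1000, seed=42)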