Dear Qunxi,
if I understand correctly, then the permutation cluster tests in MNE may
not be suitable for what you want to do.
Consider this figure:
http://imgur.com/a/E8ais
This is the time course for a single dipole on the brain (simulated data).
If I understand your goal correctly, you would like to include this vertex
as part of a ROI, because it has increased activity after the stimulus has
been shown.
A cluster test as implemented in MNE would determine whether any
post-stimulus samples are higher than pre-stimulus samples *in a pairwise
fashion*. That is, whether post-stimulus sample 1 is higher than pre-stimulus
sample 1, post-stimulus sample 2 is higher than pre-stimulus sample 2, etc.
What you most likely want instead is to estimate some confidence interval for
the pre-stimulus values in general (red dashed line in the figure) and then
determine, given the post-stimulus data, whether or not to include the
vertex.
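To make that idea concrete, here is a minimal sketch in plain NumPy. It is
not an MNE function: the name exceeds_baseline, the choice of "baseline mean
plus 2 standard deviations" as the upper bound, and the simulated data are
all my own assumptions for illustration. In practice the data array would
come from your source estimate (n_vertices x n_times) with its time axis.

```python
import numpy as np


def exceeds_baseline(data, times, n_sd=2.0):
    """Mark post-stimulus samples that exceed a per-vertex baseline bound.

    data  : array, shape (n_vertices, n_times)
    times : array, shape (n_times,), in seconds, stimulus onset at 0
    n_sd  : number of baseline standard deviations above the baseline mean
            (an assumed, adjustable choice)
    """
    pre = data[:, times < 0]
    post = data[:, times >= 0]
    # One upper bound per vertex, estimated from all pre-stimulus samples.
    threshold = pre.mean(axis=1) + n_sd * pre.std(axis=1, ddof=1)
    return post > threshold[:, np.newaxis]


# Simulated example: three vertices of noise, only vertex 0 responds.
rng = np.random.default_rng(0)
times = np.linspace(-0.2, 0.5, 141)
data = rng.normal(size=(3, times.size))
data[0, times >= 0.1] += 5.0
mask = exceeds_baseline(data, times)
print(mask.mean(axis=1))  # fraction of post-stimulus samples above the bound
```

The result is a boolean array with one row per vertex, which you can then
reduce to a yes/no decision per vertex in whatever way suits your study.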
It is not surprising that the cluster test marked the entire brain as an ROI,
because it is very likely for the time course of a vertex to be higher than
the pre-stimulus level at some point, even if the stimulus didn't activate
the vertex at all (that is, even if the pre-stimulus and post-stimulus data
were drawn from the same distribution).
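You can convince yourself of this with a small simulation. The sketch below
is not the MNE cluster test; it only illustrates the pointwise argument. The
numbers (1000 vertices, 50 pre-stimulus and 100 post-stimulus samples,
Gaussian noise, a bound of mean plus 2 standard deviations) are assumptions
I picked for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_vertices, n_pre, n_post = 1000, 50, 100

# Null scenario: pre- and post-stimulus data come from the same distribution.
pre = rng.normal(size=(n_vertices, n_pre))
post = rng.normal(size=(n_vertices, n_post))

# Per-vertex upper bound estimated from the pre-stimulus samples.
threshold = pre.mean(axis=1) + 2 * pre.std(axis=1, ddof=1)

# Does a vertex exceed its own bound at least once after the stimulus?
any_exceed = (post > threshold[:, np.newaxis]).any(axis=1)
fraction = any_exceed.mean()
print(f"{fraction:.0%} of the null vertices exceed the bound at least once")
```

With this many post-stimulus samples, most of the vertices cross the bound
somewhere purely by chance, which is why a single crossing is not a useful
criterion.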
At this point, a thresholding operation that only passes vertices for which
the activation surpasses the pre-stimulus activity for a minimum amount of
time makes sense. However, you would need to be careful to set that minimum
duration to a sensible value.
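As a rough sketch of such a duration criterion: given a boolean array that
says, for each vertex and post-stimulus sample, whether the activity is above
the pre-stimulus bound, keep the vertices whose longest uninterrupted run is
long enough. The helper names longest_run and select_vertices and the toy
mask are hypothetical, and min_samples is exactly the value you would need
to choose with care.

```python
import numpy as np


def longest_run(mask):
    """Longest run of consecutive True values in each row of a boolean array."""
    longest = np.zeros(mask.shape[0], dtype=int)
    current = np.zeros(mask.shape[0], dtype=int)
    for column in mask.T:  # walk through time, all vertices at once
        current = np.where(column, current + 1, 0)
        longest = np.maximum(longest, current)
    return longest


def select_vertices(mask, min_samples):
    """Indices of vertices that stay above the bound for >= min_samples samples."""
    return np.flatnonzero(longest_run(mask) >= min_samples)


# Toy mask: 3 vertices x 6 post-stimulus samples.
mask = np.array([[0, 1, 1, 1, 0, 1],
                 [1, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0]], dtype=bool)
selected = select_vertices(mask, min_samples=3)
print(selected)
```

To express the minimum duration in seconds instead of samples, multiply it
by the sampling frequency of your source estimate before passing it in.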
I think you'll have to write the procedure that marks which vertices to
include yourself. Then, you can use the stc_to_label function to cut the
result up into spatially connected ROIs.
At any rate, I think the result of "showing a stimulus activates the entire
brain" actually makes sense. Showing a stimulus would do that, although not
all parts in equal amounts.
That's all the help I can give you. Good luck with your study. May your
p-values be significant! 
Kind regards,
Marijn.