However, in our experience, the spatial correlation issue is generally less likely to occur for high-frequency responses: even a few milliseconds of delay between the left and right hemispheres is enough to substantially reduce the correlation between them and allow the beamformer to separate the sources.
Furthermore, back in 2006 it was common to compute the beamformer's sample covariance from the averaged evoked response, which, as it turns out, exacerbated the problem of bilaterally correlated sources. Since then, most beamformer implementations (including FieldTrip, NUTMEG, and MNE-Python) compute the sample covariance from unaveraged single trials by default. The trial-to-trial correlation between left and right auditory cortex is usually not nearly as high as it is in the average, which allows the sources to be separated. Even with standard auditory evoked responses, I've found that cancellation is much less likely to occur when the sample covariance is computed this way.
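To see why trial-based covariance helps, here's a toy NumPy sketch (all numbers are made up for illustration, not real data): two "sources" share the same evoked waveform but have independent trial-to-trial noise, so their correlation is near-perfect in the average yet much lower across single trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_times = 100, 200
t = np.linspace(0.0, 0.5, n_times)
evoked = np.sin(2 * np.pi * 10 * t)  # shared evoked waveform (hypothetical)

# Two sources (think left/right auditory cortex) with the same evoked
# response but independent trial-to-trial noise.
left = evoked + rng.normal(scale=1.0, size=(n_trials, n_times))
right = evoked + rng.normal(scale=1.0, size=(n_trials, n_times))

# Correlation computed from the averaged responses: close to 1,
# which is the regime where a beamformer cancels the sources.
avg_corr = np.corrcoef(left.mean(axis=0), right.mean(axis=0))[0, 1]

# Correlation computed across concatenated single trials: much lower,
# because the independent noise dilutes the shared evoked component.
trial_corr = np.corrcoef(left.ravel(), right.ravel())[0, 1]

print(avg_corr, trial_corr)
```

With these (arbitrary) signal-to-noise settings, the averaged responses correlate above 0.9 while the single-trial correlation sits far lower, which is the effect that lets trial-based covariance estimation rescue the beamformer.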
Although I haven't tried DICS for auditory responses, we've had great results localizing high-gamma-band auditory MEG responses using time-domain beamforming together with the Hilbert transform over the frequencies of interest. High-gamma activity seems generally harder to resolve with EEG; that might be due to head-model quality or simply lower sensitivity to these particular auditory cortex sources. To maximize EEG head-model quality, I'd recommend using digitized electrode positions and BEM head models generated from segmented individual MRIs.
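The per-band Hilbert step can be sketched with SciPy (a toy single-channel example, not the full beamforming pipeline; the 70-90 Hz band, the simulated burst, and all parameters are my own illustrative choices): band-pass to the band of interest, take the analytic signal, and use its magnitude as the amplitude envelope that would then be beamformed in the time domain.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0  # sampling rate in Hz (hypothetical)
t = np.arange(0.0, 2.0, 1.0 / fs)

# Hypothetical sensor trace: an 80 Hz "high gamma" burst at t = 1.0 s,
# riding on a 10 Hz background oscillation.
burst = np.exp(-((t - 1.0) ** 2) / (2 * 0.05**2))
x = burst * np.sin(2 * np.pi * 80 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

# Zero-phase band-pass to the frequencies of interest (70-90 Hz here),
# then take the analytic signal; its magnitude is the amplitude envelope.
b, a = butter(4, [70 / (fs / 2), 90 / (fs / 2)], btype="band")
x_band = filtfilt(b, a, x)
envelope = np.abs(hilbert(x_band))

peak_time = t[np.argmax(envelope)]
print(peak_time)  # envelope peaks near the burst at t = 1.0 s
```

The zero-phase filtering (`filtfilt`) matters here: a causal filter would shift the envelope in time and misalign it with the underlying activity.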
Hope that helps, and I'd be happy to discuss further.