Fast speaker clustering using distance of feature matrix mean and adaptive convergence threshold

IET Signal Processing

The authors propose a fast speaker clustering method. A distance, the distance of feature matrix mean (DFMM), is first defined to characterise the similarity between any two clusters, and an adaptive convergence threshold is then introduced to terminate the clustering procedure. If the minimum DFMM over all pairs of clusters is smaller than the threshold, the corresponding pair of clusters is merged. This merging is repeated until the minimum DFMM between any two clusters is larger than the threshold. Experiments on both shorter voice segments (≤ 3 s) and longer voice segments (> 3 s) compare the method with two state-of-the-art methods: agglomerative hierarchical clustering with Bayesian information criterion (AHC + BIC) and vector quantisation with spectral clustering. The experiments show that the method achieves the best results for clustering shorter voice segments and obtains satisfactory results for clustering longer voice segments in comparison with the other two methods. Moreover, the method is faster than the other methods in all experimental cases. Initial results show that hybrid methods combining the proposed method with AHC + BIC further improve the F score.
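
The merge-until-threshold loop described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the exact DFMM formula nor the rule for adapting the threshold, so the code below makes two labelled assumptions: the distance is taken to be the Euclidean distance between the mean feature vectors of two clusters, and the threshold is a fixed value supplied by the caller. The names `mean_vector`, `mean_distance` and `threshold_cluster` are hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence

Frames = List[Sequence[float]]  # rows are frames, columns are feature dimensions


def mean_vector(frames: Frames) -> List[float]:
    """Column-wise mean of a feature matrix."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def mean_distance(a: Frames, b: Frames) -> float:
    # ASSUMPTION: stand-in for DFMM, whose exact definition is not in the abstract.
    return math.dist(mean_vector(a), mean_vector(b))


def threshold_cluster(segments: List[Frames], threshold: float) -> List[List[int]]:
    """Merge the closest pair of clusters until the minimum distance
    is no longer below the threshold. Returns segment indices per cluster."""
    # Each cluster holds its pooled frames and the indices of its segments.
    clusters = [(list(seg), [idx]) for idx, seg in enumerate(segments)]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean_distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # ASSUMPTION: a fixed threshold; the article adapts it, details unknown.
        # The abstract does not say what happens at equality; this sketch stops.
        if d >= threshold:
            break
        clusters[i] = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        del clusters[j]
    return [members for _, members in clusters]
```

Calling `threshold_cluster(segments, threshold)` with a list of per-segment feature matrices returns one list of segment indices per cluster. The exhaustive pairwise search is kept for readability and says nothing about how the article achieves its reported speed.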

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2013.0340