Source phone identification using sketches of features

Speech recordings carry useful information about the devices used to capture them. Here, acquisition device identification is studied using ‘sketches of features’ as intrinsic device characteristics. The starting point is a large-size raw feature vector, obtained either by averaging the log-spectrogram of a speech recording along the time axis or by stacking the parameters of each component of a Gaussian mixture model that models the speech recorded by a specific device. Features of reduced size are then extracted by mapping these raw feature vectors into a low-dimensional space. The mapping preserves the ‘distance properties’ of the raw feature vectors. Each coordinate of the sketch is the inner product of the raw feature vector with a vector of independent identically distributed random variables drawn from a p-stable distribution. State-of-the-art classifiers, such as a sparse representation-based classifier or support vector machines, applied to the sketches yield an identification accuracy exceeding 94% on a set of eight landline telephone handsets from the Lincoln-Labs Handset Database. Perfect identification is reported for a set of 21 cell-phones of various models from seven different brands.
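
The sketching step described above can be illustrated with a minimal NumPy example. It is a sketch under stated assumptions, not the authors' implementation: the function names (`mean_log_spectrum`, `pstable_sketch`, `estimate_distance`), the frame length, the hop size, the sketch size k and the synthetic noise signals are all hypothetical choices made for illustration. Only the two p-stable cases with simple closed forms are covered, the Gaussian distribution (p = 2) and the Cauchy distribution (p = 1); the abstract does not state which value of p the paper uses.

```python
import numpy as np


def mean_log_spectrum(signal, frame_len=512, hop=256):
    """Raw feature: log-magnitude spectrogram averaged along the time axis.

    frame_len and hop are illustrative values, not taken from the paper.
    The signal must contain at least frame_len samples.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0); result has frame_len // 2 + 1 entries.
    return np.log(magnitude + 1e-10).mean(axis=0)


def pstable_sketch(x, k, p=2, seed=0):
    """Map x (length d) to k dimensions with an i.i.d. p-stable matrix.

    The same seed must be used for every vector so that all sketches
    share one projection matrix.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    if p == 2:
        projection = rng.standard_normal((k, d))   # Gaussian is 2-stable
    elif p == 1:
        projection = rng.standard_cauchy((k, d))   # Cauchy is 1-stable
    else:
        raise ValueError("this example only covers p = 1 and p = 2")
    return projection @ x


def estimate_distance(sx, sy, p=2):
    """Estimate the l_p distance of the raw vectors from their sketches."""
    diff = sx - sy
    if p == 2:
        # Entries are N(0, ||x - y||_2^2), so the scaled norm estimates l2.
        return np.linalg.norm(diff) / np.sqrt(diff.size)
    if p == 1:
        # Entries are Cauchy with scale ||x - y||_1; the median of their
        # absolute values equals that scale.
        return np.median(np.abs(diff))
    raise ValueError("this example only covers p = 1 and p = 2")


# Two synthetic "recordings" (white noise at different levels) stand in
# for real speech captured by two devices.
rng = np.random.default_rng(42)
x = mean_log_spectrum(rng.standard_normal(16000))
y = mean_log_spectrum(0.5 * rng.standard_normal(16000))

sx = pstable_sketch(x, k=64, p=2)
sy = pstable_sketch(y, k=64, p=2)
print("raw dimension:", x.size, "sketch dimension:", sx.size)
print("true l2 distance:", np.linalg.norm(x - y))
print("estimated from sketches:", estimate_distance(sx, sy, p=2))
```

The estimate tightens as k grows, which is the sense in which the mapping preserves distances. In a full pipeline the sketches, rather than the raw vectors, would be passed to a classifier such as a support vector machine.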

Inspec keywords: signal classification; support vector machines; Gaussian processes; digital forensics; speech processing; mixture models

Other keywords: landline telephone handsets; acquisition device identification; log-spectrogram; speech recording; sketches of features; raw feature vector; sparse representation-based classifier; p-stable distribution; large-size raw feature vectors; Gaussian mixture model; distance property preservation; independent identically distributed random variables; source phone identification; support vector machines

Subjects: Other topics in statistics; Speech and audio signal processing; Speech processing techniques; Knowledge engineering techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-bmt.2013.0056