Speech recordings carry useful information about the devices used to capture them. Here, acquisition device identification is studied using ‘sketches of features’ as intrinsic device characteristics. Large raw feature vectors are first obtained in one of two ways: by averaging the log-spectrogram of a speech recording along the time axis, or by stacking the parameters of every component of a Gaussian mixture model (GMM) that models the speech recorded by a specific device. Features of reduced size are then extracted by mapping these raw feature vectors into a low-dimensional space. The mapping preserves the ‘distance properties’ of the raw feature vectors, so that ℓp distances between raw feature vectors can be estimated from their sketches. Each entry of a sketch is the inner product of the raw feature vector with a vector of independent identically distributed random variables drawn from a p-stable distribution. State-of-the-art classifiers, such as a sparse representation-based classifier or support vector machines, applied to the sketches yield an identification accuracy exceeding 94% on a set of eight landline telephone handsets from the Lincoln Labs Handset Database (LLHDB). Perfect identification is reported for a set of 21 cell-phones of various models from seven different brands.
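
The spectral raw feature and the p-stable sketch described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names, the frame length, hop size and Hann window, and the restriction to p = 1 (Cauchy) and p = 2 (Gaussian) are all assumptions made for illustration. The distance estimators rely on p-stability: with standard Cauchy entries, each difference of sketch entries is distributed as ‖x − y‖₁ times a standard Cauchy variable, whose absolute value has median 1. With standard Gaussian entries, each difference is distributed as N(0, ‖x − y‖₂²).

```python
import cmath
import math
import random
import statistics

# Hypothetical helper names; frame_len, hop and the Hann window are
# illustrative assumptions, not parameters taken from the paper.


def mean_log_spectrum(signal, frame_len=64, hop=32, eps=1e-10):
    """Average the log-magnitude spectrogram of `signal` along the time axis.

    Returns frame_len // 2 + 1 values, one per non-negative frequency bin.
    A naive O(frame_len^2) DFT keeps the example dependency-free.
    """
    n_bins = frame_len // 2 + 1
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    totals = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            totals[k] += math.log(abs(x_k) + eps)
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [t / n_frames for t in totals]


def stable_sample(p, rng):
    """Draw one standard p-stable variate for p = 1 (Cauchy) or p = 2 (Gaussian)."""
    if p == 2:
        return rng.gauss(0.0, 1.0)
    if p == 1:
        # Inverse-CDF sampling of the standard Cauchy distribution.
        return math.tan(math.pi * (rng.random() - 0.5))
    raise ValueError("only p = 1 and p = 2 are implemented here")


def projection_matrix(dim_in, dim_out, p, seed=0):
    """dim_out rows of i.i.d. p-stable entries; each row yields one sketch entry."""
    rng = random.Random(seed)
    return [[stable_sample(p, rng) for _ in range(dim_in)] for _ in range(dim_out)]


def sketch(x, rows):
    """Map raw feature vector x to len(rows) values via inner products."""
    return [sum(r * v for r, v in zip(row, x)) for row in rows]


def estimate_l1(sx, sy):
    """Estimate ||x - y||_1 from sketches built with a p = 1 (Cauchy) matrix."""
    return statistics.median([abs(a - b) for a, b in zip(sx, sy)])


def estimate_l2(sx, sy):
    """Estimate ||x - y||_2 from sketches built with a p = 2 (Gaussian) matrix."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sx, sy)) / len(sx))
```

For instance, sketching 2000-dimensional raw vectors down to 500 entries with `projection_matrix(2000, 500, p=2)` should typically give an `estimate_l2` within a few percent of the true Euclidean distance. The Cauchy-based ℓ1 estimate is noisier, because it rests on a sample median of heavy-tailed values.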

Source phone identification using sketches of features