Perceptual orthogonal matching pursuit for speech sparse modelling

The perceptual orthogonal matching pursuit (POMP), a sparse approximation algorithm built upon the well-known orthogonal matching pursuit (OMP), is introduced. It is designed for speech processing and is well suited to speech coding applications. It can handle all types of real dictionaries, including predefined and adaptive dictionaries. Being a suboptimal method, POMP performs a series of local updates, each of which minimises a perceptual distortion measure involving a perceptual weighting filter. This filter is tailored for speech signals and is used in AMR 3GPP coders. Experiments show that POMP outperforms the standard OMP for both predefined and adaptive dictionaries.
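
The abstract does not give the exact distortion measure or filter, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the perceptual distortion can be written as a weighted squared error ||W(x - Da)||^2, where `W` is a hypothetical matrix form of the perceptual weighting filter, `D` is a real dictionary and `a` is the sparse coefficient vector. Under that assumption, the greedy selection and the orthogonal refit of OMP can be run on the weighted signal `W @ x` and the weighted atoms `W @ D`. The function name `weighted_omp` and all parameter names are illustrative.

```python
import numpy as np


def weighted_omp(x, D, W, n_atoms):
    """Greedy sparse approximation that reduces ||W (x - D a)||_2.

    x: (n,) signal frame.
    D: (n, K) real dictionary, one atom per column.
    W: (n, n) weighting matrix (assumed stand-in for the perceptual filter).
    n_atoms: number of atoms to select.
    Returns the (K,) coefficient vector with at most n_atoms non-zeros.
    """
    xw = W @ x                          # weighted target
    Dw = W @ D                          # weighted atoms
    norms = np.linalg.norm(Dw, axis=0)
    norms[norms == 0] = 1.0             # guard against null weighted atoms

    support = []
    coeffs = np.zeros(0)
    residual = xw.copy()
    for _ in range(n_atoms):
        # Local update: pick the atom most correlated with the
        # weighted residual (normalised by the weighted atom norm).
        corr = np.abs(Dw.T @ residual) / norms
        if support:
            corr[support] = 0.0         # never pick an atom twice
        support.append(int(np.argmax(corr)))
        # Orthogonal step: refit all selected coefficients jointly
        # by least squares in the weighted domain.
        coeffs, *_ = np.linalg.lstsq(Dw[:, support], xw, rcond=None)
        residual = xw - Dw[:, support] @ coeffs

    a = np.zeros(D.shape[1])
    a[support] = coeffs
    return a
```

With `W` set to the identity matrix this sketch reduces to standard OMP, which makes it a convenient baseline when comparing weighted and unweighted selection on the same dictionary.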

Inspec keywords: speech coding; filtering theory; iterative methods; approximation theory

Other keywords: perceptual weighting filter; POMP; perceptual orthogonal matching pursuit; perceptual distortion measure; suboptimal method; sparse approximation algorithm; speech processing; local updates; speech sparse modelling; adaptive dictionaries; speech signals; predefined dictionaries; AMR 3GPP coders; speech coding applications

Subjects: Filtering methods in signal processing; Speech processing techniques; Interpolation and function approximation (numerical analysis); Speech and audio coding

http://iet.metastore.ingenta.com/content/journals/10.1049/el.2017.1608