IEE Proceedings I (Communications, Speech and Vision)
Volume 136, Issue 2, April 1989
Editorial. Speech technology
- Author(s): M.A. Jack; J. Laver; J. Blauert
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 109 (1 page)
- DOI: 10.1049/ip-i-2.1989.0013
- Type: Article
Excitation synchronous formant analysis
- Author(s): L.C. Wood and D.J.B. Pearce
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 110–118 (9 pages)
- DOI: 10.1049/ip-i-2.1989.0014
- Type: Article
- Abstract: Speech signals can be efficiently parametrised by the resonant frequencies of the vocal tract, known as formants. Automatic analysis of the signal into a suitable set of formant parameters has, however, proved to be a difficult problem, particularly for female speech. The technique of excitation-synchronous formant analysis has been proposed as an improved method of formant analysis [5]. The paper considers the performance of this technique, particularly where the analysis interval lies over the closed phase of the larynx. The improved performance of closed-phase formant analysis is demonstrated by comparison with pitch-synchronous and fixed-frame formant analysis. The closed-phase region is determined first using a laryngograph signal, and secondly using a modified form of the Gold-Rabiner fundamental-frequency estimator operating on the acoustic waveform alone. The improved performance of closed-phase formant analysis is also demonstrated by a better ability to follow the transient features of the signal, with fewer missed or spurious formants, and better formant continuity. The ability to follow formant transitions during glides (e.g. w, r, l) and in voiced segments following plosives is particularly apparent. These improvements are illustrated in various phonetic contexts. The technique has been tested for sensitivity to analysis position, which is important when the glottal closures are determined from the acoustic waveform. This method of formant analysis is currently being applied to the development of speech synthesis by rule, and to provide a set of features for phonetic recognition.

Template adaptation in an isolated word-recognition system
- Author(s): F.R. McInnes; M.A. Jack; J. Laver
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 119–126 (8 pages)
- DOI: 10.1049/ip-i-2.1989.0015
- Type: Article
- Abstract: A template-based isolated word-recognition system, with adaptation of templates by weighted averaging with recognised input utterances, is described. Experiments with adaptation of speaker-specific and speaker-independent templates are reported. The results show substantial improvements in the recognition accuracies attained, and reveal the importance of applying a compensation technique which adjusts the word distances obtained according to the amount of adaptation applied. Aspects of the interaction between the system and the user are discussed.

Large vocabulary isolated word recognition: a real-time implementation
- Author(s): C. Vicenzi; C. Favareto; A. Carossino; A.M. Colla; C. Scagliola; P. Pedrazzi
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 127–132 (6 pages)
- DOI: 10.1049/ip-i-2.1989.0016
- Type: Article
- Abstract: A large-vocabulary real-time isolated word recognition system (DSPELL) is presented. Although the final goal of the project is to recognise words from a vocabulary whose size is of the order of 10–20 thousand, the present system is intended to perform real-time recognition on vocabulary subsets (cohorts) of up to 2 thousand words. The system is implemented on Elsag's multiprocessor EMMA for fast response. Basic features of the system are the use of subword units (diphones) for the acoustic measurements, and the derivation of synthetic symbolic word templates directly from the lexicon. The use of diphones makes the training session of DSPELL very convenient.

Spoken-word recognition using dynamic features analysed by two-dimensional cepstrum
- Author(s): Y. Ariki; S. Mizuta; M. Nagata; T. Sakai
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 133–140 (8 pages)
- DOI: 10.1049/ip-i-2.1989.0017
- Type: Article
- Abstract: Two-dimensional cepstrum (TDC) analysis and its application to word and monosyllable recognition are described. The TDC can simultaneously represent several different kinds of information contained in the speech waveform: static and dynamic features, as well as global and fine frequency structure. Noise reduction and speech enhancement can be performed easily using the TDC. Word and monosyllable recognition experiments based on dynamic programming (DP) matching of a time sequence of the TDC confirm that the global static features (spectral envelope) and global dynamic features are both effective for speech recognition. A speaker-independent (noisy) word recognition algorithm is also proposed which recognises words based on the similarity of dynamic features. The algorithm employs linear matching instead of DP nonlinear matching, requires a small amount of memory, and shows high speed and high accuracy in recognition. At present, the recognition rate is 89.0% at ∞ dB and 70.0% at 0 dB signal-to-noise ratio.

Harmonic postprocessing of speech synthesised by stochastic coders
- Author(s): I.M. Trancoso and J.M. Tribolet
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 141–144 (4 pages)
- DOI: 10.1049/ip-i-2.1989.0018
- Type: Article
- Abstract: High-quality speech coding at medium-to-low bit rates is presently one of the major goals in speech research. Stochastic coding represents an important step towards this objective; yet the quality of the synthetic speech is still not always good enough. A subjectively important part of the distortion may arise from imperfect reproduction of voiced regions, where the harmonic structure is not as well marked in the synthetic signal as it is in the original speech signal. Postprocessing of the synthetic signal using harmonic modelling arises as a natural solution to reduce this distortion. The disadvantages of this method, in terms of additional delay, complexity and dependency on high-precision pitch detectors, can be counterbalanced by the higher quality of the resynthesised speech in voiced regions.

Recognition of speaker-dependent continuous speech with KEAL
- Author(s): G. Mercier; D. Bigorgne; L. Miclet; L. le Guennec; M. Querre
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 145–154 (10 pages)
- DOI: 10.1049/ip-i-2.1989.0019
- Type: Article
- Abstract: A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognised by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which matches a phonemic lexical entry, containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustic representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analysed and results are presented.

CMOS processor for template-based speech-recognition system
- Author(s): W. Drews; R. Laroia; J. Pandel; A. Schumacher; A. Stölzle
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 155–161 (7 pages)
- DOI: 10.1049/ip-i-2.1989.0020
- Type: Article
- Abstract: A special-purpose CMOS signal processor for use in a speaker-dependent isolated word-recognition system is described. Recognition is performed on the basis of pattern matching: the processor calculates the distances between the spoken word to be recognised and all the reference words stored in a template memory by means of the dynamic-time-warping (DTW) algorithm. Operating at 10 MHz, the processor performs recognition for a vocabulary of up to 1000 words.

Hamlet: a prototype of a voice-activated typewriter
- Author(s): J.J. Mariani
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 162–166 (5 pages)
- DOI: 10.1049/ip-i-2.1989.0021
- Type: Article
- Abstract: This project integrates the different parts of a speaker-dependent, isolated-word voice-activated typewriter on a personal computer (IBM PC-AT). To build the language model (for French), several routines have been written: automatic grapheme-to-phoneme conversion, semiautomatic processing of training texts (20 pages) to build the graphemic (2500 words) and phonemic (2000 words) lexicons, syntactic labelling through inductive inference, computation of the probabilistic language model (bigrams and trigrams on grammatical classes), and definition of the phonological rules. The speech signal is analysed by 20 digital bandpass filters. Several speech compression techniques have been tried on medium- and large-difficulty vocabularies; vector quantisation and nonlinear time compression have been chosen. Recognition proceeds in three steps: (a) a fast match based on word length and gross comparison; (b) a detailed match based on conventional DTW algorithms; (c) use of the language model to take the linguistic constraints into account and to achieve the phoneme-to-grapheme conversion. Overall recognition rates of 95% have been obtained with a mean recognition time of 2 s, the 2000 templates being stored in 60 Kbytes of RAM. Recognition results with and without the language model have been compared.
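Several of the recognisers in this issue (the CMOS processor and Hamlet's detailed match) score a spoken word against stored templates with dynamic time warping. As a minimal illustrative sketch of the textbook DTW recursion — not the authors' implementations, and with all names invented here — the following Python computes the cumulative distance of the best monotonic alignment between two feature-vector sequences:

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    a, b: lists of feature vectors (lists of floats).
    Returns the cumulative local distance along the best
    monotonic alignment path.
    """
    n, m = len(a), len(b)
    # D[i][j] = best cumulative cost aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])   # local Euclidean distance
            D[i][j] = d + min(D[i - 1][j],      # step in a only
                              D[i][j - 1],      # step in b only
                              D[i - 1][j - 1])  # step in both
    return D[n][m]
```

A template-based recogniser would run this against every stored template and pick the one with the smallest distance, typically normalised by path length; the hardware version in this issue pipelines the same recursion.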
Book review: Mathematical Foundations for Communication Engineering
- Author(s): A.M. Rosie
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 168 (1 page)
- DOI: 10.1049/ip-i-2.1989.0022
- Type: Article
Multilevel range/NEXT performance in digital subscriber loops
- Author(s): G. Brand; V. Madisetti; D.G. Messerschmitt
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 169–174 (6 pages)
- DOI: 10.1049/ip-i-2.1989.0023
- Type: Article
- Abstract: The paper presents results on the range performance of multilevel signalling schemes in the presence of near-end crosstalk (NEXT) in the digital subscriber loop. With a data rate of 160 kbit/s at the U interface, multilevel line codes can be used to reduce the symbol rate, and hence the bandwidth of the transmitted signal, to yield improved performance in the crosstalk-dominated environment, since this impairment has a power transfer which increases with frequency. By varying the number of transmitted signal levels from 2 to 128, subject to a peak power constraint, results are obtained for the range of the optimum linear and decision-feedback equalisers at a probability of error of 10⁻⁶. These results indicate that the optimum number of levels lies between 4 and 7. In addition, optimum 3-level PAM is compared with some commonly used 3-level line codes; the performance degradation of these suboptimal codes is found to be less than 10%.

Study of 1024-QAM system performance in the presence of filtering imperfections
- Author(s): P. Mathiopoulos; H. Ohnishi; K. Feher
- Source: IEE Proceedings I (Communications, Speech and Vision), Volume 136, Issue 2, p. 175–179 (5 pages)
- DOI: 10.1049/ip-i-2.1989.0024
- Type: Article
- Abstract: For the transmission of a standard CEPT 2.048 Mbit/s digital stream in a 240 kHz supergroup band, a 1024-QAM (quadrature amplitude modulation) system employing raised-cosine filters with an excess bandwidth of 17% is required. The performance of a 1024-QAM system in the presence of various filter imperfections is studied by means of computer simulation. To assess the effects of channel imperfections, such as filter distortion and/or selective fades caused by residual distortion not cancelled by adaptive equalisers, simulated results are given for the degradation of system performance under linear, parabolic and sinusoidal amplitude and group-delay distortions. It is shown that sinusoidal amplitude and linear group-delay distortions cause the most significant performance degradations. For comparison, results for a 512-QAM system at the same baud rate are also included.
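The sensitivity of dense constellations such as 1024-QAM to amplitude and group-delay distortion follows from the fine spacing of its 32 amplitude levels per rail. As an illustrative sketch, not taken from the paper, the following Python builds the standard square M-QAM constellation and its average symbol energy:

```python
import math

def square_qam_constellation(M):
    """Return the M-point square QAM constellation as complex symbols.

    M must be a perfect square (e.g. 1024 -> 32 levels per rail).
    Levels are the usual odd integers ±1, ±3, ..., ±(sqrt(M)-1).
    """
    k = int(math.isqrt(M))
    if k * k != M:
        raise ValueError("M must be a perfect square")
    levels = [2 * i - (k - 1) for i in range(k)]  # -(k-1), ..., (k-1), step 2
    return [complex(x, y) for x in levels for y in levels]

points = square_qam_constellation(1024)
# Average energy of square M-QAM is 2(M-1)/3, so 682 for M = 1024; the
# minimum inter-symbol distance stays 2 while peak amplitude grows with M,
# which is why dense QAM degrades quickly under residual filter distortion.
avg_energy = sum(abs(p) ** 2 for p in points) / len(points)
```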