2 results for Abdulla, Waleed H

  • The concepts of hidden Markov model in speech recognition

    Abdulla, Waleed H; Kasabov, Nikola (1999-05)

    Working or discussion paper
    University of Otago

    Speech recognition is one of the most challenging fields that scientists have faced for a long time, and a complete solution is still far from reach. Companies are investing heavily in a range of related and supporting approaches to reach the final goal, and then to apply it to the many applications still waiting for successful speech recognisers that are free from constraints on speaker, vocabulary or environment. The task is difficult because the problem is interdisciplinary and because it requires speech perception to be embedded in the recogniser (Speech Understanding Systems), which in turn points strongly to the use of intelligence within the systems. Bare recognition techniques (without intelligence) follow a wide variety of approaches, with different claims of success from each group of authors, who put their faith in their favourite method. However, the one technique that researchers accept as the state of the art is the Hidden Markov Model (HMM), which is agreed to be the most promising. It can also be combined with other techniques to improve performance, for example by hybridising the HMM with Artificial Neural Network (ANN) algorithms. This does not mean that the HMM is free from approximations that are far from reality, such as the assumption that successive observations are independent, but the results and potential of the algorithm are reliable. Modifications to the HMM aim to relax these poorly representative approximations in the hope of better results. In this report we describe the backbone of the HMM technique together with the main guidelines for a successful implementation. The representation and implementation of HMMs vary in one way or another, but the main idea, the results and the computational cost are the same, so choosing one is a matter of preference.
    We adopt the formulation used by Ferguson and Rabiner et al. We first describe the Markov chain and then investigate a model that is very popular in speech recognition, the left-right HMM topology. The mathematical formulations needed for the implementation are fully explained, as they are crucial in building the HMM, and the prominent design factors are also discussed. Finally, we conclude the report with some experimental results that show the practical outcomes of the implemented model.
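The left-right topology and the evaluation formulas mentioned above can be illustrated with a small sketch. It is not the report's own implementation: it assumes a discrete-output HMM in Rabiner's (A, B, π) notation, a no-skip left-right topology, and a hypothetical self-loop probability `p_stay = 0.6`; the function names are also hypothetical. It builds the transition matrix and computes P(O | λ) with the forward algorithm.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def left_right_transitions(n_states: int, p_stay: float = 0.6) -> Matrix:
    """Left-right (no-skip) transition matrix A: each state either stays
    (probability p_stay) or moves to the next state; the last state is
    absorbing. p_stay is a hypothetical illustrative value."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[n_states - 1][n_states - 1] = 1.0
    return A


def forward_likelihood(pi: Sequence[float], A: Matrix, B: Matrix,
                       obs: Sequence[int]) -> float:
    """Forward algorithm: P(O | lambda) for lambda = (A, B, pi), where
    B[j][k] is the probability of emitting symbol k in state j."""
    n = len(pi)
    # Initialisation: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_j alpha_T(j)
    return sum(alpha)


# Example: 3 states, 2 symbols; a left-right model starts in the first state.
pi = [1.0, 0.0, 0.0]
A = left_right_transitions(3)
B = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
likelihood = forward_likelihood(pi, A, B, [0, 0, 1])
```

Each frame is scored through B given only the current state, which is exactly the observation-independence approximation discussed in the abstract. For realistic utterance lengths the α values underflow, so practical implementations rescale α at every frame or work with log probabilities; this sketch omits that step for brevity.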

  • Signal processing and acoustic modelling of speech signals for speech recognition systems

    Abdulla, Waleed H (2002-03)

    Thesis
    University of Otago

    Natural man-machine interaction is currently one of the most unfulfilled promises of automatic speech recognition (ASR). The purpose of an ASR system is to accurately transcribe or execute what has been said. State-of-the-art speech recognition systems consist of four basic modules: signal processing, acoustic modelling, language modelling, and the search engine. This thesis addresses the signal processing and acoustic modelling modules. We aim to model spoken signals optimally, so that the resulting modules can be used successfully by the two subsequent modules. Because the first-order hidden Markov model (HMM) is a mathematically well-established and tremendously successful paradigm, and hence the dominant technique in current speech recognition systems, this dissertation bases all its studies and experiments on the HMM. The HMM is a statistical framework that supports both acoustic and temporal modelling. It is widely used despite making a number of suboptimal modelling assumptions, which limit its full potential. We investigate how the model design strategy and the algorithms can be adapted to HMMs. Large suites of experimental results are presented to show the relative effectiveness of each component within the HMM paradigm. This dissertation presents several strategies for improving the overall performance of baseline speech recognition systems, and their implementation was optimised in a series of experiments. We also investigate selecting the optimal feature sets for improving speech recognition. Moreover, the reliability of human speech recognition is attributed to specific properties of the auditory representation of speech. Thus, in this dissertation we explore perceptually inspired signal processing strategies, such as critical band frequency analysis.
    The resulting speech representation, called Gammatone cepstral coefficients (GTCC), provides a relative improvement over the baseline recogniser. We also investigate using multiple signal representations in an ASR system to improve the recognition rate. Additionally, we developed fast techniques that are useful for evaluating and comparing different signal processing paradigms. The main contributions of this dissertation are:
      • Speech/background discrimination.
      • HMM initialisation techniques.
      • Multiple signal representation with multi-stream paradigms.
      • Gender-based modelling.
      • Feature vector dimensionality reduction.
      • Perceptually motivated feature sets.
      • ASR training and recognition packages for research and development.
    Many of these methods can be used in practical applications. The proposed techniques can be used directly in more complicated speech recognition systems by passing their outputs to the language modelling and search engine modules.
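Critical band analysis and GTCC features can be illustrated with a rough sketch. It assumes a common GTCC-style pipeline rather than the thesis's exact configuration: centre frequencies spaced evenly on the ERB-rate scale of Glasberg and Moore, a fourth-order gammatone impulse response with bandwidth b = 1.019 × ERB(fc), and a DCT-II of the log band energies. The band count, frequency range and all function names are hypothetical, and framing, filtering the signal through each band, and energy pooling are left out.

```python
import math
from typing import List


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """Map a frequency in Hz to the ERB-rate (number of ERBs) scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_centres(n_bands: int, f_low: float, f_high: float) -> List[float]:
    """Centre frequencies equally spaced on the ERB-rate scale, so that
    bands are narrow at low frequencies and wide at high frequencies."""
    if n_bands < 2:
        raise ValueError("need at least two bands")
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_bands - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(n_bands)]


def gammatone_ir(fc: float, fs: float, n_samples: int, order: int = 4) -> List[float]:
    """Sampled gammatone impulse response
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), b = 1.019 * ERB(fc)."""
    b = 1.019 * erb_bandwidth(fc)
    out = []
    for i in range(n_samples):
        t = i / fs
        out.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
                   * math.cos(2.0 * math.pi * fc * t))
    return out


def cepstra_from_band_energies(energies: List[float], n_ceps: int) -> List[float]:
    """DCT-II of the log band energies (floored to avoid log(0))."""
    m = len(energies)
    logs = [math.log(max(e, 1e-12)) for e in energies]
    return [sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
            for k in range(n_ceps)]


# Example: 32 hypothetical bands between 50 Hz and 8 kHz for 16 kHz speech.
centres = gammatone_centres(32, 50.0, 8000.0)
impulse_responses = [gammatone_ir(fc, 16000.0, 400) for fc in centres]
```

Spacing the filters on the ERB-rate scale is what makes the analysis "critical band": resolution follows the auditory filters, in the same spirit as the mel scale used for MFCCs.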
