Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning
Publisher
International Journal of Engineering Trends and Technology
Abstract
Speech recognition technology has already become a part of everyday life, and most existing work targets English
because it is an international language; AI robots, for example, can already converse with people, particularly in English.
There is still more that researchers could do for other languages. The topic of this study is speech recognition in Bangla
(Bengali). To determine the highest feasible speech recognition accuracy in Bangla, several pattern recognition and deep
learning methods have been employed. Native speakers of Bangla provided the core dataset. The study includes extensive
experiments with Bangla phonemes, isolated words, commands, and sentences. Features are extracted from the speech samples
using MFCC; LPC and FFT-based power spectral analysis are applied to the same samples for comparison. A multilayer
feedforward deep neural network model based on the maximum-likelihood approach has been used, and a random dataset has
been used to assess the model's speech recognition accuracy. Deep learning with a neural network model and MFCC feature
extraction outperforms the power spectral and linear predictor coefficient (LPC) tests in recognition results. The
investigation found that increasing the number of speech samples affected the recognition accuracy rate, as did speech
samples from speakers of the opposite gender.
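
For readers unfamiliar with the three feature types compared in the abstract, the following is a minimal NumPy sketch of how a power spectrum, LPC coefficients, and MFCCs can be computed from framed speech. It is not the authors' implementation: the 16 kHz sampling rate, 25 ms frames with a 10 ms hop, 512-point FFT, 26 mel filters, 13 cepstral coefficients, LPC order 12, and all function names are illustrative assumptions, and a synthetic tone stands in for a speech sample.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def power_spectrum(frames, n_fft):
    """Periodogram estimate per frame: |FFT|^2 / n_fft."""
    return np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

def lpc(frame, order):
    """LPC coefficients via autocorrelation and the Levinson-Durbin recursion.
    Assumes the frame is not silent (non-zero energy)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 to sr/2."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fb[m - 1, k] = (k - lo) / (mid - lo)   # rising edge
        for k in range(mid, hi):
            fb[m - 1, k] = (hi - k) / (hi - mid)   # falling edge
    return fb

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def mfcc(frames, sr, n_fft=512, n_filters=26, n_ceps=13):
    """MFCC = DCT of log mel-filterbank energies of the power spectrum."""
    energies = power_spectrum(frames, n_fft) @ mel_filterbank(n_filters, n_fft, sr).T
    return dct2(np.log(energies + 1e-10), n_ceps)

# Synthetic stand-in for one second of speech: a 1 kHz tone plus weak noise.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000.0 * t) + 0.01 * rng.standard_normal(sr)

frames = frame_signal(signal, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
spec = power_spectrum(frames, 512)                     # FFT-based features
coeffs = lpc(frames[0], order=12)                      # LPC features, one frame
feats = mfcc(frames, sr)                               # MFCC features
print(frames.shape, spec.shape, coeffs.shape, feats.shape)
```

Each row of `feats` (or of `spec`, or a per-frame stack of `lpc` outputs) would be one input vector to a classifier such as the feedforward network the abstract describes; how the authors assembled their network inputs is not stated in this record.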
Citation
Chowdhury, Md Shafiul Alam, et al. "Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning."
