Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning

Abstract

Speech recognition technology has already become a part of everyday life. Most prior work has focused on English because it is an international language, and AI robots can already converse with people, particularly in English, but there is still more that researchers could do for other languages. The topic of this study is speech recognition in Bangla (Bengali). To determine the highest feasible speech recognition accuracy in Bangla, several pattern recognition and deep learning methods were employed. Native speakers of Bangla provided the core dataset. The experiments cover Bangla phonemes, isolated words, commands, and sentences. Features are extracted from the speech samples using MFCC; LPC and FFT-based power spectral analysis are applied alongside it. A multilayer feedforward deep neural network model was used with a maximum-likelihood approach. A random dataset was used to assess the model's speech recognition accuracy. Deep learning with a neural network model and MFCC feature extraction outperforms the power spectral and linear predictor coefficient tests in recognition results. The investigation found that increasing the number of speech samples affected the recognition accuracy rate, as did including speech samples from the opposite gender.
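To make two of the feature types named in the abstract concrete, the following is a minimal sketch of a per-frame power spectrum and of LPC coefficients computed with the autocorrelation method (Levinson-Durbin recursion). It is not the authors' implementation: the function names, the frame length, and the LPC order are assumptions chosen for illustration, and MFCC extraction and the neural network are left out.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 / N of one frame, bins 0..N/2.

    Uses a direct O(N^2) DFT for clarity; an FFT gives the same values faster.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).

    Returns (a, err), where a[i - 1] is the coefficient of x[t - i] in the
    predictor x[t] ~ sum_i a_i * x[t - i], and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:  # silent or perfectly predictable frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


if __name__ == "__main__":
    # Hypothetical 32-sample frame: a sinusoid centred on DFT bin 4.
    frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
    spec = power_spectrum(frame)
    print("peak bin:", max(range(len(spec)), key=spec.__getitem__))
    coeffs, residual = lpc(frame, order=2)
    print("LPC coefficients:", coeffs)
```

In a pipeline like the one the abstract describes, a feature vector of this kind would be computed for each short frame of speech and passed to the classifier; the power spectrum describes the energy at each frequency, while the LPC coefficients summarize the spectral envelope in a handful of numbers.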

Citation

Chowdhury, Md Shafiul Alam, et al. "Bangla Speech Recognition: Power Spectral Analysis, LPC & MFCC as Feature Extraction Techniques in Deep Learning."