Bangla Speech Processing: Time Delay Neural Networks Enhanced by Advanced Algorithms
Publisher
Mathematical Modelling of Engineering Problems
Abstract
This study explores critical challenges in Bangla speech recognition by evaluating phoneme, word, command, and sentence-level recognition using a MATLAB-based framework. The feature extraction methods Mel-Frequency Cepstral Coefficients (MFCC), Power Spectral Analysis, and Linear Predictive Coding (LPC) are applied with Blackman, Hamming, and Hanning windowing techniques. Time Delay Neural Network (TDNN) models are trained using three optimization algorithms: Scaled Conjugate Gradient Algorithm (SCGA), Levenberg–Marquardt Algorithm (LMA), and Bayesian Regularization Algorithm (BRA). Results indicate that MFCC combined with TDNN, optimized via LMA, BRA, or SCGA, yields the highest recognition accuracy, reaching up to 94%. Six experiments are analyzed, including five from existing literature and one representing the current study. Comparative evaluation and statistical analysis, including confidence intervals, are employed to identify the most effective configuration. The findings outperform previous approaches and underscore the influence of sample size, speaker gender, and windowing methods on recognition performance. These insights offer a foundation for future improvements in Bangla speech technology.
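As a rough illustration of the windowing step named in the abstract, the sketch below applies Hamming, Hanning, and Blackman windows to a single frame and computes its power spectrum. It is not the authors' MATLAB implementation: the function names, the 8 kHz sample rate, the 256-sample frame, the 1 kHz test tone, and the naive DFT are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math

# Standard cosine-sum window coefficients (a0, a1, a2).
WINDOW_COEFFS = {
    "hamming": (0.54, 0.46, 0.0),
    "hanning": (0.5, 0.5, 0.0),
    "blackman": (0.42, 0.5, 0.08),
}


def window(name, n):
    """Return n samples of a symmetric cosine-sum window."""
    if n == 1:
        return [1.0]
    a0, a1, a2 = WINDOW_COEFFS[name]
    return [
        a0
        - a1 * math.cos(2 * math.pi * k / (n - 1))
        + a2 * math.cos(4 * math.pi * k / (n - 1))
        for k in range(n)
    ]


def power_spectrum(frame, name):
    """Window one frame, then return |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, window(name, n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(N^2) DFT, used only to avoid external dependencies.
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def peak_bin(frame, name):
    """Index of the strongest frequency bin for the given window."""
    spectrum = power_spectrum(frame, name)
    return max(range(len(spectrum)), key=spectrum.__getitem__)


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz, 256 samples.
SAMPLE_RATE = 8000
FRAME_LEN = 256
frame = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME_LEN)
]

for name in WINDOW_COEFFS:
    print(name, peak_bin(frame, name))
```

In this setup the three windows differ in how much energy leaks into neighbouring bins rather than in where the peak falls, which is one plausible reason the choice of window could influence downstream recognition accuracy.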
Citation
Chowdhury, Md Shafiul Alam, et al. "Bangla Speech Processing: Time Delay Neural Networks Enhanced by Advanced Algorithms." Mathematical Modelling of Engineering Problems 12.9 (2025).
