A Review on Speech Emotion Recognition Using Machine Learning
DOI: https://doi.org/10.55524/

Keywords: Speech Emotion Recognition, Machine Learning, HCI, SER, MFCC

Abstract
This paper focuses on developing a robust speech emotion recognition (SER) system that combines multiple speech features with feature optimization and speech de-noising techniques in order to improve emotion classification accuracy, reduce system complexity, and achieve noise robustness. We also propose original feature-fusion methods for SER, and we employ feature optimization based on two machine learning techniques, feature transformation and feature selection; a neural network classifier can use either. As more emotions are considered, the accuracy obtained through feature fusion falls short of expectations, and the curse of dimensionality sets in as additional speech features are added, increasing the computational burden on the SER system. It is therefore crucial to build an SER system that is more reliable, retains the most informative features, and uses as little computing power as possible. Feature optimization strategies streamline feature selection by reducing the number of candidate features to a manageable level. To reduce the computational overhead of the SER system, this work employs Semi-Non-negative Matrix Factorization (Semi-NMF), an approach that transforms features in an unsupervised manner.
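The abstract does not give implementation details, so the following is only an illustrative sketch of the Semi-NMF idea it names: a mixed-sign feature matrix X is factored as X ≈ F Gᵀ, where F is unconstrained and G is kept non-negative via the multiplicative updates of Ding et al. All function and variable names here are hypothetical, not taken from the paper.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor X (d x n, mixed sign) into F (d x k, unconstrained) and
    G (n x k, non-negative) so that X is approximated by F @ G.T."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + eps          # non-negative init
    pos = lambda A: (np.abs(A) + A) / 2.0          # positive part of A
    neg = lambda A: (np.abs(A) - A) / 2.0          # negative part of A
    for _ in range(n_iter):
        # F is unconstrained: least-squares solution given the current G
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # multiplicative update keeps every entry of G non-negative
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G

# toy "speech feature matrix": 20-dim features over 50 frames, mixed sign
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 50))
F, G = semi_nmf(X, k=5)
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
```

In an SER pipeline of the kind the abstract describes, the rows of G would serve as a compact k-dimensional replacement for the original high-dimensional fused features fed to the classifier.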