A Review on Speech Emotion Recognition Using Machine Learning


  • S k Mohammed Jubear, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India
  • D Pavan Kumar Reddy, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India
  • G Subramanyam, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India
  • S k Farooq, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India
  • T Sreenivasulu, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India
  • N SrinivasaRao, Department of Computer Science and Engineering, PACE Institute of Technology & Sciences, Vallur, Ongole, Andhra Pradesh, India




Speech Emotion Recognition, Machine Learning, HCI, SER, MFCC


This paper focuses on the development of a robust speech emotion recognition (SER) system that combines different speech features with feature optimization and speech de-noising techniques in order to improve emotion classification accuracy, reduce system complexity, and obtain noise robustness. Additionally, we develop original methods for fusing features for SER. To build the SER system, we employ feature optimization methods based on two machine learning techniques, feature transformation and feature selection. Either of these two techniques can be used with a neural network. As more emotions are taken into account, the SER accuracy obtained through feature fusion falls short of expectations, and the curse of dimensionality sets in as speech features are added, because each added feature makes the SER system more complex and increases the work it must do. Therefore, it is crucial to create an SER system that is more reliable, uses the most useful features, and requires the least computing power possible. Feature optimization strategies streamline feature selection by reducing the number of candidate features to a more manageable level. This paper employs Semi-Non-Negative Matrix Factorization (Semi-NMF) to reduce the computational overhead of the SER system. This approach can be used to transform features and is capable of learning on its own.
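The feature fusion and Semi-NMF transformation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the code assumes the commonly used multiplicative update scheme for Semi-NMF (X ≈ F Gᵀ with G constrained to be non-negative and F unconstrained). The feature blocks `mfcc` and `prosody`, their sizes, the rank `k`, and the function name `semi_nmf` are all hypothetical, and the data is random rather than real speech.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (features x samples) as F @ G.T with G >= 0.

    F (features x k) may take any sign; G (samples x k) stays non-negative.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k)) + 0.1  # strictly positive start

    def pos(A):  # positive part of a matrix
        return (np.abs(A) + A) / 2

    def neg(A):  # negative part of a matrix (returned as non-negative values)
        return (np.abs(A) - A) / 2

    for _ in range(n_iter):
        # Least-squares update of the unconstrained factor F for fixed G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # Multiplicative update keeps every entry of G non-negative.
        num = pos(XtF) + G @ neg(FtF)
        den = neg(XtF) + G @ pos(FtF)
        G *= np.sqrt((num + eps) / (den + eps))

    # Final refit of F so it is optimal for the returned G.
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G

# Hypothetical fused feature matrix: 13 MFCC-like rows and 4 prosodic rows
# for 60 utterances (synthetic values, for illustration only).
rng = np.random.default_rng(42)
mfcc = rng.normal(size=(13, 60))
prosody = rng.normal(size=(4, 60))
X = np.vstack([mfcc, prosody])  # feature fusion by concatenation (assumed)

F, G = semi_nmf(X, k=5)
reduced = G  # one 5-dimensional non-negative vector per utterance
print(reduced.shape)
```

Under these assumptions each utterance is represented by 5 values instead of 17, which is the kind of dimensionality reduction the abstract attributes to feature transformation; a classifier would then be trained on the rows of `G`.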








How to Cite

A Review on Speech Emotion Recognition Using Machine Learning. (2022). International Journal of Innovative Research in Computer Science & Technology, 10(3), 406-411. https://doi.org/10.55524/