Machine learning approach for emotion recognition in speech
Journal
Informatica
Date Issued
2014
Author(s)
Gjoreski, Martin
Gjoreski, Hristijan
Abstract
This paper presents a machine learning approach to automatic recognition of human emotions from
speech. The approach consists of three steps. First, numerical features are extracted from the sound
database by using audio feature extractor. Then, feature selection method is used to select the most
relevant features. Finally, a machine learning model is trained to recognize seven universal emotions:
anger, fear, sadness, happiness, boredom, disgust and neutral. A thorough ML experimental analysis is
performed for each step. The results showed that 300 (out of 1582) features, as ranked by the gain ratio,
are sufficient for achieving 86% accuracy when evaluated with 10 fold cross-validation. SVM achieved
the highest accuracy when compared to KNN and Naive Bayes. We additionally compared the accuracy
of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA (optimized
algorithm parameters) using the leave-one-speaker-out technique. The results showed that the SVM
enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM, i.e., 73% and
77% respectively. Finally, the results achieved with the 10 fold cross-validation are comparable and
similar to the ones achieved by a human, i.e., 86% accuracy in both cases. Even more, low energy
emotions (boredom, sadness and disgust) are better recognized by our machine learning approach compared to the human.
speech. The approach consists of three steps. First, numerical features are extracted from the sound
database by using audio feature extractor. Then, feature selection method is used to select the most
relevant features. Finally, a machine learning model is trained to recognize seven universal emotions:
anger, fear, sadness, happiness, boredom, disgust and neutral. A thorough ML experimental analysis is
performed for each step. The results showed that 300 (out of 1582) features, as ranked by the gain ratio,
are sufficient for achieving 86% accuracy when evaluated with 10 fold cross-validation. SVM achieved
the highest accuracy when compared to KNN and Naive Bayes. We additionally compared the accuracy
of the standard SVM (with default parameters) and the one enhanced by Auto-WEKA (optimized
algorithm parameters) using the leave-one-speaker-out technique. The results showed that the SVM
enhanced with Auto-WEKA achieved significantly better accuracy than the standard SVM, i.e., 73% and
77% respectively. Finally, the results achieved with the 10 fold cross-validation are comparable and
similar to the ones achieved by a human, i.e., 86% accuracy in both cases. Even more, low energy
emotions (boredom, sadness and disgust) are better recognized by our machine learning approach compared to the human.
Subjects
File(s)![Thumbnail Image]()
Loading...
Name
719-722-1-PB.pdf
Size
152.54 KB
Format
Adobe PDF
Checksum
(MD5):abda7f95fc0c91a86904c229803ed17f
