This page describes how we build an emotion classifier for the OpenVokaturi libraries.

The classifier in OpenVokaturi starts by measuring the 9 cues. These cues are known to be related to emotion classes. For instance, Kienast & Sendlmeier (2000), in their Figs. 8 through 12, notice that in Emo-DB the “spectral balance” is greater than average for Anger, Happiness and Fear, and less than average for Boredom and Sadness.

We compute the mean and standard deviation of each of the nine cues, pooled over the recordings of all five emotion classes:

/*
* Stats9.c
*
* Copyright (C) 2016,2017 Paul Boersma, Johnny Ip, Toni Gojani
* version 2017-01-03
*
* This code is part of OpenVokaturi.
*
* OpenVokaturi is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 3 of the License, or (at
* your option) any later version.
*
* OpenVokaturi is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this software. If not, see http://www.gnu.org/licenses/.
*/

#include <math.h>
#include <stdio.h>
#include "Dataset9.h"

static Dataset9 files = { 0 };

int main (int argc, const char * argv[]) {
	Dataset9_print (& files, stderr);
	double mean [NUMBER_OF_CUES9] = { 0.0 }, stdev [NUMBER_OF_CUES9] = { 0.0 };
	/* Accumulate the sum and the sum of squares of every cue over all files. */
	for (int ifile = 0; ifile < files. count; ifile ++) {
		Datum9 *datum = & files. data [ifile];
		for (int cue = 0; cue < NUMBER_OF_CUES9; cue ++) {
			double value = datum -> cueStrengths [cue];
			mean [cue] += value;
			stdev [cue] += value * value;
		}
	}
	/* Turn the sums into the mean and the (population) standard deviation. */
	for (int cue = 0; cue < NUMBER_OF_CUES9; cue ++) {
		mean [cue] /= files.count;
		stdev [cue] = sqrt ((stdev [cue] - files.count * mean [cue] * mean [cue]) / files.count);
		fprintf (stderr, "Cue %d: mean = %f, stdev = %f\n", cue, mean [cue], stdev [cue]);
	}
	Dataset9_clear (& files);
}

/* End of file Stats9.c */

The result becomes the constants Cues9_mean[] and Cues9_stdev[] in Cues9.h.

Given the nine cues of a recording, we compute the emotion probabilities via a neural network with three levels of linear connections. The two hidden layers of nodes consist of rectifying units. The network was trained on Emo-DB and SAVEE (see Train9.c); for the resulting parameters, see Network9-100-20.h.

The input to the network consists of nine nodes that contain the strengths of the nine cues, converted into something similar to $$z$$ values. For this transformation we subtract the means computed above, and then divide by the standard deviations computed above (this is the same transformation that was used in training, where it speeds up learning appreciably).

Information then proceeds to the first layer of hidden nodes. Each of the 100 nodes has a bias, as well as a weight to each of the nine input nodes.

Information then proceeds to the second layer of hidden nodes. Each of the 20 nodes has a bias, as well as a weight to each of the 100 lower nodes.

Information then proceeds to the output layer, which contains five nodes, i.e. one node for each emotion class. Each of the 5 nodes has a bias, as well as a weight to each of the 20 lower nodes.

To turn the output activities into probabilities, we perform a softmax transformation: the probability of a class is proportional to its exponentiated output value. Finally, we weight the probabilities by the relative prior probabilities that you specified by calling VokaturiVoce_setRelativePriorProbabilities() (if you did not call this function, all emotions are weighted equally, i.e. they have an equal prior probability of occurring).

You can follow these steps by looking into Vokaturi9.c.