This page explains how you can incorporate the Open Vokaturi software in your own C app (real-time version).

1. Using Vokaturi in a real-time setting

In the batch examples, there were precisely as many calls to VokaturiVoice_fill() as to VokaturiVoice_extract(). Now consider a real-time situation, where a recording callback gives you 512 samples every 12 milliseconds, and a timer thread wants to analyze the incoming samples every 100 milliseconds.

The samples have to go from the recording callback to the VokaturiVoice buffer via VokaturiVoice_fill(), but this cannot be done directly, because a call to VokaturiVoice_extract() might interrupt this process. Also, VokaturiVoice_extract() cannot inspect the buffer while VokaturiVoice_fill() is writing into it.

The solution is to use a shared buffer in your application. To fill this shared buffer, the recording callback applies a lock and copies its 512 samples into the buffer. To consume any newly accumulated samples, the analysing thread applies the lock, sends those samples to the VokaturiVoice with VokaturiVoice_fill(), releases the lock, and then calls VokaturiVoice_extract(). As in the batch case, every recorded sample is handed to VokaturiVoice_fill() exactly once; the difference is that the fast function VokaturiVoice_fill() is protected by a lock, while the slow function VokaturiVoice_extract() is not.

#define SHARED_BUFFER_SIZE  220500
struct {
    double samples [SHARED_BUFFER_SIZE];   // 5 seconds at 44100 Hz, approximately 1.76 megabytes
    int64_t numberOfReceivedSamples;
    int64_t numberOfSentSamples;
} sharedBuffer;

VokaturiVoice ourVoice;   // initialize at start-up
Lock ourLock;
void recordingCallback (int numberOfSamples, int16_t *samples) {
    lock (ourLock);
    int64_t sampleCount = sharedBuffer.numberOfReceivedSamples;
    int32_t samplePointer = (int32_t) (sampleCount % SHARED_BUFFER_SIZE);
    for (int32_t i = 0; i < numberOfSamples; i ++) {
        if (samplePointer >= SHARED_BUFFER_SIZE)
            samplePointer -= SHARED_BUFFER_SIZE;
        sharedBuffer.samples [samplePointer] = (double) samples [i];
        samplePointer += 1;
    }
    sharedBuffer.numberOfReceivedSamples += numberOfSamples;   // now visible to the analysing thread
    unlock (ourLock);
}
void timerCallback () {
    lock (ourLock);
    if (sharedBuffer.numberOfReceivedSamples == 0) {
        unlock (ourLock);
        return;   // nothing recorded yet
    }
    if (sharedBuffer.numberOfReceivedSamples > sharedBuffer.numberOfSentSamples) {
        for (int64_t isamp = sharedBuffer.numberOfSentSamples;
             isamp < sharedBuffer.numberOfReceivedSamples;
             isamp ++)
        {
            int32_t indexOfSampleToBeSent = (int32_t) (isamp % SHARED_BUFFER_SIZE);
            VokaturiVoice_fill (ourVoice, 1,
                                & sharedBuffer.samples [indexOfSampleToBeSent]);
        }
        sharedBuffer.numberOfSentSamples = sharedBuffer.numberOfReceivedSamples;
    }
    unlock (ourLock);
    VokaturiQuality quality;
    VokaturiEmotionProbabilities emotionProbabilities;
    VokaturiVoice_extract (ourVoice, & quality, & emotionProbabilities);
    if (quality.valid) {
        printf ("%.6f %.6f %.6f %.6f %.6f\n",
            emotionProbabilities.neutrality,
            emotionProbabilities.happiness,
            emotionProbabilities.sadness,
            emotionProbabilities.anger,
            emotionProbabilities.fear);
    }
}
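
The Lock type and the lock() and unlock() functions above were left unspecified on purpose; any mutual-exclusion primitive of your platform will do. As a minimal sketch, assuming POSIX threads are available (they are on iOS, Android and desktop Unixes), they could be defined like this:

#include <pthread.h>

typedef pthread_mutex_t Lock;
#define lock(m)    pthread_mutex_lock (& (m))
#define unlock(m)  pthread_mutex_unlock (& (m))

Lock ourLock = PTHREAD_MUTEX_INITIALIZER;   // replaces the bare declaration above

With these definitions the call sites lock (ourLock) and unlock (ourLock) compile unchanged.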

2. Real-time implementation in iOS

The lock in the example above can be implemented in many ways; the implementation of real-time behavior is typically platform-dependent, and the iOS demo described in this section does not use locks at all. The files VokaMonoOpen-3-0-ios.zip and VokaStereoOpen-3-0-ios.zip contain complete demo apps called “VokaMono” and “VokaStereo”, respectively, which you can open directly with Xcode. Each of these projects contains a copy of the OpenVokaturi-3-0-ios.a library.

This section describes how Vokaturi can be used in real time on iOS, by explaining parts of the VokaMono demo app.

The shared buffer is declared in SharedBuffer.h:

/*
 * SharedBuffer.h
 * This software is released under the GNU General Public License. No warranty.
 *
 * Copyright (C) Paul Boersma 2016
 * version 2016-04-25
 */

#include <AudioToolbox/AudioToolbox.h>
#include <libkern/OSAtomic.h>

/*
    A buffer is shared between the render callback,
    which writes into it from the recording thread,
    and the analysis software, which reads from it in a timer callback in the GUI thread.

    The recording thread typically puts 512 samples into the buffer
    during the render callback every 12 milliseconds.
    After writing the 512 samples, the render callback has to update the sample pointer
    in an atomic manner.

    The GUI timer callback, which is typically called every 40 milliseconds,
    feeds any new samples to the analysis software and then performs an analysis;
    after this, the timer callback notifies the GUI.
    To know how many samples to feed, the timer callback has to inspect
    the sample pointer in an atomic manner.
    
    With the atomic update and the atomic read, there is no need for locks.
    The samples that are read from the shared buffer will be a safe distance away
    from where the recording thread is writing new samples.
*/

#define SHARED_BUFFER_SIZE  220500

/*
    The buffer is actually a ring buffer,
    i.e. it will start to overwrite its first samples once the buffer is full.

    Instead of a sample pointer that keeps track of exactly where in the buffer
    we are writing, we maintain a sample count ("numberOfReceivedSamples"),
    which keeps track of how many samples have been written
    into the buffer since this count was set to zero (typically, at program start-up).

    The sample pointer can be computed from the sample count as follows:

        int64_t sampleCount =
            OSAtomicAdd64Barrier (0, & theSharedBuffer.numberOfReceivedSamples);
        int32_t samplePointer = (int32_t) (sampleCount % SHARED_BUFFER_SIZE);
*/
extern struct SharedBuffer
{
    double samples [SHARED_BUFFER_SIZE];   // 5 seconds at 44100 Hz, approximately 1.76 megabytes
    /*
        The sample count has to be:
        - volatile, because addresses of atomically updated variables have to be
          volatile, which is because multiple threads could write to such variables;
          in our case, the recording thread updates the sample count,
          and the GUI thread reads it.
        - aligned to an 8-byte boundary, as required by OSAtomicAdd64Barrier ();
          on 32-bit iOS,
          it is not guaranteed that an int64_t is automatically so aligned.
    */
    volatile
    __attribute__((aligned(8)))   // the iOS way to denote byte alignment of variables
        int64_t numberOfReceivedSamples;
    int64_t numberOfSentSamples;
} theSharedBuffer;

/* End of file SharedBuffer.h */

And a SharedBuffer object is defined in SharedBuffer.c:

/*
 * SharedBuffer.c
 * This software is released under the GNU General Public License. No warranty.
 *
 * version 2016-04-17
 */

#include "SharedBuffer.h"

/*
    The definition of the global struct via which the recording thread communicates
    with the GUI thread.
*/
struct SharedBuffer theSharedBuffer;

/* End of file SharedBuffer.c */

This is how a recording callback that uses the shared buffer would fit into the aurioTouch source code from Apple:

struct CallbackData {
    AudioUnit rioUnit;
    BOOL *audioChainIsBeingReconstructed;
    CallbackData(): rioUnit(NULL), audioChainIsBeingReconstructed(NULL) {}
} cd;

static OSStatus renderCallback (void *inRefCon,
                                AudioUnitRenderActionFlags *ioActionFlags,
                                const AudioTimeStamp *inTimeStamp,
                                UInt32 inBusNumber,
                                UInt32 inNumberFrames,
                                AudioBufferList *ioData)
{
    OSStatus err = noErr;
    if (*cd.audioChainIsBeingReconstructed == NO)
    {
        // we are calling AudioUnitRender on the input bus of AURemoteIO
        // this will store the audio data captured by the microphone in ioData
        err = AudioUnitRender (cd.rioUnit, ioActionFlags, inTimeStamp, 1,
                               inNumberFrames, ioData);
        
        float *source = (float *) ioData -> mBuffers [0]. mData;
        int64_t sampleCount =
            OSAtomicAdd64Barrier (0, & theSharedBuffer.numberOfReceivedSamples);
        int32_t samplePointer = (int32_t) (sampleCount % SHARED_BUFFER_SIZE);

        for (UInt32 i = 0; i < inNumberFrames; i ++) {
            if (samplePointer >= SHARED_BUFFER_SIZE)
                samplePointer -= SHARED_BUFFER_SIZE;
            theSharedBuffer.samples [samplePointer] = (double) source [i];
            samplePointer += 1;
        }
        OSAtomicAdd64Barrier (inNumberFrames, & theSharedBuffer.numberOfReceivedSamples);

        /*
            The audio unit is a bidirectional one: it does both input and output.
            Silence the output sound.
        */
        for (int i = 0; i < ioData -> mNumberBuffers; ++ i)
            memset (ioData -> mBuffers [i]. mData, 0,
                    ioData -> mBuffers [i]. mDataByteSize);
    }
    return err;
}

Note that instead of locks we use OSAtomicAdd64Barrier(), i.e. only theSharedBuffer.numberOfReceivedSamples is protected. This is OK because of the size of the ring buffer: its 220500 samples span 5 seconds at 44100 Hz, while the timer callback drains any new samples every 40 milliseconds, so the relevant samples, which lie before theSharedBuffer.numberOfReceivedSamples, will have been read several seconds before the recording thread wraps around and overwrites them.

In the GUI thread, we find (in Objective C):

- (void) timerCallback
{
    static VokaturiVoice theVoice;
    if (! theVoice) {
        theVoice = VokaturiVoice_create (44100.0, 441000);   // 10 seconds of audio at 44100 Hz
        if (! theVoice)
            return;
    }
    int64_t numberOfReceivedSamples =
        OSAtomicAdd64Barrier (0, & theSharedBuffer.numberOfReceivedSamples);
    if (numberOfReceivedSamples == 0)
        return;   // nothing recorded yet
    if (numberOfReceivedSamples > theSharedBuffer.numberOfSentSamples) {
        for (int64_t i = theSharedBuffer.numberOfSentSamples;
                     i < numberOfReceivedSamples; i ++)
        {
            int32_t indexOfSampleToBeSent = (int32_t) (i % SHARED_BUFFER_SIZE);
            VokaturiVoice_fill (theVoice, 1,
                                & theSharedBuffer.samples [indexOfSampleToBeSent]);
        }
        theSharedBuffer.numberOfSentSamples = numberOfReceivedSamples;
        static VokaturiQuality quality;
        static VokaturiEmotionProbabilities emotionProbabilities;
        VokaturiVoice_extract (theVoice, & quality, & emotionProbabilities);
        if (quality.valid) {
            ourShowInGUI (
                emotionProbabilities.neutrality,
                emotionProbabilities.happiness,
                emotionProbabilities.sadness,
                emotionProbabilities.anger,
                emotionProbabilities.fear
            );
        }
    }
}
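
The timerCallback method has to be invoked periodically on the GUI thread, e.g. every 40 milliseconds. One way to do that (a sketch under the assumption that you schedule the timer yourself; the demo app may instead use an NSTimer) is a Grand Central Dispatch timer source, which is a plain C API:

#include <dispatch/dispatch.h>

static dispatch_source_t theAnalysisTimer;

static void startAnalysisTimer (dispatch_block_t analysisBlock) {
    theAnalysisTimer = dispatch_source_create (DISPATCH_SOURCE_TYPE_TIMER,
        0, 0, dispatch_get_main_queue ());   // fire on the GUI thread
    dispatch_source_set_timer (theAnalysisTimer, DISPATCH_TIME_NOW,
        40 * NSEC_PER_MSEC,   // period: 40 milliseconds
        5 * NSEC_PER_MSEC);   // tolerated scheduling leeway
    dispatch_source_set_event_handler (theAnalysisTimer, analysisBlock);
    dispatch_resume (theAnalysisTimer);
}

From your view controller you would then call, for instance, startAnalysisTimer (^{ [self timerCallback]; }); the name startAnalysisTimer is hypothetical.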

Here is how the iOS audio elements are initialized (based on source code from aurioTouch by Apple):

/*
    AudioController.h
    By Vokaturi 2016-04-17, with source code from aurioTouch by Apple.
*/

#import <AudioToolbox/AudioToolbox.h>
#import <AVFoundation/AVFoundation.h>

@interface AudioController : NSObject {
    AudioUnit               _rioUnit;
    AVAudioPlayer*          _audioPlayer;   // for button pressed sound
    BOOL                    _audioChainIsBeingReconstructed;
}

@property (nonatomic, assign, readonly) BOOL audioChainIsBeingReconstructed;

- (OSStatus)    startIOUnit;
- (OSStatus)    stopIOUnit;

@end
/*
    AudioController.mm
    By Vokaturi 2016-04-17, with source code from aurioTouch by Apple.
*/

#import "AudioController.h"

// Framework includes
#import <AVFoundation/AVAudioSession.h>

#import "SharedBuffer.h"

- (void) setupIOUnit
{
    // Create a new instance of AURemoteIO
    
    AudioComponentDescription desc;
    desc.componentType = kAudioUnitType_Output;
    desc.componentSubType = kAudioUnitSubType_RemoteIO;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;
    desc.componentFlags = 0;
    desc.componentFlagsMask = 0;
    
    AudioComponent comp = AudioComponentFindNext (NULL, & desc);
    AudioComponentInstanceNew (comp, & _rioUnit);

    /*
        Enable input and output on AURemoteIO.
        Input is enabled on the input scope of the input element.
        Output is enabled on the output scope of the output element.
    */
    UInt32 one = 1;
    AudioUnitSetProperty (_rioUnit, kAudioOutputUnitProperty_EnableIO,
                          kAudioUnitScope_Input, 1, & one, sizeof one);
    AudioUnitSetProperty (_rioUnit, kAudioOutputUnitProperty_EnableIO,
                          kAudioUnitScope_Output, 0, & one, sizeof one);

    /*
        Explicitly set the input and output client formats:
        sample rate = 44100 Hz, number of channels = 1, format = 32-bit floating point
    */
    AudioStreamBasicDescription ioFormat;
    int numberOfChannels = 1;   // set to 1 for mono, or 2 for stereo
    bool channelsAreInterleaved = false;  // true: left[0], right[0], left[1], right[1]..
                                          // false: separate buffers for left and right
    ioFormat. mSampleRate = 44100;
    ioFormat. mFormatID = kAudioFormatLinearPCM;
    ioFormat. mFormatFlags =
        kAudioFormatFlagsNativeEndian |
        kAudioFormatFlagIsPacked |
        kAudioFormatFlagIsFloat |
        ( channelsAreInterleaved ? 0 : kAudioFormatFlagIsNonInterleaved );
    ioFormat. mBytesPerPacket = sizeof (float) *
        ( channelsAreInterleaved ? numberOfChannels : 1);
    ioFormat. mFramesPerPacket = 1;
    ioFormat. mBytesPerFrame = ioFormat. mBytesPerPacket;
    ioFormat. mChannelsPerFrame = numberOfChannels;
    ioFormat. mBitsPerChannel = sizeof (float) * 8;
    ioFormat. mReserved = 0;

    AudioUnitSetProperty (_rioUnit, kAudioUnitProperty_StreamFormat,
                          kAudioUnitScope_Output, 1, & ioFormat, sizeof ioFormat);
    AudioUnitSetProperty (_rioUnit, kAudioUnitProperty_StreamFormat,
                          kAudioUnitScope_Input, 0, & ioFormat, sizeof ioFormat);

    /*
        Set the MaximumFramesPerSlice property.
        This property is used to describe to an audio unit the maximum number of samples
        it will be asked to produce on any single given call to AudioUnitRender.
    */
    UInt32 maxFramesPerSlice = 4096;
    AudioUnitSetProperty (_rioUnit, kAudioUnitProperty_MaximumFramesPerSlice,
                          kAudioUnitScope_Global, 0,
                          & maxFramesPerSlice, sizeof maxFramesPerSlice);

    /*
        Get the property value back from AURemoteIO.
        We are going to use this value to allocate buffers accordingly.
    */
    UInt32 propSize = sizeof (UInt32);
    AudioUnitGetProperty (_rioUnit, kAudioUnitProperty_MaximumFramesPerSlice,
                          kAudioUnitScope_Global, 0,
                          & maxFramesPerSlice, & propSize);

    /*
        We need references to certain data in the render callback.
        This simple struct is used to hold that information.
    */
    cd.rioUnit = _rioUnit;
    cd.audioChainIsBeingReconstructed = & _audioChainIsBeingReconstructed;

    /*
        Set the render callback on AURemoteIO.
    */
    AURenderCallbackStruct renderCallbackStruct;
    renderCallbackStruct.inputProc = renderCallback;
    renderCallbackStruct.inputProcRefCon = NULL;
    AudioUnitSetProperty (_rioUnit, kAudioUnitProperty_SetRenderCallback,
                          kAudioUnitScope_Input, 0,
                          & renderCallbackStruct, sizeof renderCallbackStruct);

    /*
        Initialize the AURemoteIO instance.
    */
    AudioUnitInitialize (_rioUnit);
}

- (OSStatus) startIOUnit
{
    OSStatus err = AudioOutputUnitStart (_rioUnit);
    if (err) NSLog (@"couldn't start AURemoteIO: %d", (int) err);
    return err;
}

3. Real-time implementation in Android

The files VokaMonoOpen-3-0-android.zip and VokaStereoOpen-3-0-android.zip contain complete demo apps called “VokaMono” and “VokaStereo”, respectively, which you can open directly with Android Studio. Each of these projects contains a copy of the OpenVokaturi-3-0-android.aar library.

4. Inclusion in your iOS or Android app

If you want to experiment with the demo code, please note the following points about the licence:

  1. You can freely modify the code for your own use.
  2. If you distribute the app that contains your modified code, you can freely do so for the GUI (VokaMono or VokaStereo) part of the source code, because that part of the code, being demo code, is in the public domain.
  3. If you distribute the app that contains your modified code, you can include the emotion detection library OpenVokaturi-3-0-ios.a or OpenVokaturi-3-0-android.aar (or any other OpenVokaturi library, or the OpenVokaturi source code) into your app only if you distribute your app under the General Public Licence, i.e. as open source. This is because the open-source edition of the Vokaturi library is released under the General Public Licence.
  4. If you want to distribute your app without releasing its source code under the General Public Licence, you cannot include any open-source edition of the Vokaturi library, but you should instead buy a VokaturiPlus licence.
  5. The pictures included with the demo app are copyrighted by Vokaturi. We expect that if you distribute your app, you will not include these example pictures, but use your own instead.