Reading Audio Files

The previous two posts, Audio Analysis on an iPhone and Fourier Transforms on an iPhone, have covered some of the theory of audio sampling and analysis.  What hasn’t been mentioned is how audio samples are actually loaded into our program.  There are fundamentally two ways to do this:

  1. Real-time sampling of incoming audio.  For bat calls, this would be the output from a time-expanding bat detector.
  2. Loading pre-recorded audio from a file.

Initially we will be using method (2).  As mentioned before, it’s much easier to load samples from a file than to make a bat squeak into a microphone on demand!

So we need to read an audio file on disk into a list of samples held in an array in our program.  Apple provide tools to do this, in the form of an API called Core Audio.  As with the Accelerate/vDSP framework, this is quite a low-level C API, and it has a reputation for being tricky to use.  Fortunately for us, we only need the small part of it that deals with reading and converting audio formats.
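
One wrinkle worth knowing up front: almost every Core Audio call returns an OSStatus result code, and the snippets below ignore these for brevity.  In real code it pays to check them.  Here is a minimal sketch of such a check (checkStatus is my own helper, not part of Core Audio):

#include <AudioToolbox/AudioToolbox.h>
#include <cstdio>
#include <cstdlib>

// Bail out with a message if a Core Audio call fails.
// noErr (zero) indicates success for OSStatus-returning calls.
static void checkStatus(OSStatus status, const char *operation)
{
    if (status != noErr) {
        fprintf(stderr, "Error %d during: %s\n", (int)status, operation);
        exit(EXIT_FAILURE);
    }
}

Each call below could then be wrapped, for example checkStatus(ExtAudioFileOpenURL(inputFileURL, &fileRef), "opening file").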

Loading Audio

Obtain a reference to the audio file:

#include <AudioToolbox/AudioToolbox.h>
#include <vector>

// myFile is the filesystem path and filename we are loading
CFStringRef str = CFStringCreateWithCString(
    NULL, 
    myFile, 
    kCFStringEncodingUTF8  // UTF-8 copes with non-ASCII paths
);
CFURLRef inputFileURL = CFURLCreateWithFileSystemPath(
    kCFAllocatorDefault,
    str,
    kCFURLPOSIXPathStyle,
    false
);

ExtAudioFileRef fileRef;
ExtAudioFileOpenURL(inputFileURL, &fileRef); // Takes a pointer to our ExtAudioFileRef
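
The Create functions above follow Core Foundation’s ownership rules, so once the file is open we are responsible for releasing the string and URL objects we created:

// CFStringCreate... and CFURLCreate... return objects we own,
// so release them once the file has been opened.
CFRelease(str);
CFRelease(inputFileURL);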

Transform the audio file into the format we want:

// "sample"  - An instantaneous amplitude of the signal in a 
// single audio channel, represented as an integer, 
// floating-point, or fixed-point number. 

// "channel" - A discrete track of audio. A monaural recording has 
// exactly one channel.

// "frame"   - A set of samples that contains one sample from 
// each channel in an audio data stream.

// "packet"  - An encoding-defined unit of audio data comprising 
// one or more frames. For PCM audio, each packet corresponds to 
// one frame.

// A "packet" contains a "frame" which contains 
// "channels" which contain "samples".
// For mono PCM, there is one channel, so one channel per frame, 
// and one frame per packet.  So each packet contains only one 
// sample.

    // Set up the audio format we want the data converted into.
    // Each sample is of type Float32.
    AudioStreamBasicDescription audioFormat = {0};
    audioFormat.mSampleRate = 44100;
    audioFormat.mFormatID = kAudioFormatLinearPCM;
    audioFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kLinearPCMFormatFlagIsPacked;
    audioFormat.mBitsPerChannel = sizeof(Float32) * 8;
    audioFormat.mChannelsPerFrame = 1; // Mono
    audioFormat.mBytesPerFrame = audioFormat.mChannelsPerFrame * sizeof(Float32); // == sizeof(Float32)
    audioFormat.mFramesPerPacket = 1;
    audioFormat.mBytesPerPacket = audioFormat.mFramesPerPacket * audioFormat.mBytesPerFrame; // == sizeof(Float32)

    // Apply our audio format to the Extended Audio File
    ExtAudioFileSetProperty(
        fileRef,
        kExtAudioFileProperty_ClientDataFormat,
        sizeof(AudioStreamBasicDescription),
        &audioFormat);

Allocate some space in memory:

    int numSamples = 1024; // How many samples to read in at a time
    UInt32 sizePerPacket = audioFormat.mBytesPerPacket; // = sizeof(Float32) = 4 bytes
    UInt32 packetsPerBuffer = numSamples;
    UInt32 outputBufferSize = packetsPerBuffer * sizePerPacket;

    // outputBuffer holds the address of the memory we have reserved
    UInt8 *outputBuffer = (UInt8 *)malloc(sizeof(UInt8) * outputBufferSize);

    AudioBufferList convertedData;
    convertedData.mNumberBuffers = 1; // Set this to 1 for mono
    convertedData.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame; // also = 1
    convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
    convertedData.mBuffers[0].mData = outputBuffer;

And then finally read the audio in:

    std::vector<float> samples; // Final destination for all of the samples

    UInt32 frameCount = numSamples;
    while (frameCount > 0) {
        // Reset the request and the buffer size on each pass;
        // ExtAudioFileRead overwrites both with the amount actually read.
        frameCount = numSamples;
        convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
        ExtAudioFileRead(
            fileRef,
            &frameCount,
            &convertedData
        );
        if (frameCount > 0) {
            AudioBuffer audioBuffer = convertedData.mBuffers[0];
            float *samplesAsCArray = (float *)audioBuffer.mData; // Cast from the audio buffer to a C-style array
            std::vector<float> samplesAsVector;                  // And then to a temporary C++ vector
            samplesAsVector.assign(samplesAsCArray, samplesAsCArray + frameCount);
            samples.insert(samples.end(), samplesAsVector.begin(), samplesAsVector.end()); // And then into our final samples vector
        }
    }
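
Once the loop has finished, we should tidy up: neither the extended audio file nor the malloc’d buffer is managed for us.

    // Close the file and free the conversion buffer.
    ExtAudioFileDispose(fileRef);
    free(outputBuffer);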

Finally, after all that, we have our audio samples in a C++ vector, which we can then analyse with Apple’s digital signal processing API.

One important caveat is that we end up with the entire audio file uncompressed in memory.  This is fine for short recordings (< 1 min), but for longer recordings we would need a way to process the file in sections.
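
Since the read loop above already delivers the audio 1024 samples at a time, one approach would be to analyse each buffer as it arrives rather than appending it to an ever-growing vector.  A sketch only, with a hypothetical processBlock() standing in for whatever analysis we run on each chunk:

    // Process each block as it is read, instead of accumulating
    // the whole file in memory.  processBlock() is hypothetical.
    UInt32 frameCount = numSamples;
    while (frameCount > 0) {
        frameCount = numSamples;
        convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
        ExtAudioFileRead(fileRef, &frameCount, &convertedData);
        if (frameCount > 0) {
            processBlock((float *)convertedData.mBuffers[0].mData, frameCount);
        }
    }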