Audio Analysis on an iPhone

Of the four main mobile platforms (Android, iOS, BlackBerry and Windows Mobile), we have chosen to target iOS.  We made this choice because of the expertise of the developers on the project, and because iOS provides a native application programming interface (API) for signal processing and linear algebra calculations.

In “A picture of the challenge ahead” you can see the main steps involved in identifying a bat call.  The first major step is “(3) - Enable call isolation”.

Enable call isolation

Fundamentally, an audio file is simply a long list of numbers specifying the amplitude of a sound wave over time.  If you imagine a sound wave moving through the air and hitting the diaphragm in a microphone, the amplitude is the distance that the diaphragm moves from its resting position.

The sound wave itself is an analogue signal.  In order for a computer to process it, it needs to be converted to a digital representation.  This is called “sampling”: the amplitude is measured many times a second.  The Nyquist sampling theorem states that in order to capture a signal containing frequencies up to X Hz without aliasing (where high frequencies are misread as lower ones), you need to sample at more than 2X Hz.  The limit of human hearing is approximately 20 kHz, which hence requires a sample rate of just over 40 kHz.  This is why CDs are sampled at 44.1 kHz, i.e. each second of recording on a CD contains 44,100 amplitude measurements per channel, enough to represent the highest frequency we can hear.
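To make the sampling idea concrete, here is a minimal Python sketch.  The function names (nyquist_rate, sample_sine, apparent_frequency) are made up for illustration and are not part of the project's code.  It builds one second of a sampled tone as a plain list of numbers, and shows what frequency a pure tone would appear to have if it were sampled too slowly.

```python
import math

def nyquist_rate(max_frequency_hz):
    """Lower bound on the sample rate needed to capture max_frequency_hz."""
    return 2 * max_frequency_hz

def sample_sine(frequency_hz, sample_rate_hz, duration_s):
    """Return amplitude measurements of a unit sine wave, one per sample."""
    count = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

def apparent_frequency(frequency_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling (aliasing)."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

samples = sample_sine(1_000, 44_100, 1.0)
print(len(samples))                        # 44100 measurements for one second
print(nyquist_rate(20_000))                # 40000
print(apparent_frequency(5_000, 44_100))   # 5000: below the limit, unchanged
print(apparent_frequency(50_000, 44_100))  # 5900: a 50 kHz tone is misread
```

The last line is the reason an ordinary 44.1 kHz recorder cannot be pointed directly at a bat: the call would not simply be missing, it would show up at the wrong frequency.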

A 10x “time expansion” bat detector records 1 second of audio and plays it back over 10 seconds.  For a Pipistrelle, this reduces its call frequency from about 50 kHz to 5 kHz, which can be recorded by ordinary equipment with a sample rate of 44.1 kHz.  Without the time expansion we would require specialist equipment that could sample at 100 kHz or more.
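The arithmetic behind time expansion is simple enough to check in a few lines.  This is only a sketch, and time_expand and can_record are hypothetical helper names rather than anything from the detector or the app.

```python
def time_expand(frequency_hz, duration_s, factor):
    """Slow playback by `factor`: frequency divides, duration multiplies."""
    return frequency_hz / factor, duration_s * factor

def can_record(frequency_hz, sample_rate_hz):
    """True if the frequency sits below half the sample rate."""
    return frequency_hz < sample_rate_hz / 2

call_hz = 50_000  # approximate Pipistrelle call frequency
expanded_hz, expanded_s = time_expand(call_hz, 1.0, 10)

print(expanded_hz, expanded_s)          # 5000.0 10.0
print(can_record(call_hz, 44_100))      # False: too high for ordinary equipment
print(can_record(expanded_hz, 44_100))  # True: fine after 10x expansion
print(2 * call_hz)                      # 100000: rate needed without expansion
```

One consequence worth remembering later: any frequency measured in the expanded recording has to be multiplied by the expansion factor to get back to the real call.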

Once we have our recording, we need to identify where within it there are actual bat calls.  We have a measurement of audio amplitude against time, which isn’t very easy to analyse.  We want to identify where there are certain frequency characteristics within this audio.  To convert from the time domain to the frequency domain we need to perform a Fourier transform.  Because we have digitally sampled data, we need a discrete Fourier transform, and we will use an efficient algorithm for computing it called the fast Fourier transform (FFT).  Apple’s iOS provides a native API for performing FFTs and other linear algebra.  We will be using this to process the time-expanded bat calls and detect where within the audio there is a call.
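As a rough illustration of the idea (not the project's actual detection method, and written in plain Python rather than against Apple's API), the sketch below cuts a recording into fixed-size frames, runs a textbook radix-2 FFT on each, and flags frames whose strongest frequency bin exceeds a threshold.  The frame size, the threshold value and the synthetic 5 kHz “call” are all assumptions chosen for this toy example.

```python
import cmath
import math

SAMPLE_RATE = 44_100   # Hz, ordinary recording equipment
FRAME = 1024           # samples per analysis window (must be a power of two)
THRESHOLD = 100.0      # hypothetical magnitude cut-off for this toy data

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out

def strongest_bin(frame):
    """Return (frequency_hz, magnitude) of the largest bin below half the sample rate."""
    spectrum = fft(frame)
    # Skip bin 0 (the constant offset) and the mirrored upper half.
    k = max(range(1, len(frame) // 2), key=lambda i: abs(spectrum[i]))
    return k * SAMPLE_RATE / len(frame), abs(spectrum[k])

def tone(frequency_hz, count):
    """Synthetic stand-in for a time-expanded call: a pure sine tone."""
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(count)]

# Silence, then a 5 kHz tone, then silence again.
recording = [0.0] * FRAME + tone(5_000, FRAME) + [0.0] * FRAME

calls = []
for start in range(0, len(recording), FRAME):
    frequency, magnitude = strongest_bin(recording[start:start + FRAME])
    if magnitude > THRESHOLD:
        calls.append((start / SAMPLE_RATE, frequency))

for seconds, frequency in calls:
    print(f"call at {seconds:.3f} s, peak near {frequency:.0f} Hz")
```

With this input only the middle frame is flagged, and the reported peak lands within one frequency bin (about 43 Hz at this frame size) of 5 kHz.  Real recordings contain noise and calls that sweep across frequencies, so a single fixed threshold like this would be far too crude in practice.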

The next blog posting will go into this in detail.

Tell me more about BatMobile…

Bats are important biodiversity indicator species that help us to keep track of the health of our environment, but because they are cryptic nocturnal mammals, researching their distributions and populations is scientifically challenging to say the least.

Current bat-detecting equipment is expensive, and methods for call identification require specialist knowledge, are time-consuming and are often subjective. We propose to develop an innovative prototype smartphone application that aims to solve many of these problems.

So, how will it work?

  • Well, because bat calls are ultrasonic, there’s no way that the in-phone mic will be up to the job, so we will need to attach an external microphone (in the first instance a tried and tested high-quality microphone costing ~£500, but the idea is to look at far cheaper ones later on).
  • Next we need to find a reliable way to display the recorded calls on-phone in real time using open source sonogram software that we will adapt for the purpose.
  • Algorithms then need to be written to enable the isolation, characterisation and identification of calls on-phone.
  • Simple…

Coupled with the GPS signal from the smartphone, this would provide researchers with much-needed accurate information about species distributions that can feed into national research programmes and inform conservation policy.

There are many challenges to finding a workable solution to this problem.  In addition to the myriad issues around variability of calls, call isolation and effective pattern matching, a particular focus of this project will be finding a compromise between what processing gets done on the phone and what gets done on the server.  In an ideal world you would do everything on the phone, meaning that biologists could be out in the field, well out of signal range, and still work efficiently.  But will modern phones be powerful enough for the job?…