StereoBatRecorder: Difference between revisions
mNo edit summary |
No edit summary |
||
(7 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
}} | }} | ||
This project is about signal processing of bat echolocation audio recorded through a | This project is about signal processing of bat echolocation audio recorded through a pair of ultrasonic microphones. The goal is to automatically determine the flight direction of the bat. | ||
Knowing the flight direction gives a lot of additional information, like if the bat is just passing or whether it is feeding, getting a better sense where the bat colony is and where the feeding ground is, etc. Having this determined automatically saves a lot of time compared to physically observing the bats in the field at odd hours. | Knowing the flight direction gives a lot of additional information, like if the bat is just passing or whether it is feeding, getting a better sense where the bat colony is and where the feeding ground is, etc. Having this determined automatically saves a lot of time compared to physically observing the bats in the field at odd hours. | ||
The main principle of determining the flight direction is to measure the time-of-arrival delay between the signals arriving at the two microphones. For example, when a bat is flying from left to right, the first recorded pulse will arrive slightly earlier at the left microphone, with the left-right delay decreasing to 0 as the bat flies directly in front of the two microphones, then the delay will reverse with signals arriving earlier at the right microphone as it flies further to the right. | The main principle of determining the flight direction is to measure the time-of-arrival delay between the signals arriving at the two microphones. For example, when a bat is flying from left to right, the first recorded pulse will arrive slightly earlier at the left microphone, with the left-right delay decreasing to 0 as the bat flies directly in front of the two microphones, then the delay will reverse with signals arriving earlier at the right microphone as it flies further to the right. | ||
== TODO == | |||
Create a reference set of recordings, e.g. | |||
* unambiguous passes, left-to-right and right-to-left | |||
* unambiguous passes but with relative "flat" pulses | |||
* feeding bat, circling in front of the recorder | |||
* recording containing a feeding buzz | |||
* recording containing social calls | |||
== Hardware == | == Hardware == | ||
The hardware used is a stereo bat detector that has already been developed, it has the following specifications: | The hardware used is a stereo bat detector that has already been developed, it has the following specifications: | ||
* sampling rate is | * sampling rate is 240 kS/s for each channel (left and right); | ||
* sampling resolution is 8 bits | * sampling resolution is 8 bits; | ||
* microphone used is the Knowles FG-3329, with useful response over 100 kHz | * microphone used is the Knowles FG-3329, with useful response over 100 kHz; | ||
* microphone separation is about 50 mm | * microphone separation is about 50 mm; | ||
* audio is recorded on an sd-card as a stereo "wav" file, so it can be directly opened in audio editing/viewing software (e.g. cooledit, audacity, raven, etc.) | * audio is recorded on an sd-card as a stereo "wav" file, so it can be directly opened in audio editing/viewing software (e.g. cooledit, audacity, raven, etc.) | ||
To get some feeling for the parameters involved: | To get some feeling for the parameters involved (at 240 kS/s): | ||
* one cycle of a 40 kHz bat signal consists of ''' | * one cycle of a 40 kHz bat signal consists of '''6''' samples | ||
* a typical bat pulse of (say) 5 ms consists of about ''' | * a typical bat pulse of (say) 5 ms consists of about '''1200''' samples | ||
* the delay between the left and right microphone is about '''0.15''' ms, or ''' | * the delay between the left and right microphone is about '''0.15''' ms, or '''36''' samples | ||
== Approach == | == Approach == | ||
The plan is to prototype algorithms in a numerical math package (like matlab, octave or scilab), then port the algorithm over to a more high-level language (like C or Java) | The plan is to prototype algorithms in a numerical math package (like matlab, octave or scilab), then port the algorithm over to a more high-level language (like C or Java). | ||
Code/scripts is maintained [https://github.com/bertrik/stereobat on github]. | |||
Example stereo bat calls can be found [https://drive.google.com/folderview?id=0ByItZAj1MynOcExNcmZtejFHLWs&usp=sharing on my google drive]. | |||
== Signal processing steps == | == Signal processing steps == | ||
Line 54: | Line 66: | ||
=== Hilbert transform === | === Hilbert transform === | ||
An interesting mathematical operation is the Hilbert transform. It allows conversion of a simple real signal into a complex " | An interesting mathematical operation is the Hilbert transform. It allows conversion of a simple real signal into a complex [http://en.wikipedia.org/wiki/Analytic_signal "analytic signal"]. The analytic signal can be converted into an instantaneous amplitude estimate and an instantaneous frequency estimate. This probably works well for bat signals, since many bat signals are relatively "pure" (don't have many harmonics). | ||
So, using the Hilbert transform, a bat call can be relatively simply separated into an envelope signal and a frequency signal. | So, using the Hilbert transform, a bat call can be relatively simply separated into an envelope signal and a frequency signal. | ||
Line 61: | Line 73: | ||
* a common source of noise is the sound of car brakes, often between 10 kHz - 20 kHz, partly in the range of bat calls. Can recordings containing these be automatically discarded? | * a common source of noise is the sound of car brakes, often between 10 kHz - 20 kHz, partly in the range of bat calls. Can recordings containing these be automatically discarded? | ||
* is it possible to detect that a bat is purely "passing by" (not feeding) by looking at the regularity (inter-pulse-interval) of the calls? | * is it possible to detect that a bat is purely "passing by" (not feeding) by looking at the regularity (inter-pulse-interval) of the calls? | ||
== Example call == | |||
Below is an example of a bat call from a pipistrellus bat. | |||
[plaatje] | |||
and its spectrogram | |||
[plaatje] | |||
You can see a bit of noise in the bottom of the spectrogram, and also that one of the channels is a bit stronger than the other. | |||
First, we load the wav-file into octave and remove the low-frequency noise | |||
<pre> | |||
# load wav and get sample rate, bits per sample | |||
[y,fs,bps] = wavread("call1.wav"); | |||
# create and apply a 4-th order butterworth high-pass filter (maximally flat in the pass-band) | |||
[b,a] = butter(4, 0.2, "high"); | |||
y2 = filter(b,a,y); | |||
</pre> | |||
Next, we use the 'xcov' (cross-covariance) to calculate a set of correlations vs time offset. | |||
The cross-covariance function is very similar to the cross-correlation function, with the added advantage that the mean is automatically subtracted. | |||
<pre> | |||
# calculate cross-covariance on y2(:,1) (left) and y2(:,2) (right) data, with a max delay of +/- 50 | |||
[r,lag] = xcov(y2(:,1), y2(:,2), 50); | |||
plot(lag,r); | |||
</pre> | |||
[plaatje] | |||
From the correlation vs time plot we can see which delay has the highest correlation. | |||
However, the signal is oscillatory, with the peaks about 6 samples apart, corresponding to the mean frequency of the bat call (about 40 kHz). | |||
To smooth the peaks and get an approximation of the envelope, we apply the hilbert transform and take the absolute value of it | |||
<pre> | |||
# take absolute value of the hilbert transform to get the envelope of the correlation | |||
r2 = abs(hilbert(r)); | |||
plot(lag,r); hold on; | |||
plot(lag,r2); | |||
</pre> | |||
[plaatje] | |||
The correlation plot is now smooth and we simply take the maximum value now to find the index of maximum correlation | |||
<pre> | |||
# find the index of maximum value | |||
[m,ind] = max(r2); | |||
# look up the lag at this index | |||
lag(ind) | |||
</pre> |
Latest revision as of 16:18, 27 May 2014
Project StereoBatRecorder | |
---|---|
Status | In progress |
Contact | bertrik |
Last Update | 2014-05-27 |
This project is about signal processing of bat echolocation audio recorded through a pair of ultrasonic microphones. The goal is to automatically determine the flight direction of the bat.
Knowing the flight direction gives a lot of additional information, like if the bat is just passing or whether it is feeding, getting a better sense where the bat colony is and where the feeding ground is, etc. Having this determined automatically saves a lot of time compared to physically observing the bats in the field at odd hours.
The main principle of determining the flight direction is to measure the time-of-arrival delay between the signals arriving at the two microphones. For example, when a bat is flying from left to right, the first recorded pulse will arrive slightly earlier at the left microphone, with the left-right delay decreasing to 0 as the bat flies directly in front of the two microphones, then the delay will reverse with signals arriving earlier at the right microphone as it flies further to the right.
TODO
Create a reference set of recordings, e.g.
- unambiguous passes, left-to-right and right-to-left
- unambiguous passes but with relative "flat" pulses
- feeding bat, circling in front of the recorder
- recording containing a feeding buzz
- recording containing social calls
Hardware
The hardware used is a stereo bat detector that has already been developed, it has the following specifications:
- sampling rate is 240 kS/s for each channel (left and right);
- sampling resolution is 8 bits;
- microphone used is the Knowles FG-3329, with useful response over 100 kHz;
- microphone separation is about 50 mm;
- audio is recorded on an sd-card as a stereo "wav" file, so it can be directly opened in audio editing/viewing software (e.g. cooledit, audacity, raven, etc.)
To get some feeling for the parameters involved (at 240 kS/s):
- one cycle of a 40 kHz bat signal consists of 6 samples
- a typical bat pulse of (say) 5 ms consists of about 1200 samples
- the delay between the left and right microphone is about 0.15 ms, or 36 samples
Approach
The plan is to prototype algorithms in a numerical math package (like matlab, octave or scilab), then port the algorithm over to a more high-level language (like C or Java).
Code/scripts is maintained on github.
Example stereo bat calls can be found on my google drive.
Signal processing steps
I expect the following steps:
- pre-processing: high-pass filter (e.g. 10 kHz) the signal to remove low-frequency noise and make the signal DC-free
- detecting and isolating the individual calls
- come up with some metric (e.g. energy in 0.1 ms block) to separate bat sounds from background noise
- stitching "holes" in a call (e.g. ignore gaps of < 0.5 ms)
- discarding non-bat clicks (e.g. ignore signals < 0.5 ms in duration)
- determining the delay between the left and right channel
- possible methods:
- direct correlation, can be accelerated by FFT
- correlation of the amplitude envelope
- correlation of the frequency slope
- compare center of gravity of the envelope
- perhaps switch algorithms depending on the type of call (flat/CF or steep/FM)
- possible methods:
- determine flight direction based on set of delays measured during a pass
- verify that delay changes monotonically during a pass, e.g. start with simple assumption that delay vs. time is linear and calculate a best fit.
- discard outliers in a smart way
- consider using the call energy as a weight factor, so clear calls have more significance than quiet ones
- cool ideas for further classification (optional):
- calculate doppler shift to get a flight speed estimate
- output the result
- make a nice graph to visually verify the conclusion made by the algorithms
- export result to .csv for example
Hilbert transform
An interesting mathematical operation is the Hilbert transform. It allows conversion of a simple real signal into a complex "analytic signal". The analytic signal can be converted into an instantaneous amplitude estimate and an instantaneous frequency estimate. This probably works well for bat signals, since many bat signals are relatively "pure" (don't have many harmonics). So, using the Hilbert transform, a bat call can be relatively simply separated into an envelope signal and a frequency signal.
Random thoughts yet to be organized
- I've observed some common pipistrelle bats en-route to use a very flat type of pulse, direct correlation works badly on these pulses. So perhaps use some other type of correlation (e.g. correlate on envelope)
- a common source of noise is the sound of car brakes, often between 10 kHz - 20 kHz, partly in the range of bat calls. Can recordings containing these be automatically discarded?
- is it possible to detect that a bat is purely "passing by" (not feeding) by looking at the regularity (inter-pulse-interval) of the calls?
Example call
Below is an example of a bat call from a pipistrellus bat.
[plaatje]
and its spectrogram
[plaatje]
You can see a bit of noise in the bottom of the spectrogram, and also that one of the channels is a bit stronger than the other.
First, we load the wav-file into octave and remove the low-frequency noise
# load wav and get sample rate, bits per sample [y,fs,bps] = wavread("call1.wav"); # create and apply a 4-th order butterworth high-pass filter (maximally flat in the pass-band) [b,a] = butter(4, 0.2, "high"); y2 = filter(b,a,y);
Next, we use the 'xcov' (cross-covariance) to calculate a set of correlations vs time offset. The cross-covariance function is very similar to the cross-correlation function, with the added advantage that the mean is automatically subtracted.
# calculate cross-covariance on y2(:,1) (left) and y2(:,2) (right) data, with a max delay of +/- 50 [r,lag] = xcov(y2(:,1), y2(:,2), 50); plot(lag,r);
[plaatje]
From the correlation vs time plot we can see which delay has the highest correlation. However, the signal is oscillatory, with the peaks about 6 samples apart, corresponding to the mean frequency of the bat call (about 40 kHz). To smooth the peaks and get an approximation of the envelope, we apply the hilbert transform and take the absolute value of it
# take absolute value of the hilbert transform to get the envelope of the correlation r2 = abs(hilbert(r)); plot(lag,r); hold on; plot(lag,r2);
[plaatje]
The correlation plot is now smooth and we simply take the maximum value now to find the index of maximum correlation
# find the index of maximum value [m,ind] = max(r2); # look up the lag at this index lag(ind)