Difference between revisions of "EspAudioSensor"

m (Decibel meters =)
m (Hardware)
 
(35 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{Project
 
{{Project
 
   |Name=ESP audio sensor
 
   |Name=ESP audio sensor
   |Picture=NoPicture.jpg
+
   |Picture=inmp441.jpg
   |Omschrijving=ESP-based audio sensor
+
   |Omschrijving=ESP32-based audio sensor
   |Status=Initializing
+
   |Status=In progress
 
   |Contact=bertrik
 
   |Contact=bertrik
 
   }}
 
   }}
Line 11: Line 11:
  
 
The plan is to do this by combining an inexpensive WiFi-enabled ESP-32 microcontroller with a standard I2S digital microphone.
 
The plan is to do this by combining an inexpensive WiFi-enabled ESP-32 microcontroller with a standard I2S digital microphone.
 +
The ESP-32 performs a frequency analysis and is able to forward the measurement to a server.
 +
 +
Further development:
 +
* It seems to work so far! Let's test it, e.g. by running it all day and plotting the noise values.
 +
* Add a feature to forward the audio directly, so we can verify the received audio actually makes sense, is not excessively loud (e.g. clipping) or soft.
 +
* Add client code for uploading data somewhere, perhaps just start with sending MQTT into influx, so we can plot things using grafana
 +
 +
== How can you help ==
 +
 +
How you can help with this project:
 +
* help me find an answer to the following questions:
 +
** what are the norms for audio measurement? how often? how long?
 +
* help me qualify the microphone currently chosen
 +
* help me design a nice case for the project, also taking some practical things into account
 +
** weather protection
 +
** providing power
 +
* help me with the server infrastructure
 +
* help me making this project more widely known
 +
* I probably cannot help *you* with your political mission to stop/ban certain noisy activities that annoy you. My main motivation is technical, I just like to measure things and visualise them.
  
 
== Measuring audio as a citizen science project ==
 
== Measuring audio as a citizen science project ==
The idea is the following:
+
The concept is the following:
 
* A small box containing a microphone outside your house measures environmental sounds (traffic, etc), for example it takes a 1 second audio recording every 10 seconds
 
* A small box containing a microphone outside your house measures environmental sounds (traffic, etc), for example it takes a 1 second audio recording every 10 seconds
 
* A spectral analysis is made of the audio (separating it in different frequency bands), calculating sound intensity for each individual band
 
* A spectral analysis is made of the audio (separating it in different frequency bands), calculating sound intensity for each individual band
Line 23: Line 42:
  
 
External things to investigate:
 
External things to investigate:
 +
* How is audio being evaluated in general, are there norms for it?
 +
** how often should audio be sampled? 1 second/minute? 10 seconds/min?
 +
** is it a valid assumption to use full octaves? or should the spectrum be divided in smaller parts?
 
* https://www.rijksoverheid.nl/onderwerpen/geluidsoverlast/geluidsoverlast-in-de-wet
 
* https://www.rijksoverheid.nl/onderwerpen/geluidsoverlast/geluidsoverlast-in-de-wet
  
Line 29: Line 51:
 
We can then easily apply sensor/housing specific corrections, do A weighting, etc.
 
We can then easily apply sensor/housing specific corrections, do A weighting, etc.
  
Subjective audio levels are generally calculated on a logarithmic scale in dB using "A-weighting".
+
=== A weighting ===
A-weighting calculates a subjective loudness level from a physical loudness, applying a correction factor for each band.
+
Subjective audio levels are generally calculated on a logarithmic scale in dB using [https://en.wikipedia.org/wiki/A-weighting "A-weighting"].
The division of the audio spectrum is chosen so it matches the octaves used in the A-weighting, see https://en.wikipedia.org/wiki/Octave_band#Octave_Bands
+
A-weighting calculates a subjective loudness level from a physical loudness, applying a correction factor for each (part of an) octave band.
 +
 
 +
{| class="wikitable"
 +
|-
 +
! Octave start
 +
! Octave center
 +
! Octave end
 +
! Remark
 +
|-
 +
| -
 +
| -
 +
| 22627 Hz
 +
| sample rate
 +
|-
 +
| 5657 Hz
 +
| 8000 Hz
 +
| 11314 Hz
 +
|-
 +
| 2828 Hz
 +
| 4000 Hz
 +
| 5657 Hz
 +
|-
 +
| 1414 Hz
 +
| 2000 Hz
 +
| 2828 Hz
 +
|-
 +
| 707 Hz
 +
| 1000 Hz
 +
| 1414 Hz
 +
|-
 +
| 354 Hz
 +
| 500 Hz
 +
| 707 Hz
 +
|-
 +
| 177 Hz
 +
| 250 Hz
 +
| 354 Hz
 +
|-
 +
| 88 Hz
 +
| 125 Hz
 +
| 177 Hz
 +
|-
 +
| 44 Hz
 +
| 63 Hz
 +
| 88 Hz
 +
|-
 +
| 22 Hz
 +
| 31 Hz
 +
| 44 Hz
 +
|}
 +
 
 +
[http://www.dedicatedacoustics.com.au/Articles/A%20Weighting%20Corrections.html List of A-weighting coefficients]
  
 
=== FFT ===
 
=== FFT ===
Line 38: Line 111:
 
The intensity in each octave band is by summing the energy in a set of FFT 'bins'.
 
The intensity in each octave band is by summing the energy in a set of FFT 'bins'.
 
The energy in each bin is calculated as the real part squared plus the imaginary part squared.
 
The energy in each bin is calculated as the real part squared plus the imaginary part squared.
 +
The division of the audio spectrum for the FFT is chosen so it matches the octaves used in the A-weighting as described above.
 +
I plan to use the 'flat-top' window because it has good properties for measuring power levels.
 +
 +
Investigation into FFT libraries:
 +
* [https://github.com/kosme/arduinoFFT arduinoFFT] looks nice, but .. it does all calculations in double, severely limiting the audio buffer size. For each sample, you need two doubles, so that's 16 bytes per sample. I can use about 6000 samples in the audio buffer. Using floats would give me twice the audio buffer size. I could just copy the files and modify them for double->float (and fix any other compiler warnings).
 +
* Alternatively, [https://github.com/fakufaku/esp32-fft esp32-fft] looks nice, it uses floats instead of doubles and is optimised for esp32, but it uses malloc internally ... Also I can't use it as an Arduino library because it doesn't have the arduino library structure (with a src dir, examples dir, library.json file, library.properties file, etc) Maybe I can still use it by copying it in my sketch.
  
 
Links:
 
Links:
 
* Overview of octave bands https://www.engineeringtoolbox.com/octave-bands-frequency-limits-d_1602.html
 
* Overview of octave bands https://www.engineeringtoolbox.com/octave-bands-frequency-limits-d_1602.html
 +
* http://www.robinscheibler.org/2017/12/12/esp32-fft.html
  
 
=== Decibel meters ===
 
=== Decibel meters ===
Line 54: Line 134:
 
== Hardware ==
 
== Hardware ==
 
The physical device consists of:
 
The physical device consists of:
* an ESP32 (or possibly an ESP8266), it has an I2S digital audio input for sampling data from a digital microphone and a WiFi interface to communicate things to the internet
+
* an ESP32, it has an I2S digital audio input for sampling data from a digital microphone and a WiFi interface to communicate things to the internet. It has more internal RAM than an ESP8266 (for example), this helps to take a larger audio sample and do analysis on it. It is still comparatively simple, cheap, easy to flash and powerful enough to do communication over WiFi and do audio analysis.
 
* a digital I2S microphone, like the INMP441 ([https://www.invensense.com/wp-content/uploads/2015/02/INMP441.pdf datasheet)]
 
* a digital I2S microphone, like the INMP441 ([https://www.invensense.com/wp-content/uploads/2015/02/INMP441.pdf datasheet)]
  
Waag society uses the following microphone in their [https://waag.org/en/article/new-version-smart-citizen-kit-available kit 2.1]: Invensense ICS4342.
+
Waag society uses the [https://waag.org/en/article/new-version-smart-citizen-kit-available Invensense ICS4342 microphone] in their kit 2.1.
I ordered these [https://aliexpress.com/item/INMP441/32960945048.html INMP441 microphones] from Aliexpress.
+
Myself, I ordered a couple of [https://aliexpress.com/item/INMP441/32960945048.html INMP441 microphones] from Aliexpress.
  
 
The microphone is connected to the microcontroller as follows:
 
The microphone is connected to the microcontroller as follows:
Line 71: Line 151:
 
No sensitive analog electronics are needed, the microphone and the microcontroller are simply connected using "dupont" wire.
 
No sensitive analog electronics are needed, the microphone and the microcontroller are simply connected using "dupont" wire.
  
The clock signal is 64 times higher than the sample clock, so at a sample rate of 44100 Hz, this means 2.8 MHz.
+
The I2S clock signal is 64 times higher than the sample clock, so at a sample rate of 44100 Hz, this means 2.8 MHz.
 
This might be a bit high for a random wire, probably we should keep this connection short.
 
This might be a bit high for a random wire, probably we should keep this connection short.
  
 
== Software ==
 
== Software ==
Initial code can be found [https://github.com/bertrik/NoiseLevel on github].
+
Initial code for sampling audio from the digital microphone can be found [https://github.com/bertrik/NoiseLevel on github].
  
 
What the software should do:
 
What the software should do:
* Take audio measurement from the microphone at a regular interval (say 1 second every 10 seconds)
+
* Take an audio measurement from the microphone at a regular interval (say 1 second every 10 seconds)
* On the recorded audio, perform a 4096-point FFT with a windowing function (Gaussian for example). This results in 2048 FFT coefficients.
+
* On the recorded audio, perform a 4096-point real->complex FFT with a windowing function (flat-top for example).
* Sum up FFT into power per octave, e.g. top 1024 coefficients represent octave of 11025-22050 Hz, next 512 coefficients represent is 5512-11025 Hz octave, etc.
+
* Calculate power for each FFT-bin (Im-squared + Re-squared) and sum up bins per octave.
* Calculate statistics, e.g. minimum/average/maximum as decibels in the current 5 minute interval
+
* Calculate statistics, e.g. minimum/average/maximum over a 5 minute interval and convert to a logarithmic scale (decibels)
* Every 5 minutes, send the statistics to the network using WiFi
+
* Every 5 minutes, send the statistics to the network using WiFi or LoRa
  
 
The network receives the raw decibel values and can apply corrections for specific microphones, do A-weighting, etc.
 
The network receives the raw decibel values and can apply corrections for specific microphones, do A-weighting, etc.
 
To investigate:
 
* https://github.com/maspetsberger/esp32-i2s-mems
 
* TODO library that performs FFT on ESP32
 

Latest revision as of 16:54, 7 October 2019

Project ESP audio sensor
Inmp441.jpg
ESP32-based audio sensor
Status In progress
Contact bertrik
Last Update 2019-10-07

Introduction

This project is about creating an audio level meter, for example as an environmental noise measuring node in a citizen science project or as a standalone "decibel meter".

The plan is to do this by combining an inexpensive WiFi-enabled ESP-32 microcontroller with a standard I2S digital microphone. The ESP-32 performs a frequency analysis and is able to forward the measurement to a server.

Further development:

  • It seems to work so far! Let's test it, e.g. by running it all day and plotting the noise values.
  • Add a feature to forward the audio directly, so we can verify the received audio actually makes sense, is not excessively loud (e.g. clipping) or soft.
  • Add client code for uploading data somewhere, perhaps starting with sending measurements over MQTT into InfluxDB, so we can plot things using Grafana.
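As a rough illustration of the upload idea, the sketch below formats one set of per-octave levels as an InfluxDB line protocol record, which could then be published as an MQTT payload. It assumes that some bridge between the MQTT broker and InfluxDB (Telegraf, for example) accepts line protocol. The measurement name "noise", the tag "node" and the field names "band_<center>" are made up for this example and are not part of the project.

```python
def to_line_protocol(node_id, band_levels_db, timestamp_ns):
    """Format per-octave levels as one InfluxDB line protocol record.

    The measurement name "noise", the tag "node" and the field names
    "band_<center>" are hypothetical, chosen only for this example.
    """
    fields = ",".join(
        f"band_{center}={level:.1f}"
        for center, level in sorted(band_levels_db.items())
    )
    return f"noise,node={node_id} {fields} {timestamp_ns}"


# Example record with made-up levels and a nanosecond timestamp.
payload = to_line_protocol("node1", {1000: 42.5, 2000: 40.1}, 1570467240000000000)
print(payload)
```

Publishing the payload itself is left out here, since that depends on the MQTT client library chosen.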

How can you help

How you can help with this project:

  • help me find an answer to the following questions:
    • what are the norms for audio measurement? how often? how long?
  • help me qualify the microphone currently chosen
  • help me design a nice case for the project, also taking some practical things into account
    • weather protection
    • providing power
  • help me with the server infrastructure
  • help me make this project more widely known
  • I probably cannot help *you* with your political mission to stop/ban certain noisy activities that annoy you. My main motivation is technical, I just like to measure things and visualise them.

Measuring audio as a citizen science project

The concept is the following:

  • A small box containing a microphone outside your house measures environmental sounds (traffic, etc.). For example, it takes a 1 second audio recording every 10 seconds.
  • A spectral analysis is made of the audio (separating it into different frequency bands), calculating sound intensity for each individual band
  • Every 5 minutes, the measured sound intensity is sent to a central server on the internet using your home WiFi connection. Because we only communicate intensities, this does not reveal audio fragments (conversations for example).
  • The central server takes the measured intensities and can do corrections, like a microphone specific correction, or a correction to apply A-weighting
  • The central server visualises the measurements:
    • We can plot the sound intensities on a map as a coloured dot and get a nice overview how the map changes during the day/week/year
    • We can plot the sound intensities of individual nodes vs time, and get an idea how sound varies from day/night/week/year

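External things to investigate:

  • How is audio evaluated in general, are there norms for it?
    • how often should audio be sampled? 1 second/minute? 10 seconds/min?
    • is it a valid assumption to use full octaves? or should the spectrum be divided into smaller parts?
  • https://www.rijksoverheid.nl/onderwerpen/geluidsoverlast/geluidsoverlast-in-de-wet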

Theory

The plan is to divide the audio spectrum up into octaves and calculate the total energy in each octave. We can then easily apply sensor/housing specific corrections, do A weighting, etc.

A weighting

Subjective audio levels are generally calculated on a logarithmic scale in dB using "A-weighting". A-weighting calculates a subjective loudness level from a physical loudness, applying a correction factor for each (part of an) octave band.

Octave start   Octave center   Octave end   Remark
-              -               22627 Hz     sample rate
5657 Hz        8000 Hz         11314 Hz
2828 Hz        4000 Hz         5657 Hz
1414 Hz        2000 Hz         2828 Hz
707 Hz         1000 Hz         1414 Hz
354 Hz         500 Hz          707 Hz
177 Hz         250 Hz          354 Hz
88 Hz          125 Hz          177 Hz
44 Hz          63 Hz           88 Hz
22 Hz          31 Hz           44 Hz
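The band edges in this table follow from the center frequencies: an octave band runs from the center divided by the square root of two to the center multiplied by the square root of two. The sketch below reproduces the numbers under the assumption that the exact centers are 1000 Hz times a power of two, so that the "31 Hz" and "63 Hz" rows are really 31.25 Hz and 62.5 Hz. The function name is only illustrative.

```python
import math


def octave_edges(center_hz):
    """Return the (lower, upper) edge of the octave band around center_hz."""
    return center_hz / math.sqrt(2), center_hz * math.sqrt(2)


# Assumed exact centers: 1000 Hz times a power of two (31.25 Hz .. 8000 Hz).
centers = [1000.0 * 2.0 ** k for k in range(-5, 4)]

for c in centers:
    lower, upper = octave_edges(c)
    print(f"{lower:7.0f} Hz  {c:8.2f} Hz  {upper:7.0f} Hz")
```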

List of A-weighting coefficients: http://www.dedicatedacoustics.com.au/Articles/A%20Weighting%20Corrections.html
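Instead of a lookup table, the correction per band can also be computed from the commonly published analog A-weighting curve. The sketch below evaluates that formula at the nominal octave centers. It is an illustration only and has not been checked against the linked list, and the function name is made up.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting correction in dB at f_hz, from the usual analog curve."""
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.0 dB offset normalises the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(ra) + 2.0


for center in (31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000):
    print(f"{center:>6} Hz: {a_weighting_db(center):+6.1f} dB")
```

Low frequencies get a large negative correction and the band around 1 kHz is left almost unchanged, which is why the low octaves contribute little to an A-weighted level.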

FFT

The energy in each octave is calculated by applying an FFT (fast Fourier transform) to the audio data. The FFT takes in real values and outputs complex values. The intensity in each octave band is calculated by summing the energy in a set of FFT 'bins'. The energy in each bin is the real part squared plus the imaginary part squared. The division of the audio spectrum for the FFT is chosen so it matches the octaves used in the A-weighting as described above. I plan to use the 'flat-top' window because it has good properties for measuring power levels: the measured amplitude of a tone hardly depends on where it falls between two FFT bins.
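The calculation described above can be sketched in plain Python as follows. This is not the firmware code: the FFT is a simple textbook radix-2 version, the flat-top coefficients are the commonly published 5-term set, and the 1 kHz test tone, the function names and the lack of any calibration or scaling are assumptions made for the example.

```python
import cmath
import math


def flat_top(n):
    """Flat-top window of length n (commonly published 5-term coefficients)."""
    a = (0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368)
    return [
        a[0]
        - a[1] * math.cos(2 * math.pi * i / (n - 1))
        + a[2] * math.cos(4 * math.pi * i / (n - 1))
        - a[3] * math.cos(6 * math.pi * i / (n - 1))
        + a[4] * math.cos(8 * math.pi * i / (n - 1))
        for i in range(n)
    ]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def octave_powers(samples, sample_rate, centers):
    """Sum the bin energy (re^2 + im^2) into one total per octave band."""
    n = len(samples)
    window = flat_top(n)
    spectrum = fft([s * w for s, w in zip(samples, window)])
    powers = {c: 0.0 for c in centers}
    for k in range(1, n // 2):  # skip DC, use the positive frequencies only
        freq = k * sample_rate / n
        energy = spectrum[k].real ** 2 + spectrum[k].imag ** 2
        for c in centers:
            if c / math.sqrt(2) <= freq < c * math.sqrt(2):
                powers[c] += energy
                break  # bins outside all listed bands are ignored
    return powers


SAMPLE_RATE = 44100
N = 4096
centers = [1000.0 * 2.0 ** k for k in range(-5, 4)]  # 31.25 Hz .. 8000 Hz
tone = [math.sin(2 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(N)]
powers = octave_powers(tone, SAMPLE_RATE, centers)
loudest = max(powers, key=powers.get)
print(loudest)
```

For the 1 kHz test tone, nearly all energy should end up in the band centered on 1000 Hz. The values are relative only, since no window or microphone scaling is applied.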

Investigation into FFT libraries:

  • arduinoFFT looks nice, but it does all calculations in double, severely limiting the audio buffer size. For each sample, you need two doubles, so that's 16 bytes per sample. I can use about 6000 samples in the audio buffer. Using floats would give me twice the audio buffer size. I could just copy the files and modify them for double->float (and fix any other compiler warnings).
  • Alternatively, esp32-fft looks nice: it uses floats instead of doubles and is optimised for the ESP32, but it uses malloc internally. Also, I can't use it as an Arduino library because it doesn't have the Arduino library structure (with a src dir, examples dir, library.json file, library.properties file, etc.). Maybe I can still use it by copying it into my sketch.
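The buffer arithmetic behind the double versus float trade-off can be worked out as below. The sketch assumes a memory budget of about 96 kB (6000 samples at 16 bytes each, derived from the numbers above) and an FFT length that has to be a power of two. Both are assumptions for illustration, and the function name is made up.

```python
def max_fft_size(budget_bytes, bytes_per_value):
    """Return (max samples, largest power-of-two FFT length) for a budget."""
    bytes_per_sample = 2 * bytes_per_value  # real part + imaginary part
    max_samples = budget_bytes // bytes_per_sample
    size = 1
    while size * 2 <= max_samples:
        size *= 2
    return max_samples, size


# Assumed budget: "about 6000 samples" at 16 bytes each, so roughly 96 kB.
BUDGET = 6000 * 16

print(max_fft_size(BUDGET, 8))  # double: 8 bytes per value
print(max_fft_size(BUDGET, 4))  # float: 4 bytes per value
```

Under these assumptions, doubles allow a 4096-point FFT and floats would allow 8192 points.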

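Links:

  • Overview of octave bands https://www.engineeringtoolbox.com/octave-bands-frequency-limits-d_1602.html
  • http://www.robinscheibler.org/2017/12/12/esp32-fft.html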

Decibel meters

Commercially available meters:

Fairly typical specs:

  • dynamic range: 30 - 130 dB
  • accuracy: 1.5-2 dB
  • frequency range: 31.5 Hz - 8 kHz (!)
  • norm: EN 61672-1

Hardware

The physical device consists of:

  • an ESP32. It has an I2S digital audio input for sampling data from a digital microphone and a WiFi interface to communicate with the internet. It has more internal RAM than an ESP8266 (for example), which helps to take a larger audio sample and do analysis on it. It is still comparatively simple, cheap, easy to flash and powerful enough to do communication over WiFi and audio analysis.
  • a digital I2S microphone, like the INMP441 (datasheet: https://www.invensense.com/wp-content/uploads/2015/02/INMP441.pdf)

Waag society uses the Invensense ICS4342 microphone in their kit 2.1. Myself, I ordered a couple of INMP441 microphones from Aliexpress.

The microphone is connected to the microcontroller as follows:

  • INMP441 GND to ESP32 GND
  • INMP441 VDD to ESP32 3.3V
  • INMP441 SD to ESP32 A4/32
  • INMP441 SCK to ESP32 A16/14
  • INMP441 WS to ESP32 15
  • INMP441 L/R to ESP32 GND

The connection carries only digital signals (max 3 MHz or so). No sensitive analog electronics are needed, the microphone and the microcontroller are simply connected using "dupont" wire.

The I2S clock signal is 64 times higher than the sample clock, so at a sample rate of 44100 Hz, this means about 2.8 MHz. This might be a bit high for a random wire, so we should probably keep this connection short.
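The factor of 64 presumably comes from the I2S frame layout: two channel slots (left and right) of 32 bit clocks each per sample period. The short calculation below works this out. The constant names are illustrative, and the 32-bit slot width is an assumption that should be checked against the microphone datasheet.

```python
BITS_PER_SLOT = 32      # assumed width of one channel slot in the I2S frame
SLOTS_PER_FRAME = 2     # left and right channel
SAMPLE_RATE_HZ = 44100

# One full frame (both slots) is clocked out per audio sample period.
bit_clock_hz = BITS_PER_SLOT * SLOTS_PER_FRAME * SAMPLE_RATE_HZ
print(bit_clock_hz)
```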

Software

Initial code for sampling audio from the digital microphone can be found on github: https://github.com/bertrik/NoiseLevel

What the software should do:

  • Take an audio measurement from the microphone at a regular interval (say 1 second every 10 seconds)
  • On the recorded audio, perform a 4096-point real->complex FFT with a windowing function (flat-top for example).
  • Calculate power for each FFT-bin (Im-squared + Re-squared) and sum up bins per octave.
  • Calculate statistics, e.g. minimum/average/maximum over a 5 minute interval and convert to a logarithmic scale (decibels)
  • Every 5 minutes, send the statistics to the network using WiFi or LoRa
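The statistics step in the list above could look roughly like the sketch below, for a single octave band. It assumes that the statistics are taken over linear power values first and converted to decibels afterwards, as the wording of that step suggests. The decibel values are relative and uncalibrated, and the function names and sample data are made up.

```python
import math


def to_db(power):
    """Convert a linear power value to decibels (relative, uncalibrated)."""
    # A real implementation should guard against power == 0 here.
    return 10.0 * math.log10(power)


def interval_stats_db(band_powers):
    """Min/average/max of one octave band over an interval, in dB.

    band_powers: linear power values, one per measurement.
    """
    return {
        "min": to_db(min(band_powers)),
        "avg": to_db(sum(band_powers) / len(band_powers)),
        "max": to_db(max(band_powers)),
    }


# 5 minutes at one measurement every 10 seconds gives 30 values (made up).
measurements = [1.0] * 28 + [10.0, 100.0]
stats = interval_stats_db(measurements)
print({name: round(value, 2) for name, value in stats.items()})
```

Averaging the linear power and then taking the logarithm gives an energy average. Averaging the decibel values directly would give a different, lower number when the level varies.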

The network receives the raw decibel values and can apply corrections for specific microphones, do A-weighting, etc.
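On the receiving side, applying the corrections and combining the octave bands into a single A-weighted level might look like the sketch below. The correction table holds the commonly tabulated A-weighting values at the nominal octave centers. The function name, the optional microphone correction and the example levels are assumptions for illustration.

```python
import math

# Commonly tabulated A-weighting corrections (dB) at nominal octave centers.
A_CORRECTION_DB = {
    31.5: -39.4, 63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
    1000: 0.0, 2000: 1.2, 4000: 1.0, 8000: -1.1,
}


def total_dba(band_levels_db, mic_correction_db=None):
    """Combine per-octave dB levels into one A-weighted level.

    mic_correction_db: optional per-band correction for a specific
    microphone or housing (hypothetical, defaults to 0 dB everywhere).
    """
    mic_correction_db = mic_correction_db or {}
    total_power = 0.0
    for center, level in band_levels_db.items():
        corrected = (
            level + A_CORRECTION_DB[center] + mic_correction_db.get(center, 0.0)
        )
        total_power += 10.0 ** (corrected / 10.0)  # back to linear power
    return 10.0 * math.log10(total_power)


# Made-up example: after weighting, both bands end up at 60 dB.
levels = {125: 76.1, 1000: 60.0}
print(round(total_dba(levels), 2))
```

Levels are added as linear powers, not as decibels, so two bands of equal weighted level give a total that is about 3 dB higher than either band alone.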