EspAudioSensor
Revision as of 11:33, 15 April 2019
Project | ESP audio sensor
---|---
Description | ESP-based audio sensor
Status | Initializing
Contact | bertrik
Last Update | 2019-04-15
Introduction
This project is about using an ESP32 together with an I2S digital microphone to create an audio sensor.
This could be a decibel meter, or perhaps an environmental noise meter.
Measuring audio as a citizen science project
The idea is the following:
- A small box containing a microphone outside your house measures environmental sounds (traffic, etc.); for example, it takes a 1-second audio recording every 10 seconds.
- A spectral analysis of the audio is made (separating it into different frequency bands) to calculate the sound intensity.
- Every 5 minutes, the measured sound intensity is sent to a central server on the internet using your home WiFi connection. Because only intensities are communicated, no audio fragments (conversations, for example) are revealed.
- The central server takes the measured intensities and can apply corrections, such as a microphone-specific correction or A-weighting.
- The central server visualises the measurements:
  - We can plot the sound intensities on a map as coloured dots and get a nice overview of how the map changes during the day/week/year.
  - We can plot the sound intensities of individual nodes against time, and get an idea of how sound varies over the day/night/week/year.
External things to investigate:
- https://www.rijksoverheid.nl/onderwerpen/geluidsoverlast/geluidsoverlast-in-de-wet
- INMP441 microphone datasheet
- https://github.com/maspetsberger/esp32-i2s-mems
Theory
The plan is to divide the audio spectrum into octaves and calculate the total energy in each octave. Sensor/housing-specific corrections, A-weighting, etc. can then easily be applied to these octave energies.
This division of the audio spectrum is chosen so that it (approximately) matches the octaves used in the well-known A-weighting curve.
An octave is basically a factor of two in frequency. Conveniently, an FFT also works in factors of two; for example, when sampling at 44100 Hz, you get the following octave boundaries: 22050 Hz, 11025 Hz, 5512 Hz, 2756 Hz, 1378 Hz, 689 Hz, 344 Hz, 172 Hz, 86 Hz, 43 Hz, 22 Hz.
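A minimal Python sketch of how these boundaries arise, assuming the 44100 Hz sample rate from the text: start at the Nyquist frequency (half the sample rate) and halve repeatedly. With a 4096-point FFT, every edge then falls exactly on an FFT bin boundary (bin 2048, 1024, 512, ...). The function name `octave_boundaries` is hypothetical, and this is host-side illustration, not ESP32 firmware.

```python
def octave_boundaries(sample_rate_hz: float, count: int = 11) -> list[float]:
    """Octave edges from the Nyquist frequency downwards, halving each step."""
    edges = []
    f = sample_rate_hz / 2  # the top edge is the Nyquist frequency
    for _ in range(count):
        edges.append(f)
        f /= 2  # one octave lower = half the frequency
    return edges


edges = octave_boundaries(44100)
print([round(e, 1) for e in edges])
```

Rounding the printed values down to whole hertz gives the list above; the last edge is 21.5 Hz, which rounds to 22 Hz.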
A-weighting
Audio levels are generally measured on a logarithmic scale in dB using "A-weighting". A-weighting calculates a subjective loudness level from a physical loudness, taking into account the spectrum of the audio.
Energy is calculated per octave, and a correction factor for that specific octave is applied. The octaves are defined relative to 1000 Hz, so submultiples like 500, 250, 125 and 63 Hz, and multiples like 2000, 4000, 8000 and 16000 Hz. The correction means that for each octave, a fixed number is added to or subtracted from the raw decibel value.
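As an illustration of the per-octave correction, the sketch below uses the standard octave-band A-weighting values (as tabulated in IEC 61672, rounded to 0.1 dB), adds them to the raw octave levels, and then recombines the corrected octaves into a single dB(A) figure by summing in the linear power domain. The dictionary layout and the function `a_weighted_level` are assumptions for illustration. Note that the FFT octaves of this project (for example 689 to 1378 Hz) only approximately line up with these nominal bands.

```python
import math

# Standard A-weighting corrections in dB at nominal octave-band centre
# frequencies (IEC 61672 values, rounded to 0.1 dB).
A_WEIGHTING_DB = {
    31.5: -39.4,
    63: -26.2,
    125: -16.1,
    250: -8.6,
    500: -3.2,
    1000: 0.0,
    2000: 1.2,
    4000: 1.0,
    8000: -1.1,
    16000: -6.6,
}


def a_weighted_level(octave_levels_db: dict[float, float]) -> float:
    """Apply per-octave A-weighting and combine the octaves into one dB(A) value.

    octave_levels_db maps a nominal octave centre frequency (Hz) to its raw level (dB).
    """
    total_power = 0.0
    for centre, level in octave_levels_db.items():
        weighted = level + A_WEIGHTING_DB[centre]  # add/subtract the octave's correction
        total_power += 10 ** (weighted / 10)       # back to linear power before summing
    return 10 * math.log10(total_power)


print(a_weighted_level({250: 70.0, 1000: 60.0, 4000: 55.0}))
```

Summing in the power domain matters: two octaves that are each 60 dB after weighting combine to about 63 dB, not 120 dB.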
Links:
- Overview of octave bands https://www.engineeringtoolbox.com/octave-bands-frequency-limits-d_1602.html
Hardware
The physical device consists of:
- an ESP32 (or possibly an ESP8266), which has an I2S digital audio input for sampling data from a digital microphone and a WiFi interface for sending data to the internet
- a digital I2S microphone, like the INMP441 (datasheet)
Waag Society uses the InvenSense ICS4342 microphone in their Smart Citizen Kit 2.1. I ordered INMP441 microphones from AliExpress.
The microphone is connected to the microcontroller as follows:
- INMP441 GND to ESP32 GND
- INMP441 VDD to ESP32 3.3V
- INMP441 SD to ESP32 A4/32
- INMP441 SCK to ESP32 A16/14
- INMP441 WS to ESP32 15
- INMP441 L/R to ESP32 GND
The connection carries only digital signals (at most around 3 MHz). No sensitive analog electronics are needed; the microphone and the microcontroller are simply connected using "dupont" wires.
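The bit-clock rate on the I2S link can be estimated with some arithmetic, assuming the INMP441 frame format of two 32-bit slots per word-select period (64 bit-clock cycles per sample) with the 24-bit two's-complement sample left-aligned, MSB first, in its slot. The sketch also shows how such a raw 32-bit word could be turned into a signed sample. The helper names are hypothetical, and this is host-side Python for illustration, not ESP32 firmware or an ESP-IDF API.

```python
def i2s_bit_clock_hz(sample_rate_hz: int, bits_per_slot: int = 32, slots: int = 2) -> int:
    """Bit clock (SCK) frequency needed for a given sample rate (assumed frame layout)."""
    return sample_rate_hz * bits_per_slot * slots


def inmp441_word_to_int(word: int) -> int:
    """Convert a raw 32-bit I2S word to a signed 24-bit sample.

    Assumes the 24-bit sample sits in the top 24 bits of the 32-bit slot,
    with the low 8 bits unused.
    """
    word &= 0xFFFFFFFF      # treat the input as an unsigned 32-bit value
    sample = word >> 8      # drop the 8 unused low bits
    if sample & 0x800000:   # sign bit of the 24-bit value is set
        sample -= 1 << 24   # convert from two's complement
    return sample


print(i2s_bit_clock_hz(44100))  # 2822400 Hz
print(inmp441_word_to_int(0xFFFFFF00))  # -1
```

At 44100 Hz this gives about 2.8 MHz, consistent with a bit clock of roughly 3 MHz on the dupont wires.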
Software
Initial code can be found on GitHub.
What the software should do:
- Take an audio measurement from the microphone at a regular interval (say 1 second every 10 seconds)
- On the recorded audio, perform a 4096-point FFT with a windowing function (Gaussian, for example). Because the audio is real-valued, this results in 2048 usable FFT coefficients.
- Sum up the FFT into power per octave, e.g. the top 1024 coefficients represent the 11025-22050 Hz octave, the next 512 coefficients represent the 5512-11025 Hz octave, etc.
- Calculate statistics, e.g. minimum/average/maximum as decibels in the current 5 minute interval
- Every 5 minutes, send the statistics to the network using WiFi
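The windowing, FFT and octave-summing steps from the list above can be sketched in plain Python (no NumPy) for a single 4096-sample frame. Assumptions: a Gaussian window with a hypothetical width parameter `sigma = 0.4`, the DC coefficient (bin 0) left out, and uncalibrated levels in dB relative to an arbitrary reference. A 1-second recording at 44100 Hz contains roughly ten such frames; how to combine them is not specified here. All function names are hypothetical.

```python
import cmath
import math


def gaussian_window(n: int, sigma: float = 0.4) -> list[float]:
    """Gaussian window; sigma is relative to half the window length (assumed value)."""
    centre = (n - 1) / 2
    return [math.exp(-0.5 * ((i - centre) / (sigma * centre)) ** 2) for i in range(n)]


def fft(x: list[complex]) -> list[complex]:
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    a = list(x)
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Combine butterflies of size 2, 4, 8, ..., n.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a


def octave_levels_db(frame: list[float], sigma: float = 0.4) -> list[float]:
    """Windowed FFT of one frame, summed into octaves (highest octave first)."""
    n = len(frame)
    window = gaussian_window(n, sigma)
    spectrum = fft([s * w for s, w in zip(frame, window)])
    levels = []
    hi = n // 2  # only the first n/2 coefficients are unique for real input
    while hi > 1:
        lo = hi // 2  # octave = bins [lo, hi), e.g. [1024, 2048) first
        power = sum(abs(c) ** 2 for c in spectrum[lo:hi])
        levels.append(10 * math.log10(power) if power > 0 else float("-inf"))
        hi = lo
    return levels


# Example: a 1000 Hz test tone sampled at 44100 Hz (one 4096-sample frame).
fs = 44100
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(4096)]
for octave, level in enumerate(octave_levels_db(tone)):
    print(octave, round(level, 1))
```

Octave index 0 is the 11025-22050 Hz octave; index 4 covers bins 64 to 127 (about 689 to 1378 Hz), so it should carry almost all of the tone's energy. On the ESP32 itself, an optimised FFT library would normally replace the pure-Python `fft`.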
The network receives the raw decibel values and can apply corrections for specific microphones, do A-weighting, etc.
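For the statistics step in the list above, one possible sketch summarises the per-octave dB values collected during a 5-minute interval (about 30 readings at one every 10 seconds). It computes the average as an energy average (convert each value to linear power, average, convert back to dB) rather than the arithmetic mean of the dB values; that is a design choice, not something the text prescribes. `OctaveStats` and `interval_stats` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class OctaveStats:
    """Summary of one octave band over a reporting interval (hypothetical type)."""
    min_db: float
    avg_db: float
    max_db: float


def interval_stats(levels_db: list[float]) -> OctaveStats:
    """Summarise the dB readings of one octave collected during one interval.

    The average is an energy average: each dB value is converted to linear
    power, the powers are averaged, and the result is converted back to dB.
    """
    if not levels_db:
        raise ValueError("no readings in this interval")
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return OctaveStats(
        min_db=min(levels_db),
        avg_db=10 * math.log10(mean_power),
        max_db=max(levels_db),
    )


# Example: 30 readings (one every 10 s for 5 minutes) of a single octave.
readings = [55.0] * 29 + [75.0]
print(interval_stats(readings))
```

In this example the single loud reading pulls the energy average up to roughly 61 dB, whereas the arithmetic mean of the dB values would be about 55.7 dB; the energy average better reflects how much sound energy was actually present.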