Stofradar: Difference between revisions

From RevSpace
 
(91 intermediate revisions by the same user not shown)
Line 1: Line 1:
   {{Project
   {{Project
   |Name=Stofradar
   |Name=Stofradar
   |Picture=Stofradar3.png
   |Picture=stofradar.png
   |Omschrijving=Visualizing fine dust concentrations on a map
   |Omschrijving=Visualizing airborne particulate matter concentrations on a map
   |Status=Initializing
   |Status=Completed
   |Contact=bertrik
   |Contact=bertrik
   }}
   }}


== Introduction ==
== Introduction ==
This page is about my plan to create a 'stofradar' image of fine dust concentrations based on the raw data measured by the luftdaten.info network.
This page is about creating a 'stofradar' image of atmospheric particulate matter concentrations based on the raw data measured by the sensor.community network,
see [http://www.stofradar.nl www.stofradar.nl].
 
Visualisation of citizen-science data by RIVM in the samenmeten project can be found [https://samenmeten.rivm.nl/animatie/index-both.php here].
 
The focus is on raw visualisation of the source data, only the most minimal attempt is made to "validate" the data.
Sensor measurements and sensor locations are basically uncontrolled, since we cannot tell if a particular sensor is defective or has an unusual position that affects its measurements.


See also my [[DustSensor]] page.
See also my [[DustSensor]] page.


luftdaten.info is an initiative to allow citizens to measure fine dust concentration using an inexpensive and easy to build fine dust sensor.
The website [https://sensor.community sensor.community] is an initiative to allow citizens to participate in measuring atmospheric particulate matter concentration using an inexpensive and [https://sensor.community/nl/sensor-bouwen/ easy to build sensor].
They collect this data, calculate 5 minute and daily averages and publish it again as open data.
They collect this data, calculate 5 minute and daily averages and publish it again as open data.
The total number of sensors is about 5000 worldwide, most of them in Germany, Bulgaria, Belgium, Austria, Sweden.
The total number of sensors is > 12000 worldwide, most of them in Germany, Bulgaria, Belgium, Austria, Sweden.
The Netherlands has about 100 sensors.
The Netherlands has > 2000 sensors. See also [https://stats.sensor.community/].
 
Future activities:
* add a water mark
* add some kind of slider to indicate progress on the GIF image, or even allow the user to slide back and forth
* fix the problem of stale data
 
=== Stale data ===
Sensor.community nodes typically provides data at high time resolution, a few minutes, with a 'heartbeat' update of 5 minutes.
Some data sources provide relatively stale data, data is only updated once per hour and once the application picks it up the data is already over an hour old.
Data sources with such different latencies cannot be combined properly in a single near-realtime image.


I want to create images / animations based on this data.
This is handled by providing with each image a maximum age of the data.
For the Rotterdam visualisation this is a couple of hours (most of the sensor are from the luchtclub project, which has high latency).
For the Netherlands visualisation, this is a maximum of one hour, meaning that data from Rotterdam is not visible on the national map.


== Visualisation ==
== Visualisation ==
The general idea is to create an image, with a map at the background and the fine dust concentration overlaid on top.
The general idea is to create an image, with a map at the background and the atmospheric particulate matter concentration overlaid on top.
 
=== Background map ===
The map background on stofradar.nl is based on https://mapsvg.com/maps/netherlands
 
The map projection used is the '''equirectangular projection''' (EPSG-32662),
so I can easily map a pixel back to a latitude/longitude.


TODO:
=== Data filtering ===
* get a nice background map, I prefer the equirectangular projection since that is easy to calculate. A black-and-white map would be nice so the overlay adds all the colour.
Data is raw PM2.5 data taken from:
* find a way to combine images over time into an animated gif: imagemagick convert?
* sensor.commnunity (5 minute data)
* RIVM samenmeten data portal (60 minute data)
* meetjestad


=== Background map ===
There is only very minimal data filtering. Sensor measurements are taken into account as follows:
Pages to investigate:
* Sensors from an area 2x2 times bigger than the area visualized are considered for visualisation
* https://wiki.openstreetmap.org/wiki/OSM_on_Paper
* Sensors marked as 'indoor' are ignored
* http://maps.stamen.com/m2i/#toner-background/600:800/6/52.200/5.300
* Sensors with a measurement value smaller than 0 are ignored
* The top percent of highest PM2.5 concentrations is discarded, this mostly takes care of outliers caused by defective sensors
* When sensor data is not available in the past 5 minutes, data from a previous measurement interval is used, up to 1 hour old
* A (small) number of sensors that are known to always report a very high value are not considered (blacklisted)


=== Interpolation ===
=== Interpolation ===
Since we only have data at a set of discrete points, the concentration at other points is estimated by combining data from all sensors using
Since we only have data at a set of discrete points, the concentration at other points is estimated by combining data from all sensors using
[https://en.wikipedia.org/wiki/Inverse_distance_weighting inverse distance weighting], in particular using the distance *squared* as the weighing factor in a weighted average.
[https://en.wikipedia.org/wiki/Inverse_distance_weighting inverse distance weighting], in particular using the distance *squared* as the weighing factor in a weighted average.
So a nearby sensor has a large effect and a far away sensor has very little effect, contributing only a little bit to the global average.
So a nearby sensor has a large effect and a far away sensor has a small effect.
 
To calculate the distance, I use a very simple approximation:
* calculate the "middle" of the map (average latitude/longitude between top-left and bottom-right);
* calculate the "km-per-degree-latitude" at the middle for latitude as 40075 km / 360 degrees;
* calculate the "km-per-degree-longitude" at the middle for longitude as the number above multiplied with cos(latitude);
* determine the difference in longitude and the difference in latitude;
* convert both to km using the factors calculated earlier;
* calculate the [https://en.wikipedia.org/wiki/Euclidean_distance euclidean distance].
 
Pixels that are not within a certain distance of any sensor station (e.g. 10 km) are rendered as grayscale, to indicate a geographic limit of each sensor.
 
Only sensors within a reasonable range of the map are taken into account, currently this is an area of 4 times (2x2) the visible area.


To calculate the distance<sup>2</sup>, I use a very simple approximation:
New plan to improve the situation of stations with anomalous data having a large influence:  
* determine the difference in longitude and latitude
* discard obviously wrong sensor values completely, e.g. check PM2.5 against the following rule: 0 <= PM2.5 <= PM10
* apply an 'aspect ratio' factor of cos(latitude) to the longitude difference. I use the latitude of the approximate middle of the netherlands (52.2 lat, 5.3 lon) to calculate this aspect ratio factor.
* apply a plausibility score to each sensor, the basis for this is the plausibility score (0-10) already calculated by samenmeten
* distance<sup>2</sup> = (difference latitude)<sup>2</sup> + (difference longitude)<sup>2</sup>
* for sensors not considered in the samenmeten plausibility score (e.g. using PMS7003 or SPS30 sensors), calculate our own plausiblity
A better way would be to use the  
** in top 1% percentile: score lower
[https://en.wikipedia.org/wiki/Great-circle_distance 'great-circle-distance'] and possibly even account for the fact that the earth is not perfectly spherical, but I like to start simple and this makes the calculation a lot faster.
** more rules to be determined ...
* for every pixel, consider only the stations within a certain radius, say 10km for the netherlands map
* for '''interpolated''' pixels, consider only stations meeting some threshold plausibility value (configurable)
* for pixels directly around a station, override color corrresponding to the station sensor value, even if it is not plausible


=== Colour range ===
=== Colour range ===
The colours I'll probably be using (a kind of spectral range from blue to red):
[[File:luchtmeetnet_lki.png|right|thumb|Luchtmeetnet ranges]]
*  0 ug/m3: fully transparent white (#FFFFFF.00)
 
*  25 ug/m3: 3/4 transparent cyan (#00FFFF.40)
The colours I'm using are based on the scale used for air quality index from luchtmeetnet with data from RIVM,
*  50 ug/m3: 3/4 transparent yellow (#FFFF00.40)
see https://www.luchtmeetnet.nl/informatie/luchtkwaliteit/luchtkwaliteitsindex-(lki)
* 100 ug/m3: 3/4 transparent red (#FF0000.40)
 
* 200 ug/m3 and higher: 3/4 transparent purple (#FF00FF.40)
The input value is the PM2.5 concentration.
With interpolation for RGB and alpha for values between these levels.
 
Values in between these levels are interpolated linearly with respect to the RGB colour value and alpha channel.
 
=== Correction for high humidity ===
The map is currently not corrected for high humidity, however the median of a subset of humidity sensors in the map area is determined and displayed on the image.
Only humidity sensors of type BME280 are considered, they are considered to be of better quality than a DHT11 or DHT22 sensor.
Not all particulate matter measurement stations have a humidity sensor onboard.
 
Humidity generally seems to cause an overestimation of PM measurements for measurements done with a "particle counting" type of PM sensor.
The effect becomes really significant above approximately 70% humidity.


The middle level of 50 ug/m3 has been agreed on as the level that should not be exceeded as the yearly average in the Netherlands.
An interesting idea is to try to compensate for this effect, since the sensor.community sensor has an onboard humidity-sensor.
Some papers/links about this:
* https://www.samenmetenaanluchtkwaliteit.nl/sites/default/files/2018-07/Status_SDS011_12juli18.pdf
* https://www.researchgate.net/publication/320474792_Influence_of_Humidity_on_the_Accuracy_of_Low-Cost_Particulate_Matter_Sensors
* https://github.com/opendata-stuttgart/meta/wiki/Luftfeuchte-Korrektur


This scale is approximately logarithmic, with each step being twice as big as the previous one.
However, I see the following problems with the formulas and coefficient in the opendata-stuttgart link above:
* it combines formulas and coefficients from different sources where relative humidity has different units. One paper seems to use an RH-value from 0 to 100, while another uses a kind of normalized relative humidity (from 0 to 1). You cannot just use the same coefficients if the unit is different.
* it claims a humidity correction for PM10 with coefficients that is not found in the source paper.


=== Compositing ===
=== Animation ===
I plan to use imagemagick for this.
Besides an image with current data from the last 5 minutes, every hour two animations are created:
* GIF animation composed of hourly images over the past 24 hours
* MP4 animation composed of 5-minute images over the past 24 hours


Work in progress:
You can click on the GIF animation to view the MP4 animation.
  composite -compose over -geometry 800x600 20180605_210100.json.png netherlands.png output.png
where netherlands.png is an 600x800 opaque black-and-white image of the map of the netherlands
and 20180605_210100.json.png is an 60x80 image of dust concentrations with an alpha channel


== Software ==
== Software ==
See the [https://github.com/bertrik/luftdatenmapper github page] for the source code.
See the [https://github.com/bertrik/stofradar github page] for the source code.

Latest revision as of 13:53, 13 December 2022

Project Stofradar
Stofradar.png
Visualizing airborne particulate matter concentrations on a map
Status Completed
Contact bertrik
Last Update 2022-12-13

Introduction

This page is about creating a 'stofradar' image of atmospheric particulate matter concentrations based on the raw data measured by the sensor.community network, see www.stofradar.nl.

Visualisation of citizen-science data by RIVM in the samenmeten project can be found at https://samenmeten.rivm.nl/animatie/index-both.php.

The focus is on raw visualisation of the source data; only a minimal attempt is made to "validate" the data. Sensor measurements and sensor locations are basically uncontrolled: we cannot tell whether a particular sensor is defective or has an unusual position that affects its measurements.

See also my DustSensor page.

The website sensor.community is an initiative to allow citizens to participate in measuring atmospheric particulate matter concentrations using an inexpensive and easy-to-build sensor. They collect this data, calculate 5-minute and daily averages, and publish it again as open data. There are more than 12000 sensors worldwide, most of them in Germany, Bulgaria, Belgium, Austria and Sweden. The Netherlands has more than 2000 sensors. See also https://stats.sensor.community/.

Future activities:

  • add a watermark
  • add some kind of slider to indicate progress on the GIF image, or even allow the user to slide back and forth
  • fix the problem of stale data

Stale data

Sensor.community nodes typically provide data at a high time resolution of a few minutes, with a 'heartbeat' update every 5 minutes. Some other data sources provide relatively stale data: the data is only updated once per hour, and by the time the application picks it up it is already over an hour old. Data sources with such different latencies cannot be combined properly in a single near-realtime image.

This is handled by configuring a maximum data age for each image. For the Rotterdam visualisation this is a couple of hours (most of the sensors are from the luchtclub project, which has high latency). For the Netherlands visualisation, this is a maximum of one hour, meaning that data from Rotterdam is not visible on the national map.
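
The maximum-age rule can be illustrated with a minimal Python sketch. The measurement layout (a dict with a `time` field) and the function name are assumptions made for this example, not taken from the stofradar source code.

```python
from datetime import datetime, timedelta, timezone


def filter_by_max_age(measurements, now, max_age):
    """Keep only measurements that are at most max_age old.

    measurements: list of dicts with a timezone-aware 'time' (assumed layout).
    """
    return [m for m in measurements if now - m["time"] <= max_age]


# Hypothetical data: one fresh measurement and one that is 90 minutes old.
now = datetime(2022, 12, 13, 13, 0, tzinfo=timezone.utc)
measurements = [
    {"id": "fresh", "time": now - timedelta(minutes=5)},
    {"id": "stale", "time": now - timedelta(minutes=90)},
]

# A national map with a 1-hour limit drops the stale measurement,
# a regional map with a limit of a few hours keeps it.
national = filter_by_max_age(measurements, now, timedelta(hours=1))
regional = filter_by_max_age(measurements, now, timedelta(hours=3))
```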

Visualisation

The general idea is to create an image with a map in the background and the atmospheric particulate matter concentration overlaid on top.

Background map

The map background on stofradar.nl is based on https://mapsvg.com/maps/netherlands

The map projection used is the equirectangular projection (EPSG-32662), so I can easily map a pixel back to a latitude/longitude.
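
As a rough illustration of why this projection is convenient: with an equirectangular map, pixel coordinates and latitude/longitude are related linearly. The sketch below assumes a map described by a bounding box (north, west, south, east); the function name and the example bounding box are hypothetical.

```python
def pixel_to_latlon(x, y, width, height, north, west, south, east):
    """Map the centre of pixel (x, y) to (latitude, longitude).

    Assumes an equirectangular image of width x height pixels whose
    top-left corner is (north, west) and bottom-right is (south, east).
    """
    lon = west + (x + 0.5) / width * (east - west)
    lat = north - (y + 0.5) / height * (north - south)
    return lat, lon


# Hypothetical 2x2 pixel map spanning 50..54 degrees N and 3..7 degrees E.
top_left = pixel_to_latlon(0, 0, 2, 2, north=54.0, west=3.0, south=50.0, east=7.0)
```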

Data filtering

Data is raw PM2.5 data taken from:

  • sensor.community (5 minute data)
  • RIVM samenmeten data portal (60 minute data)
  • meetjestad

There is only very minimal data filtering. Sensor measurements are taken into account as follows:

  • Sensors from an area 2x2 times as large as the visualized area are considered for visualisation
  • Sensors marked as 'indoor' are ignored
  • Sensors with a measurement value smaller than 0 are ignored
  • The top 1% of highest PM2.5 concentrations is discarded; this mostly takes care of outliers caused by defective sensors
  • When no sensor data is available from the past 5 minutes, data from a previous measurement interval is used, up to 1 hour old
  • A (small) number of sensors that are known to always report a very high value are not considered (blacklisted)
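
A minimal Python sketch of some of these filtering rules is shown below. The record layout (`id`, `pm25`, `indoor`), the function name and the way the discarded fraction is rounded down are assumptions for illustration only; the area and data-age rules are left out.

```python
def filter_sensors(sensors, blacklist=frozenset(), top_fraction=0.01):
    """Apply the indoor, negative-value, blacklist and top-percent rules.

    sensors: list of dicts with 'id', 'pm25' and optional 'indoor' (assumed).
    """
    candidates = [
        s for s in sensors
        if not s.get("indoor", False)      # ignore indoor sensors
        and s["pm25"] >= 0                 # ignore negative values
        and s["id"] not in blacklist       # ignore known-bad sensors
    ]
    # Discard the highest values; rounding down is an assumption here.
    candidates.sort(key=lambda s: s["pm25"])
    n_discard = int(len(candidates) * top_fraction)
    return candidates[:len(candidates) - n_discard]


# Hypothetical input: 200 outdoor sensors plus one negative and one indoor.
sensors = [{"id": i, "pm25": float(i)} for i in range(200)]
sensors.append({"id": 200, "pm25": -1.0})
sensors.append({"id": 201, "pm25": 5.0, "indoor": True})
kept = filter_sensors(sensors, blacklist={7})
```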

Interpolation

Since we only have data at a set of discrete points, the concentration at other points is estimated by combining data from all sensors using inverse distance weighting: each sensor value is weighted by the inverse of the squared distance in a weighted average. So a nearby sensor has a large effect and a faraway sensor has a small effect.
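
A small sketch of inverse distance weighting with a squared distance, assuming the distances to each sensor have already been computed. The function name and the handling of a zero distance are choices made for this example.

```python
def idw(samples, power=2.0):
    """Inverse-distance-weighted average.

    samples: iterable of (distance_km, value) pairs for one target point.
    """
    numerator = 0.0
    denominator = 0.0
    for distance, value in samples:
        if distance == 0.0:
            # Exactly on top of a sensor: use its value directly.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no samples to interpolate from")
    return numerator / denominator


# A sensor at 1 km (value 10) outweighs a sensor at 2 km (value 20) by 4:1.
estimate = idw([(1.0, 10.0), (2.0, 20.0)])
```

With weights 1 and 0.25 the estimate is (10 + 5) / 1.25 = 12, much closer to the nearby sensor than the plain average of 15.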

To calculate the distance, I use a very simple approximation:

  • calculate the "middle" of the map (average latitude/longitude between top-left and bottom-right);
  • calculate the "km-per-degree-latitude" at the middle for latitude as 40075 km / 360 degrees;
  • calculate the "km-per-degree-longitude" at the middle for longitude as the number above multiplied by cos(latitude of the middle);
  • determine the difference in longitude and the difference in latitude;
  • convert both to km using the factors calculated earlier;
  • calculate the euclidean distance.
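
The steps above can be sketched in Python as follows. The function names and the bounding-box parameters are assumptions for this example; only the 40075 km circumference and the cos(latitude) factor come from the description.

```python
import math

EARTH_CIRCUMFERENCE_KM = 40075.0


def make_distance_fn(north, west, south, east):
    """Build a flat-earth distance function for a map bounding box."""
    # The middle longitude is not needed: only the latitude sets the scale.
    mid_lat = (north + south) / 2.0
    km_per_deg_lat = EARTH_CIRCUMFERENCE_KM / 360.0
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(mid_lat))

    def distance_km(lat1, lon1, lat2, lon2):
        dy = (lat2 - lat1) * km_per_deg_lat
        dx = (lon2 - lon1) * km_per_deg_lon
        return math.hypot(dx, dy)  # euclidean distance

    return distance_km


# Hypothetical bounding box with its middle at 52 degrees N.
distance_km = make_distance_fn(north=54.0, west=3.0, south=50.0, east=7.0)
one_degree_north = distance_km(52.0, 5.0, 53.0, 5.0)  # roughly 111 km
one_degree_east = distance_km(52.0, 5.0, 52.0, 6.0)   # roughly 69 km
```

The approximation is only reasonable for maps that are small compared to the earth, because the longitude scale is fixed at the middle latitude.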

Pixels that are not within a certain distance (e.g. 10 km) of any sensor station are rendered in grayscale, to indicate the limited geographic reach of each sensor.

Only sensors within a reasonable range of the map are taken into account, currently this is an area of 4 times (2x2) the visible area.

New plan to improve the situation of stations with anomalous data having a large influence:

  • discard obviously wrong sensor values completely, e.g. check PM2.5 against the following rule: 0 <= PM2.5 <= PM10
  • apply a plausibility score to each sensor; the basis for this is the plausibility score (0-10) already calculated by samenmeten
  • for sensors not considered in the samenmeten plausibility score (e.g. using PMS7003 or SPS30 sensors), calculate our own plausibility
    • in the top 1%: score lower
    • more rules to be determined ...
  • for every pixel, consider only the stations within a certain radius, say 10 km for the Netherlands map
  • for interpolated pixels, consider only stations meeting some threshold plausibility value (configurable)
  • for pixels directly around a station, override the colour with the one corresponding to the station's sensor value, even if it is not plausible
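
Since this plan is not worked out in detail, the following Python sketch only illustrates the validity rule, the radius and the plausibility threshold. The field names, the function names and the default threshold of 5 are hypothetical.

```python
def is_obviously_valid(pm25, pm10):
    """Sanity rule from the plan: 0 <= PM2.5 <= PM10."""
    return 0 <= pm25 <= pm10


def stations_for_interpolation(stations, max_distance_km=10.0, min_plausibility=5):
    """Select stations usable for interpolating a single pixel.

    stations: list of dicts with 'distance_km' (to the pixel), 'plausibility'
    (0-10), 'pm25' and 'pm10'. The layout and threshold are assumptions.
    """
    return [
        s for s in stations
        if is_obviously_valid(s["pm25"], s["pm10"])
        and s["distance_km"] <= max_distance_km
        and s["plausibility"] >= min_plausibility
    ]


# Hypothetical stations around one pixel.
stations = [
    {"id": 1, "distance_km": 2.0, "plausibility": 8, "pm25": 5.0, "pm10": 9.0},
    {"id": 2, "distance_km": 12.0, "plausibility": 9, "pm25": 5.0, "pm10": 9.0},
    {"id": 3, "distance_km": 3.0, "plausibility": 2, "pm25": 4.0, "pm10": 6.0},
    {"id": 4, "distance_km": 1.0, "plausibility": 9, "pm25": 20.0, "pm10": 10.0},
]
usable = stations_for_interpolation(stations)
```

Here station 2 is too far away, station 3 is not plausible enough and station 4 fails the PM2.5 <= PM10 rule, so only station 1 remains.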

Colour range

Luchtmeetnet ranges

The colours I'm using are based on the scale used for air quality index from luchtmeetnet with data from RIVM, see https://www.luchtmeetnet.nl/informatie/luchtkwaliteit/luchtkwaliteitsindex-(lki)

The input value is the PM2.5 concentration.

Values in between these levels are interpolated linearly with respect to the RGB colour value and alpha channel.
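
A minimal sketch of linear interpolation between colour levels, including the alpha channel. The colour stops below are placeholders chosen for illustration; they are not the actual luchtmeetnet index levels, and the function name is made up for this example.

```python
def interpolate_colour(value, stops):
    """Linearly interpolate an (r, g, b, a) colour for a PM2.5 value.

    stops: list of (level, (r, g, b, a)) sorted by level. Values outside
    the range are clamped to the first or last colour.
    """
    if value <= stops[0][0]:
        return stops[0][1]
    for (lo, c_lo), (hi, c_hi) in zip(stops, stops[1:]):
        if value <= hi:
            t = (value - lo) / (hi - lo)
            return tuple(round(a + t * (b - a)) for a, b in zip(c_lo, c_hi))
    return stops[-1][1]


# Placeholder stops (NOT the luchtmeetnet scale): level in ug/m3 -> RGBA.
PLACEHOLDER_STOPS = [
    (0.0, (255, 255, 255, 0)),
    (50.0, (255, 255, 0, 64)),
    (100.0, (255, 0, 0, 64)),
]
colour_at_50 = interpolate_colour(50.0, PLACEHOLDER_STOPS)
```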

Correction for high humidity

The map is currently not corrected for high humidity. However, the median of a subset of humidity sensors in the map area is determined and displayed on the image. Only humidity sensors of type BME280 are used, since they are considered to be of better quality than a DHT11 or DHT22 sensor. Not all particulate matter measurement stations have a humidity sensor on board.

Humidity generally seems to cause an overestimation of PM values measured with a "particle counting" type of PM sensor. The effect becomes really significant above approximately 70% relative humidity.
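
The median humidity can be computed along the following lines. The reading layout (`sensor_type`, `humidity`) and the function name are assumptions for this example.

```python
from statistics import median


def median_humidity(readings, sensor_type="BME280"):
    """Median relative humidity over readings of one sensor type.

    readings: list of dicts with 'sensor_type' and 'humidity' (assumed layout).
    Returns None when no reading of the requested type is available.
    """
    values = [r["humidity"] for r in readings if r["sensor_type"] == sensor_type]
    return median(values) if values else None


# Hypothetical readings: the DHT22 value is ignored.
readings = [
    {"sensor_type": "BME280", "humidity": 60.0},
    {"sensor_type": "BME280", "humidity": 70.0},
    {"sensor_type": "BME280", "humidity": 80.0},
    {"sensor_type": "DHT22", "humidity": 99.0},
]
humidity = median_humidity(readings)
```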

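An interesting idea is to try to compensate for this effect, since the sensor.community sensor has an onboard humidity sensor. Some papers/links about this:

  • https://www.samenmetenaanluchtkwaliteit.nl/sites/default/files/2018-07/Status_SDS011_12juli18.pdf
  • https://www.researchgate.net/publication/320474792_Influence_of_Humidity_on_the_Accuracy_of_Low-Cost_Particulate_Matter_Sensors
  • https://github.com/opendata-stuttgart/meta/wiki/Luftfeuchte-Korrektur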

However, I see the following problems with the formulas and coefficients in the opendata-stuttgart link above:

  • it combines formulas and coefficients from different sources where relative humidity has different units. One paper seems to use an RH value from 0 to 100, while another uses a kind of normalized relative humidity (from 0 to 1). You cannot just use the same coefficients if the unit is different.
  • it claims a humidity correction for PM10 with coefficients that are not found in the source paper.

Animation

Besides an image with current data from the last 5 minutes, two animations are created every hour:

  • GIF animation composed of hourly images over the past 24 hours
  • MP4 animation composed of 5-minute images over the past 24 hours

You can click on the GIF animation to view the MP4 animation.

Software

See the GitHub page at https://github.com/bertrik/stofradar for the source code.