Posts Tagged ‘pollution’

Analysis: Modeling Air Pollution in the City of Santander (Spain)

We have published a new study entitled “Modeling Air Pollution in the City of Santander (Spain)”, carried out in the context of the Ciudad2020 project. In this new document, as in our earlier study on noise pollution, we present the full analysis of a real application of air pollution modeling in the city of Santander (Spain), which had already been summarized in our whitepaper on pollution predictive modeling techniques in the sustainable city.

One of the objectives of Ciudad2020 as far as pollution is concerned is to install a wide network of low-cost sensors across the city (in contrast to the current model, which relies on a few very expensive and accurate measuring stations). However, this low-cost sensor network has not yet been deployed in any city, and checking the validity of the model requires data on the various pollutants found in an urban center.

The data used in this analysis are historical data provided by the Environmental Research Centre (CIMA). This entity is an autonomous body of the Government of Cantabria, created by law in 1991 and headed by the Ministry of Environment. Its activity centers on physico-chemical analyses of the state of the environment and on the management of sustainability through Environmental Information, Participation, Education and Environmental Volunteering.

The data set consists of measurements taken every 15 minutes between 1/1/2011 and 31/1/2013 by 4 automatic measuring stations of the Air Quality Control and Monitoring Network of Cantabria, which are located in the surroundings of Santander. The pollutants measured are the following: PM10 (suspended particles smaller than 10 microns), SO2 (sulphur dioxide), NO and NO2 (nitrogen oxides), CO (carbon monoxide), O3 (ozone), BEN (benzene), TOL (toluene) and XIL (xylene). In addition, those stations that have a meteorological tower measure the following meteorological parameters: DD (wind direction), VV (wind speed), TMP (temperature), HR (relative humidity), PRB (atmospheric pressure), RS (solar radiation) and LL (precipitation level).

As described in the document, the first step in any modeling study is the analysis of the data, performed variable by variable and for each measuring station. At a minimum, this calls for the basic statistics by season (average, standard deviation, median and mode), the distribution of values (histogram) at both global and monthly level, and the hourly distribution. The moving average is also analyzed: a statistic suited to trend analysis that smooths the fluctuations typical of instantaneous measurements and captures the trend over a given period.
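As a rough illustration of this exploratory step, the following minimal Python sketch computes the summary statistics named above and a trailing moving average for a single variable. The readings, the function names and the one-hour window are assumptions made for the example; they are not taken from the study.

```python
from statistics import mean, median, mode, stdev


def basic_stats(values):
    """Summary statistics for one variable at one station."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "median": median(values),
        "mode": mode(values),
    }


def moving_average(values, window):
    """Trailing moving average: one output value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


# Hypothetical PM10 readings, one every 15 minutes (illustrative values only)
pm10 = [20, 22, 22, 30, 26, 24, 22, 28]

stats = basic_stats(pm10)
hourly_trend = moving_average(pm10, window=4)  # 4 samples x 15 min = 1 hour
```

A wider window smooths more aggressively but reacts more slowly to real changes, so the window length is a choice to be explored rather than fixed in advance.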


The next step is to analyze how each variable depends on the others, in order to select the set of variables that most strongly governs the behavior of the output variable. For that purpose we have used correlation analysis, a statistical tool that measures and describes the degree or intensity of association between two variables. In particular, we have used Pearson’s correlation coefficient, which measures the linear relationship between two quantitative random variables X and Y.
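For readers who want to see the coefficient spelled out, here is a small Python sketch of Pearson’s correlation computed directly from its definition (covariance divided by the product of the standard deviations). The sample series are made up for illustration and do not come from the CIMA data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical simultaneous readings (illustrative values only)
no = [10, 20, 30, 40]
no2 = [15, 25, 35, 45]   # rises together with NO
o3 = [40, 30, 20, 10]    # falls as NO rises

r_no_no2 = pearson(no, no2)
r_no_o3 = pearson(no, o3)
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate only that no linear relationship was found; a non-linear dependence can still exist.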

Dependencies have been analyzed between values at the same moment in time, against values from past moments, against differenced values (the difference between the concentration registered for a pollutant at a given moment and the level 30 minutes earlier, which aims to detect trends over time regardless of absolute values), and against the moving average of the pollutant over different time intervals.
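The derived series described above can be sketched in a few lines of Python. With one reading every 15 minutes, a 30-minute difference corresponds to a lag of two samples. The helper names and sample values below are hypothetical, and positions without enough history are filled with None.

```python
def lag(values, steps):
    """Value observed `steps` samples earlier (None where there is no history)."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    n = len(values)
    return [None] * min(steps, n) + values[:max(n - steps, 0)]


def difference(values, steps):
    """Change with respect to the value `steps` samples earlier."""
    return [
        None if past is None else now - past
        for now, past in zip(values, lag(values, steps))
    ]


def trailing_mean(values, window):
    """Moving average aligned with the input (None until the window is full)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


# Hypothetical NO2 readings, one every 15 minutes (illustrative values only)
no2 = [30, 34, 33, 40, 46]

no2_30min_ago = lag(no2, 2)            # 2 samples = 30 minutes
no2_30min_change = difference(no2, 2)
no2_smoothed = trailing_mean(no2, 2)
```

Each derived series can then be correlated with the output variable in the same way as the raw measurements, skipping the positions that hold None.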

The next step is to evaluate a series of modeling algorithms, using supervised learning (prediction, classification) or unsupervised learning (clustering), to draw conclusions about the behavior of the pollution variables. The prediction analysis has focused on Santander’s center, with prediction horizons of 1, 2, 4, 8 and 24 hours. Models for each pollution variable at each of those horizons have then been trained and evaluated. Different machine learning algorithms have been trained in each case (each combination of variable and prediction horizon): M5P, IBk, Multilayer Perceptron, linear regression, Regression by Discretization, REPTree, Bagging with REPTree, etc. The assessment is performed by comparing the mean absolute error of the different prediction methods.
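The comparison scheme (one score per model and horizon, ranked by mean absolute error) can be sketched as follows. This is only an illustration of the evaluation loop: the two baselines used here, persistence and a running mean, are simple stand-ins chosen for the example and are not the algorithms listed above, and the series is invented.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence(history):
    """Stand-in model: the future value equals the last observed value."""
    return history[-1]


def running_mean(history):
    """Stand-in model: the future value equals the mean of everything seen so far."""
    return sum(history) / len(history)


def evaluate(series, model, horizon):
    """MAE of `model` when predicting `horizon` steps ahead of each known prefix."""
    actual, predicted = [], []
    for t in range(1, len(series) - horizon + 1):
        predicted.append(model(series[:t]))      # only past values are visible
        actual.append(series[t + horizon - 1])   # value `horizon` steps later
    return mean_absolute_error(actual, predicted)


# Hypothetical hourly concentrations (illustrative values only)
series = [10, 12, 14, 16, 18, 20]
models = {"persistence": persistence, "running_mean": running_mean}

scores = {
    h: {name: evaluate(series, model, h) for name, model in models.items()}
    for h in (1, 2)
}
best = {h: min(by_model, key=by_model.get) for h, by_model in scores.items()}
```

Keeping a trivial baseline such as persistence in the comparison is a useful sanity check: a trained model that cannot beat it at a given horizon adds little value there.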


For example, when studying the 8-hour prediction, it can be noticed that the hour of the day becomes more important, since citizens behave cyclically and probably what happens at 7 a.m. (e.g. people go to work) relates to what happens at 3 p.m. (e.g. people come back from work).

The last step of the data mining process according to the CRISP-DM methodology would be deployment in an environmental management system to obtain real-time predictions of the different pollutant values. Naturally, this deployment has to take into account the results and conclusions of the analysis and modeling stages when it is set up and when possible investments are prioritized.

The most important point is that the analysis illustrates and details the steps to follow in an environmental pollution modeling project based on data mining, although, naturally, the analysis and its specific conclusions apply, in general, only to the city of Santander. You can access the complete study, more information and demos on our website: http://www.daedalus.es/ciudad2020/. If you have any questions or comments, please do not hesitate to contact us; we will be happy to assist you.

[Translation by Luca de Filippis]

Whitepaper: “Pollution Predictive Modeling in the Sustainable City”

We recently published the whitepaper “Pollution Predictive Modeling in the Sustainable City”, which describes in detail the approach and methodology that we have adopted within the framework of the Ciudad2020 project to perform predictive modeling of environmental pollution levels in the city of the future. Given that the starting point of the analysis is the immense volume of data collected by the network of sensors deployed around the city, both physical sensors and the citizen sensor, this modeling is addressed as a data mining (data analytics) project. Therefore, the methodology, techniques and algorithms typical of data mining have been used to process and exploit the information.

The term KDD (Knowledge Discovery in Databases) was coined to refer to the (broad) concept of finding knowledge in data and to emphasize the high-level application of certain data mining processes. In an attempt to standardize this knowledge discovery process, much as software engineering standardizes software development, two main methodologies were considered: SEMMA and CRISP-DM. Both specify the tasks to perform in each phase described by KDD, assigning specific tasks and defining the expected outcome of each phase. In (Azevedo, A. and Santos, M. F. KDD, SEMMA and CRISP-DM: a parallel overview. In Proceedings of the IADIS European Conference on Data Mining 2008, pp 182-185.), both implementations are compared and the conclusion is that, although a parallel can be drawn between them, CRISP-DM is more complete. In fact, it also takes into account the application of the outcomes to the business environment and, for this reason, it has been adopted for modeling in Ciudad2020.

By collecting different documentary references, the whitepaper presents a detailed description of the CRISP-DM methodology, its objectives, essential phases and tasks. Then, it focuses on describing thoroughly the two application scenarios that have been considered in Ciudad2020 and the pollution modeling process carried out following this methodology: air pollution prediction in the city of Santander (Cantabria, Spain) and noise pollution prediction in the city of Madrid (Spain).

SERENA project (Spanish acronym for Neural Network Statistical Prediction System for Madrid’s Air Quality)

You can find the whitepaper, further information, more documentation and demos on our web page: http://www.daedalus.es/ciudad2020/. If you have any questions or comments, please do not hesitate to contact us; we will be happy to assist you.

[Translation by Luca de Filippis]

Analysis: Modeling Noise Pollution in the City of Madrid (Spain)

29 November 2013

You can now access our analysis “Modeling Noise Pollution in the City of Madrid (Spain)”. This study was carried out in the context of the Ciudad2020 project as part of our work on pollution predictive modeling in the city of the future, an essential component of the integrated environmental information management system. This document contains the complete study of the second scenario summarized in our whitepaper on the techniques of pollution predictive modeling in the sustainable city [only in Spanish].

Madrid is a noisy city. While noise is not as obvious a pollutant as ozone and particles in the air, the medium-term health risk due to noise is far greater than the risk posed by that other type of pollution. The EU requires Member States to set zonal quality objectives for noise. In Spain the limit values are 65 dB during the day and 55 dB at night, although the WHO recommends more stringent values.

Despite the existence of action plans against noise (which can be consulted on the website of the Spanish Acoustic Pollution Information System, SICA), the situation needs to improve further. It is necessary to implement more measures on mobility and to work on the control of night-time leisure. In summer 2010, the excessive noise from bars in some streets of Madrid, where at night it can exceed the permitted levels by up to 20 dB, even compelled Madrid’s City Council to close bars an hour earlier. That area of Madrid was declared a Special Acoustic Protection Zone, where a program of specific measures is applied to reduce noise. In addition, in a large city like Madrid, it is not only residents and their leisure habits that are noisy: permanent construction works and heavy traffic also produce noise pollution, and these problems are undoubtedly difficult to solve.


Madrid’s 2006 noise maps, issued by the city’s Environment and Mobility Office, showed that a significant percentage of the population was exposed to values higher than the quality objectives set out in the regulations: during the day, approximately 5.7% of the population was exposed to noise above 65 decibels, while at night 20.2% of people were exposed to more than 55 decibels. Madrid’s recently published 2011 strategic noise map reflects progress: from 5.6% to 4.1% of the population exposed during the day and from 20.2% to 14.9% at night, although there is still much work ahead.

In our study we present a real and full analysis of noise pollution in the city of Madrid, using historical data from 2012 provided by the Department of Acoustic Control, headed by Madrid’s Environment and Mobility Office. The dataset consists of periodic measurements, from 1/January/2012 to 31/December/2012, gathered by the 28 automatic measuring stations of the Air Quality Surveillance Network of Madrid’s City Council.

An extensive analysis shows how values evolve over time depending on the area, and proposes prediction models using data mining techniques and the methodology proposed for pollution modeling in Ciudad2020. With these models it is possible to obtain a short-term prediction (24 hours) that indicates when the noise will exceed the limits established by law, and to propose measures to mitigate the effects that these situations can have on citizens (headaches, dizziness, anxiety and fatigue, nervousness, stress…).
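To show how such a forecast could be checked against the legal limits, here is a minimal Python sketch. The 65 dB and 55 dB limits are the Spanish values quoted earlier in this post; the night period (23:00 to 07:00), the forecast format and the sample values are assumptions made for the example and should be replaced by the definitions in the applicable regulation.

```python
DAY_LIMIT_DB = 65.0    # daytime limit quoted in the post
NIGHT_LIMIT_DB = 55.0  # night-time limit quoted in the post

# Assumption for this sketch: night runs from 23:00 to 07:00
NIGHT_START_HOUR = 23
NIGHT_END_HOUR = 7


def limit_for_hour(hour):
    """Legal limit in dB that applies at the given hour of the day (0-23)."""
    is_night = hour >= NIGHT_START_HOUR or hour < NIGHT_END_HOUR
    return NIGHT_LIMIT_DB if is_night else DAY_LIMIT_DB


def exceedances(forecast):
    """Return (hour, excess_db) for every forecast hour above its limit."""
    return [
        (hour, level - limit_for_hour(hour))
        for hour, level in forecast
        if level > limit_for_hour(hour)
    ]


# Hypothetical forecast for one station: (hour of day, predicted level in dB)
forecast = [(2, 58.0), (9, 63.5), (14, 66.0), (23, 54.0)]

alerts = exceedances(forecast)
```

In this example the 02:00 reading is flagged even though it is lower than the 09:00 one, because the stricter night-time limit applies.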


Although the analysis focuses only on this city and the results apply exclusively to it, the most remarkable aspect is that the study thoroughly illustrates the general steps to follow for pollution modeling in any location. You can access the complete study, more information and demos on our website: http://www.daedalus.es/ciudad2020/. If you have any questions or comments, please do not hesitate to contact us; we will be happy to assist you.

[Translation by Luca de Filippis]
