System Overview

The European Research Infrastructure IAGOS (In-service Aircraft for a Global Observing System; www.IAGOS.org) operates a global-scale observing system for atmospheric composition and essential climate variables by deploying automated instruments on passenger aircraft during their commercial flights. To handle the large data flow from the fleet of aircraft, IAGOS has implemented an automated data-management workflow that organizes the flow of data from the sensor to the central data portal located in Toulouse, France. The workflow is implemented and documented in Python using the web-based Django framework and a model-based approach.

[Figure: system_overview.svg]

A permanently active cron job called Task Manager (outer box) activates an individual task instance (dotted box) of a task class that describes the complete data-handling process in the following steps:

1. The Transfer Handler checks for new data of the task-specific data type via a RESTful API (Application Programming Interface) operated at the data centre in Toulouse. The API handles the necessary authentication using an individual SHA-256 hashed token generated from a pre-shared passphrase and a unique timestamp. If new data is available, it is transferred and passed to the Import Manager.
2. The Import Manager reads and parses the raw files (using the pandas toolset) and processes the raw data into meaningful values. Finally, the Import Manager stores the processed time series in the instrument database for further processing.
3. The advanced QA/QC Handler performs checks, flags the data, and produces a report for the PI, who has to release the data for Level 1 and Level 2.
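The token scheme in step 1 can be sketched as follows. This is an illustrative assumption of how a passphrase-plus-timestamp token might be derived and checked, not the actual IAGOS implementation; all function names are invented for the example.

```python
import hashlib
import hmac
import time

def make_token(passphrase: str, timestamp: str) -> str:
    """Client side: derive a one-time token from the pre-shared
    passphrase and a unique timestamp (hypothetical sketch)."""
    message = f"{passphrase}:{timestamp}".encode("utf-8")
    return hashlib.sha256(message).hexdigest()

def verify_token(passphrase: str, timestamp: str, token: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = make_token(passphrase, timestamp)
    return hmac.compare_digest(expected, token)

# The timestamp makes each token unique, so a captured token
# cannot simply be replayed with a different timestamp.
ts = str(int(time.time()))
token = make_token("shared-secret", ts)
assert verify_token("shared-secret", ts, token)
```

Because both sides hold the passphrase, only the timestamp and the digest need to travel with the request.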

[Figure: tasks.svg]

Database

The database stores all information about instruments, calibrations, measurements and projects. Its flexible design allows you to map your instrument onto the database structure: you define the individual components and set them in relation to each other. After you have created your instrument, you can store calibrations with individual calibration parameters for it. Moreover, you can deploy your instrument on an aircraft or at a station.
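The relations described above can be sketched schematically. The dataclasses below are a simplified stand-in for the Portal's actual Django models; all class and field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Component:
    """A single hardware component an instrument is built from."""
    name: str
    serial_number: str

@dataclass
class Calibration:
    """A calibration with individual parameters, valid from a given date."""
    valid_from: datetime
    parameters: dict

@dataclass
class Deployment:
    """Where the instrument is installed, e.g. an aircraft or a station."""
    platform: str
    start: datetime

@dataclass
class Instrument:
    name: str
    components: list = field(default_factory=list)
    calibrations: list = field(default_factory=list)
    deployments: list = field(default_factory=list)

# Hypothetical usage: build an instrument from components, then attach
# a calibration and a deployment.
ozone = Instrument("ozone analyser")
ozone.components.append(Component("UV photometer cell", "SN-042"))
ozone.calibrations.append(
    Calibration(datetime(2024, 1, 1), {"slope": 1.02, "offset": -0.3})
)
ozone.deployments.append(Deployment("aircraft", datetime(2024, 2, 1)))
```

In the real schema these relations would be expressed as Django model fields (foreign keys and many-to-many relations) rather than plain lists.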

See also: database_overview

Workflow

The PI can define jobs that are executed periodically at a fixed time. To do so, the PI sets the frequency and the interval between executions. At the moment, the system supports hourly, daily, and weekly jobs. Each job has several Task Managers; a manager is dedicated to handling one kind of task (e.g., metadata handling). During the execution of a job, each Task Manager checks for new data and stores the sources in a queue. Once new data has been detected, the tasks are executed by the managers sequentially.
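The job scheme above can be sketched as follows. The class and method names are assumptions for illustration, not the Portal's actual classes; in the real system, checking for new data would query the REST API in Toulouse.

```python
from collections import deque

class TaskManager:
    """Handles one kind of task (e.g. metadata) and queues new sources."""
    def __init__(self, data_type):
        self.data_type = data_type
        self.queue = deque()

    def check_for_new_data(self, available):
        # Stand-in for the API query: pick up sources of our data type.
        for source in available.get(self.data_type, []):
            self.queue.append(source)

    def run_tasks(self):
        # Work through the queue sequentially (import, QA/QC, ...).
        processed = []
        while self.queue:
            processed.append(self.queue.popleft())
        return processed

class Job:
    """A periodic job owning several Task Managers."""
    def __init__(self, frequency, managers):
        self.frequency = frequency  # "hourly", "daily" or "weekly"
        self.managers = managers

    def execute(self, available):
        results = {}
        for manager in self.managers:
            manager.check_for_new_data(available)
            results[manager.data_type] = manager.run_tasks()
        return results

job = Job("hourly", [TaskManager("metadata"), TaskManager("timeseries")])
first = job.execute({"metadata": ["file_a.json"], "timeseries": ["file_b.csv"]})
```

Each manager drains only its own queue, so the managers run one after the other within a single job execution.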

See also: workflow_overview

QA/QC Tool

During measurements, several errors can affect the data. These errors often result from instrument malfunctions and distort the data's statistical properties. To detect them, we implemented a framework, Autom8QC, that performs QA/QC tests to improve data quality. Tests can be used individually or combined into more complex structures. The framework is modular and is integrated into the Data Portal.
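The idea of combining individual tests into a larger structure can be sketched like this. The classes below are invented for illustration and are not the Autom8QC API (see the link below for the actual documentation).

```python
class RangeTest:
    """Flag values that fall outside a physically plausible range."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def run(self, series):
        return [self.lo <= v <= self.hi for v in series]

class SpikeTest:
    """Flag values that jump by more than max_step from their predecessor."""
    def __init__(self, max_step):
        self.max_step = max_step

    def run(self, series):
        flags = [True]  # the first value has no predecessor to compare with
        for prev, cur in zip(series, series[1:]):
            flags.append(abs(cur - prev) <= self.max_step)
        return flags

class TestGroup:
    """Combine tests: a value passes only if every child test passes."""
    def __init__(self, tests):
        self.tests = tests

    def run(self, series):
        results = [t.run(series) for t in self.tests]
        return [all(per_value) for per_value in zip(*results)]

# Hypothetical ozone series with one implausible spike at index 3.
ozone_ppb = [30.0, 31.5, 29.8, 250.0, 30.2]
group = TestGroup([RangeTest(0.0, 150.0), SpikeTest(50.0)])
flags = group.run(ozone_ppb)  # [True, True, True, False, False]
```

Because groups expose the same `run` interface as single tests, groups can themselves be nested into larger groups.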

See also: https://autom8qc.readthedocs.io/en/latest/index.html

REST-API

A RESTful API is an architectural style for an Application Programming Interface (API) that uses HTTP requests to access and manipulate data. The request methods GET, POST, PUT and DELETE correspond to reading, creating, updating and deleting resources. REST is generally preferred over similar technologies because it uses less bandwidth, making it more suitable for efficient internet usage. RESTful APIs can be built with programming languages such as JavaScript or Python.
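The mapping of the four HTTP verbs to CRUD operations can be illustrated with a toy in-memory resource store; the dispatch function and resource names below are invented for the example and do not describe the IAGOS API.

```python
# Toy resource store: resource id -> payload.
store = {}

def handle(method, resource_id, payload=None):
    """Dispatch an HTTP verb to the matching CRUD operation.

    Returns (status_code, body), loosely following HTTP conventions.
    """
    if method == "POST":                       # create
        store[resource_id] = payload
        return 201, payload
    if method == "GET":                        # read
        if resource_id in store:
            return 200, store[resource_id]
        return 404, None
    if method == "PUT":                        # update (replace)
        if resource_id not in store:
            return 404, None
        store[resource_id] = payload
        return 200, payload
    if method == "DELETE":                     # delete
        if resource_id in store:
            return 204, store.pop(resource_id)
        return 404, None
    return 405, None                           # method not allowed

# Hypothetical usage: create a resource, then read it back.
handle("POST", "flight/42", {"route": "FRA-ORD"})
status, body = handle("GET", "flight/42")      # status == 200
```

A real server would bind such a dispatcher to URL routes and serialize the bodies as JSON.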

See also: https://searchapparchitecture.techtarget.com/definition/RESTful-API

JupyterHub

The Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations and narrative text. The Jupyter Notebook App is a server-client application for editing and running notebook documents via a web browser; it can run on a local desktop without internet access or be installed on a remote server and accessed through the internet. JupyterHub provides such a multi-user server for Jupyter Notebooks. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

See also: https://jupyter.org/