Import a Time Series

Purpose of this Chapter

This chapter explains how to import a time series. You will learn how to create the metadata entries and how to import the data points. Building on that, you will learn how to create relations between series, calibrations, and functions. Note that time series can’t be created via the web interface; you have to use the REST API or write an importer.

Figure: data series overview (dataseries_overview.svg)

Create Data Levels

A data level describes the overall state of a time series, and each series has to refer to a level. A level has a unique name and a short description. Within the IAGOS projects, three levels are used: Raw data, Preliminary data, and Final data.

Via Python

In the following code snippet, you will learn how to create an entry for the level Final data.

from IAGOS.apps.database.dataseries import models as dataseries

level, _ = dataseries.DataLevel.objects.get_or_create(
    id=2, name="Final data", description="Calibrated data"
)

Via Web Interface

  1. Make sure that you have the permissions to create new entries (admin)

  2. Go to the menu Time Series > Levels

  3. Create the data level Final data (ID: 2) by clicking the button on the top right corner.

Create Validities

Validities are used to flag data points. A validity has a unique name and a description. Within the IAGOS projects, there are the validities Good, Limited, Erroneous, Not validated and Missing value.

Via Python

In the following code snippet, the validities Good, Limited, and Erroneous will be created.

from IAGOS.apps.database.dataseries import models as dataseries

good_validity, _ = dataseries.DataValidity.objects.get_or_create(
    id=0, name="Good"
)
limited_validity, _ = dataseries.DataValidity.objects.get_or_create(
    id=2, name="Limited", description="Doubtful"
)
error_validity, _ = dataseries.DataValidity.objects.get_or_create(
    id=3, name="Erroneous", description="Invalid"
)

Via Web Interface

  1. Make sure that you have the permissions to create new entries (admin)

  2. Go to the menu Time Series > Validities

  3. Create the data validities Good (ID: 0), Limited (ID: 2), and Erroneous (ID: 3) by clicking the button on the top right corner.

Import Series

Generate Sample Data

Usually, you read the data from files or other systems. The following code snippet generates a time series for demonstration purposes.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mu, sigma = 15e3, 100
values = np.random.normal(mu, sigma, 400)
index = pd.date_range(start="1/1/2021", periods=400, freq="min")
series = pd.Series(values, index=index)
series.iloc[10:390] -= 145e2
series.plot()
plt.grid(True)
plt.show()

Figure: plot of the generated time series (time_series.svg)
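
Since real data usually comes from files, the following minimal sketch shows how a CSV file could be read into a pd.Series with a time index, similar to the generated one. The file layout (columns timestamp and value) and the helper name read_series_from_csv are assumptions made for illustration; they are not part of IAGOS.

```python
import io

import pandas as pd


def read_series_from_csv(source) -> pd.Series:
    """Read a two-column CSV (timestamp, value) into a time-indexed Series.

    The column names are hypothetical; adapt them to your file format.
    """
    df = pd.read_csv(source, parse_dates=["timestamp"])
    # Sort by time and drop duplicated timestamps, keeping the first entry.
    df = df.sort_values("timestamp").drop_duplicates("timestamp")
    return df.set_index("timestamp")["value"]


# In-memory stand-in for a real file, used here for demonstration only.
csv_text = """timestamp,value
2021-01-01 00:01:00,15012.5
2021-01-01 00:00:00,14987.0
2021-01-01 00:02:00,15101.2
"""
csv_series = read_series_from_csv(io.StringIO(csv_text))
```

Passing a file path instead of the io.StringIO object works the same way, because pd.read_csv accepts both.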

Import Metadata

First, you have to create the metadata of the time series. Each time series needs the following attributes: component_parameter, deployment, data_description, period_start, period_end, timestamp, revision, and data_level. The field data_description stores the name and the description of the series. It’s recommended that series which describe the same data (with different revisions and levels) share the same data_description, and that this data_description has a unique name. This makes it possible to build a history. If series of several deployments or flights always have the same name (e.g., H2O_gas), you have to use additional properties (e.g., the time period) to distinguish them.

Important

For performance reasons, each series refers to the related deployment. It makes it possible to locate the time series of the same deployment close together. See also: https://en.wikipedia.org/wiki/Database_index

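from datetime import datetime

from IAGOS.apps.database.components import models as components
from IAGOS.apps.database.dataseries import models as dataseries

# Note: instances and deployment are not defined in this chapter; they are
# assumed to exist already (e.g., created in an earlier step).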

data_description, _ = dataseries.DataDescription.objects.get_or_create(
    name="H2O_gas", description="Measured by ICH"
)
component_parameter = components.ComponentParameter.objects.get(
    component=instances["ICH"], parameter__name="H2O_gas"
)

db_series, _ = dataseries.DataSeries.objects.get_or_create(
    component_parameter=component_parameter,
    deployment=deployment,
    data_description=data_description,
    period_start=series.index[0].to_pydatetime(),
    period_end=series.index[-1].to_pydatetime(),
    timestamp=datetime(2020, 4, 1),
    revision=20200101,
    data_level=level
)

Import Data Points

As already mentioned, each data point has a timestamp, a value, and a validity. The first step is to create a DataFrame that contains this information for each data point. Afterwards, you can use the method import_data_points_from_pandas of the model DataSeries to import the data points.

import numpy as np
from IAGOS.apps.database.dataseries import models as dataseries

df = series.to_frame(name="value")
df["timestamp"] = series.index
validities = np.zeros(400)
validities[[42, 142, 242, 342]] = 2
df["data_validity"] = validities
df["data_validity"] = dataseries.DataValidity.get_validities(df["data_validity"])
points = db_series.import_data_points_from_pandas(df)
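
In practice, the validity codes often come from a rule rather than from fixed positions. The following sketch builds the same three columns (value, timestamp, data_validity) with plain pandas, using the IDs created in the section on validities (0 for Good, 2 for Limited, 3 for Erroneous). The plausibility rule, its thresholds, and the helper name build_points_frame are hypothetical. The integer codes still have to be converted with DataValidity.get_validities before the import, as shown above.

```python
import numpy as np
import pandas as pd

# Validity IDs as created in the section on validities.
GOOD, LIMITED, ERRONEOUS = 0, 2, 3


def build_points_frame(series: pd.Series, upper: float) -> pd.DataFrame:
    """Return a frame with the columns value, timestamp, and data_validity.

    Illustrative rule only: values above `upper` are marked Limited,
    negative values are marked Erroneous, everything else is Good.
    """
    df = series.to_frame(name="value")
    df["timestamp"] = series.index
    validity = np.full(len(series), GOOD, dtype=int)
    validity[(series > upper).to_numpy()] = LIMITED
    validity[(series < 0).to_numpy()] = ERRONEOUS
    df["data_validity"] = validity
    return df


demo_index = pd.date_range(start="2021-01-01", periods=5, freq="min")
demo = pd.Series([500.0, 520.0, -5.0, 9000.0, 480.0], index=demo_index)
points_frame = build_points_frame(demo, upper=1000.0)
```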

Import Errors

Within the IAGOS project, many series have an error value for each data point. These errors are stored as DataPointExtensions. An extension is always related to a data point and has the field values, which allows you to store additional information in JSON format. The model DataSeries provides the method import_errors to import these errors; it expects the errors as a pd.Series.

import numpy as np
import pandas as pd

mu, sigma = 5, 2
errors = pd.Series(np.random.normal(mu, sigma, 400), index=index)
db_series.import_errors(errors)
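
Because every extension belongs to a data point, it can be useful to check that there is an error value for each timestamp of the series before calling import_errors. The sketch below does this with plain pandas. It assumes that errors and data points are matched by timestamp, which is not stated in this chapter, and the helper name align_errors is hypothetical.

```python
import pandas as pd


def align_errors(errors: pd.Series, series: pd.Series) -> pd.Series:
    """Reorder errors to the timestamps of series and fail on gaps."""
    aligned = errors.reindex(series.index)
    missing = aligned.isna()
    if missing.any():
        raise ValueError(f"{int(missing.sum())} data points have no error value")
    return aligned


demo_index = pd.date_range(start="2021-01-01", periods=4, freq="min")
demo_series = pd.Series([500.0, 510.0, 495.0, 505.0], index=demo_index)
# Errors delivered in a different order than the data points.
demo_errors = pd.Series([4.0, 5.5, 6.1, 3.9], index=demo_index[::-1])
aligned_errors = align_errors(demo_errors, demo_series)
```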

Import Flags

Besides the errors, additional flags can be stored as well. This functionality is especially recommended when you run QA/QC algorithms: you can save their results for individual data points as an extension. This helps to understand the resulting validities and can be used for further analysis. The model DataSeries provides the method import_flags, which expects a data frame. Note that the columns of the data frame are used to store the values.

import numpy as np
import pandas as pd

mu, sigma = 5, 2
values = pd.Series(np.random.normal(mu, sigma, 400), index=index)
# Stack the two boolean conditions as columns, one row per data point.
flags = np.column_stack([(values < 5).to_numpy(), (values < 2).to_numpy()])
flags = pd.DataFrame(flags, index=index, columns=["flag_a", "flag_b"])
db_series.import_flags(flags)
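
To illustrate how stored flags can explain the resulting validities, the following sketch derives a validity code from two flag columns. The mapping (flag_b leads to Erroneous, flag_a alone to Limited) is a hypothetical rule chosen for demonstration; the IDs are the ones created in the section on validities.

```python
import numpy as np
import pandas as pd

# Validity IDs as created in the section on validities.
GOOD, LIMITED, ERRONEOUS = 0, 2, 3

demo_index = pd.date_range(start="2021-01-01", periods=4, freq="min")
demo_values = pd.Series([6.0, 4.0, 1.0, 5.0], index=demo_index)

# One boolean column per flag; the column names identify the flags.
demo_flags = pd.DataFrame({"flag_a": demo_values < 5, "flag_b": demo_values < 2})

# np.select uses the first matching condition, so flag_b takes precedence.
validity = pd.Series(
    np.select(
        [demo_flags["flag_b"].to_numpy(), demo_flags["flag_a"].to_numpy()],
        [ERRONEOUS, LIMITED],
        default=GOOD,
    ),
    index=demo_flags.index,
)
```

Building the frame from a dictionary of boolean series keeps each flag in its own named column and avoids any reshaping.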

Create Relations

If you apply a calibration function to the data or use other series to process the data, it’s highly recommended to create a relation for it. The relation helps you to retrace which series, calibrations, and functions were used for processing. In the following code snippet, a relation between the calibrations, functions and the series will be created.

Note

If you want to refer to other series, use the following line: relation.related_series.add(other_series).

from IAGOS.apps.database.dataseries import models as dataseries

relation, _ = dataseries.SeriesRelation.objects.get_or_create(series=db_series)
relation.related_functions.add(function)
relation.related_calibrations.add(pre_calibration)
relation.related_calibrations.add(post_calibration)
relation.save()