.. include::

.. _series:

********************
Import a Time Series
********************

Purpose of this Chapter
=======================

The aim of this chapter is to explain how to import a time series. You will
learn how to create entries for the metadata and how to import data points.
Building on that, you will learn how to create relations between series,
calibrations, and functions. Note that time series can't be created via the
web interface; you have to use the REST API or write an importer.

.. figure:: ../../graphics/dataseries_overview.svg
    :width: 100%

Create Data Levels
==================

A data level describes the overall state of the time series, and each series
has to refer to a level. A level has a unique *name* and a short
*description*. Within the **IAGOS** projects, there are three levels:
*Raw data*, *Preliminary data*, and *Final data*.

Via Python
----------

The following code snippet creates an entry for the level *Final data*.

.. code-block:: python

    from IAGOS.apps.database.dataseries import models as dataseries

    level, _ = dataseries.DataLevel.objects.get_or_create(
        id=2, name="Final data", description="Calibrated data"
    )

Via Web Interface
-----------------

1. Make sure that you have the permissions to create new entries *(admin)*.
2. Go to the menu *Time Series* |rarr| *Levels*.
3. Create the data level **Final data** (ID: 2) by clicking the button in
   the top right corner.

Create Validities
=================

Validities are used to flag data points. A validity has a unique *name* and
a *description*. Within the **IAGOS** projects, there are the validities
*Good*, *Limited*, *Erroneous*, *Not validated*, and *Missing value*.

Via Python
----------

In the following code snippet, the validities *Good*, *Limited*, and
*Erroneous* will be created.

.. code-block:: python

    from IAGOS.apps.database.dataseries import models as dataseries

    good_validity, _ = dataseries.DataValidity.objects.get_or_create(
        id=0, name="Good"
    )
    limited_validity, _ = dataseries.DataValidity.objects.get_or_create(
        id=2, name="Limited", description="Doubtful"
    )
    error_validity, _ = dataseries.DataValidity.objects.get_or_create(
        id=3, name="Erroneous", description="Invalid"
    )

Via Web Interface
-----------------

1. Make sure that you have the permissions to create new entries *(admin)*.
2. Go to the menu *Time Series* |rarr| *Validities*.
3. Create the data validities **Good** (ID: 0), **Limited** (ID: 2), and
   **Erroneous** (ID: 3) by clicking the button in the top right corner.

Import Series
=============

Generate Sample Data
--------------------

Usually, you read the data from files or other systems. The following code
snippet generates a time series for demonstration purposes.

.. code-block:: python

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    mu, sigma = 15e3, 100
    values = np.random.normal(mu, sigma, 400)
    index = pd.date_range(start="1/1/2021", periods=400, freq="min")
    series = pd.Series(values, index=index)
    series.iloc[10:390] -= 145e2

    series.plot()
    plt.grid(True)
    plt.show()

.. figure:: ../../graphics/time_series.svg
    :width: 100%

Import Metadata
---------------

First of all, you have to create the metadata of the time series. Each time
series needs the following attributes: *component_parameter*, *deployment*,
*data_description*, *period_start*, *period_end*, *timestamp*, *revision*,
and *data_level*.

The field *data_description* stores the *name* and the *description* of the
series. It is recommended that series *(with different revisions and levels)*
which describe the same data share the same *data_description* and have a
unique name. This gives you the possibility to create a history.
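Before creating the metadata entry, you typically derive the required period
attributes from the data itself. The following sketch uses plain pandas (no
IAGOS code) to show how *period_start* and *period_end* can be obtained from
the sample series generated above, as plain ``datetime`` objects:

```python
import numpy as np
import pandas as pd

# Recreate the demo series from the "Generate Sample Data" section.
index = pd.date_range(start="1/1/2021", periods=400, freq="min")
series = pd.Series(np.random.normal(15e3, 100, 400), index=index)

# Derive the period boundaries as plain datetime objects.
period_start = series.index[0].to_pydatetime()
period_end = series.index[-1].to_pydatetime()

print(period_start)  # 2021-01-01 00:00:00
print(period_end)    # 2021-01-01 06:39:00
```

Note that 400 one-minute steps span 6 hours and 39 minutes after the first
timestamp, which is why *period_end* falls on 06:39.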
If series of several deployments or flights always have the same name
*(e.g., H2O_gas)*, you have to use additional properties *(e.g., the time
period)* to distinguish them.

.. important::

    For performance reasons, each series refers to the related deployment.
    This makes it possible to locate the time series of the same deployment
    close together. See also: https://en.wikipedia.org/wiki/Database_index

.. code-block:: python

    from datetime import datetime

    from IAGOS.apps.database.components import models as components
    from IAGOS.apps.database.dataseries import models as dataseries

    data_description, _ = dataseries.DataDescription.objects.get_or_create(
        name="H2O_gas", description="Measured by ICH"
    )
    # "instances" and "deployment" are expected to exist from the previous
    # chapters.
    component_parameter = components.ComponentParameter.objects.get(
        component=instances["ICH"], parameter__name="H2O_gas"
    )
    db_series, _ = dataseries.DataSeries.objects.get_or_create(
        component_parameter=component_parameter,
        deployment=deployment,
        data_description=data_description,
        period_start=series.index[0].to_pydatetime(),
        period_end=series.index[-1].to_pydatetime(),
        timestamp=datetime(2020, 4, 1),
        revision=20200101,
        data_level=level,
    )

Import Data Points
------------------

As already mentioned, each data point has a *timestamp*, a *value*, and a
*validity*. The first step is to create a DataFrame that contains this
information for each data point. After that, you can use the method
*import_data_points_from_pandas* of the model *DataSeries*, which imports
the data points.

.. code-block:: python

    import numpy as np

    from IAGOS.apps.database.dataseries import models as dataseries

    df = series.to_frame(name="value")
    df["timestamp"] = series.index

    # Flag four data points as "Limited" (ID: 2); all others stay "Good".
    validities = np.zeros(400)
    validities[[42, 142, 242, 342]] = 2
    df["data_validity"] = validities
    df["data_validity"] = dataseries.DataValidity.get_validities(
        df["data_validity"]
    )

    points = db_series.import_data_points_from_pandas(df)

Import Errors
-------------

Within the IAGOS project, many series have an error value for each data point.
These errors are stored as **DataPointExtensions**. An extension is always
related to a data point and has the field **values**, which allows you to
store additional information in JSON format. The model **DataSeries**
provides the method **import_errors** to import these errors; it expects the
errors as a *pd.Series*.

.. code-block:: python

    import numpy as np
    import pandas as pd

    mu, sigma = 5, 2
    errors = pd.Series(np.random.normal(mu, sigma, 400), index=index)

    db_series.import_errors(errors)

Import Flags
------------

Besides the errors, additional flags can be stored as well. Especially when
you run QA/QC algorithms, it is recommended to use this functionality: you
can save the results for each data point as an extension. This can be
helpful to understand the resulting validities and can be used for further
analysis. The model **DataSeries** provides the method **import_flags**,
which expects a data frame. Note that the columns of the data frame will be
used to store the values.

.. code-block:: python

    import numpy as np
    import pandas as pd

    mu, sigma = 5, 2
    values = pd.Series(np.random.normal(mu, sigma, 400), index=index)

    # Combine the two boolean flags column-wise, one row per data point.
    # (np.concatenate(...).reshape(-1, 2) would interleave the flags and
    # pair them wrongly.)
    flags = np.column_stack([(values < 5), (values < 2)])
    flags = pd.DataFrame(flags, index=index, columns=["flag_a", "flag_b"])

    db_series.import_flags(flags)

Create Relations
================

If you apply a calibration function to the data or use other series to
process the data, it is highly recommended to create a relation for it. The
relation helps you to retrace which series, calibrations, and functions were
used for the processing. In the following code snippet, a relation between
the calibrations, the functions, and the series will be created.

.. note::

    If you want to refer to other series, use the following line:
    **relation.related_series.add(other_series)**.

.. code-block:: python

    from IAGOS.apps.database.dataseries import models as dataseries

    # "function", "pre_calibration", and "post_calibration" are expected to
    # exist from the previous chapters.
    relation, _ = dataseries.SeriesRelation.objects.get_or_create(
        series=db_series
    )
    relation.related_functions.add(function)
    relation.related_calibrations.add(pre_calibration)
    relation.related_calibrations.add(post_calibration)
    relation.save()
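Once a relation exists, it can also be read back to retrace how a series was
processed. The following is a minimal sketch, assuming the models shown in
this chapter; it only uses standard Django ORM reverse lookups, and the
printed labels are illustrative, not part of the official API:

```python
from IAGOS.apps.database.dataseries import models as dataseries

# Illustrative only: look up everything that was used to process db_series.
relation = dataseries.SeriesRelation.objects.get(series=db_series)

for calibration in relation.related_calibrations.all():
    print("calibration:", calibration)
for function in relation.related_functions.all():
    print("function:", function)
for other_series in relation.related_series.all():
    print("input series:", other_series)
```

This kind of read-back is useful when validating reprocessed data, because it
shows at a glance which calibrations and functions produced a given revision.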