IAGOS.apps.workflow.imports

Overview

[Figure: import.svg]

The ImportManager combines two components, the Reader and the Importer. The Reader reads the data and parses it into a standardized format that the Importer can interpret. The Importer imports and processes the parsed data. Because the Importer expects the standardized format, it doesn’t need to know the original format of the source. This lets you implement different Readers for different formats while reusing the same Importer. If you import processed or final data, you can ignore the data processing part. Within the IAGOS project, the raw data is read and then processed. The data processing is part of the Importer because in some use cases the data needs to be synchronized. It’s recommended to synchronize the data first and then import only the synchronized data.
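
The separation between Reader and Importer can be sketched as follows. This is only an illustration under stated assumptions, not the IAGOS implementation: the classes CsvReader, JsonReader and SimpleImporter are hypothetical, and a plain dict of lists stands in for the standardized format.

```python
import csv
import io
import json


class CsvReader:
    """Hypothetical reader: parses CSV text into {parameter: [values]}."""

    def read(self, source):
        rows = list(csv.DictReader(io.StringIO(source)))
        self.data = {key: [float(row[key]) for row in rows] for key in rows[0]}


class JsonReader:
    """Hypothetical reader: parses JSON text into the same structure."""

    def read(self, source):
        self.data = {key: [float(value) for value in values]
                     for key, values in json.loads(source).items()}


class SimpleImporter:
    """Hypothetical importer: it only ever sees the standardized structure."""

    def run(self, reader, parameters):
        # Keep only the requested parameters; everything else is ignored.
        return {name: reader.data[name] for name in parameters if name in reader.data}


csv_reader = CsvReader()
csv_reader.read("ozone,co\n40.5,110\n41.0,112\n")
json_reader = JsonReader()
json_reader.read('{"ozone": [40.5, 41.0], "co": [110, 112]}')

importer = SimpleImporter()
from_csv = importer.run(csv_reader, ["ozone"])
from_json = importer.run(json_reader, ["ozone"])
```

Both readers hand the same structure to the same importer, so supporting a new format only requires a new reader.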

Interfaces

Interface - Reader

Every reader has to inherit from the abstract base class BaseReader. This approach ensures that new readers can be integrated into the workflow without modifying the existing code. Each reader has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods read and read_list. These methods read and parse the data to a standardized format which can be interpreted by the importer.

class IAGOS.apps.workflow.imports.readers.base.BaseReader

Every reader has to inherit from this abstract base class. This approach ensures that new readers can be integrated into the workflow without modifying the existing code. Each reader has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods read and read_list. These methods read the data and parse it into a standardized format that the importer can interpret. Moreover, each reader has to provide an additional name (e.g., the filename or the flight ID).

Warning

If you add additional attributes in the inherited classes, make sure that you reset them in the method read. Moreover, ensure that you call the super-constructor.

Parameters
  • NAME (string) – Name of the reader

  • DESCRIPTION (string) – Description of the reader
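
A minimal sketch of a reader that follows the rules above. The BaseReader defined here is a stand-in that only mimics the documented names, since the real class is not reproduced in this document. LineReader, its attributes flight_id and values, and the text layout (flight ID on the first line, one value per following line) are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class BaseReader(ABC):
    """Stand-in for the real BaseReader (documented names only)."""

    NAME = None
    DESCRIPTION = None

    def __init__(self):
        self.source = None

    def get_source(self):
        return self.source

    @abstractmethod
    def get_source_name(self):
        """Return an additional name describing the source."""

    @abstractmethod
    def read(self, source):
        """Read the source and parse it to the standardized structure."""

    @staticmethod
    @abstractmethod
    def read_list(source, additional_source, additional_info):
        """Return the list of sources to process."""


class LineReader(BaseReader):
    """Hypothetical reader: flight ID on line one, then one float per line."""

    NAME = "LineReader"
    DESCRIPTION = "Parses one float per line (illustrative only)"

    def __init__(self):
        super().__init__()      # call the super-constructor
        self.flight_id = None   # additional attributes of the subclass
        self.values = []

    def get_source_name(self):
        return self.flight_id

    def read(self, source):
        self.source = source
        self.flight_id = None   # reset the additional attributes on every read
        self.values = []
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        self.flight_id = lines[0]
        self.values = [float(line) for line in lines[1:]]

    @staticmethod
    def read_list(source, additional_source, additional_info):
        # Nothing to expand here, so the source is passed through
        # (wrapping it in a list is an assumption of this sketch).
        return [source]


reader = LineReader()
reader.read("F001\n1.0\n2.5\n")
first_values = list(reader.values)
reader.read("F002\n3.0\n")
second_values = list(reader.values)
```

Resetting the attributes inside read matters because the same reader instance may be reused for several sources. Without the reset, values from the first source would leak into the second.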

check_description()

Checks if the DESCRIPTION is set and contains at least 1 character. If the DESCRIPTION is not valid, the method will raise a ValueError or a TypeError.

Raises
  • ValueError – If DESCRIPTION is not set or empty

  • TypeError – If DESCRIPTION is not a string

Returns

None

Return type

None

check_metadata()

Checks if the metadata is correct.

Raises
  • ValueError – If NAME is not set or empty

  • TypeError – If NAME is not a string

  • ValueError – If DESCRIPTION is not set or empty

  • TypeError – If DESCRIPTION is not a string

Returns

None

Return type

None

check_name()

Checks if the NAME is set and contains at least 1 character. If the NAME is not valid, the method will raise a ValueError or a TypeError.

Raises
  • ValueError – If NAME is not set or empty

  • TypeError – If NAME is not a string

Returns

None

Return type

None
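
The behaviour documented for check_name, check_description and check_metadata can be sketched with one helper. This is not the actual implementation: the function name check_text_attribute is hypothetical, and treating None as "not set" is an assumption.

```python
def check_text_attribute(value, label):
    """Raise ValueError for unset or empty values, TypeError for non-strings."""
    if value is None:
        raise ValueError(f"{label} is not set")
    if not isinstance(value, str):
        raise TypeError(f"{label} is not a string")
    if len(value) < 1:
        raise ValueError(f"{label} is empty")


def check_metadata(name, description):
    """Check both metadata fields; the first failing check raises."""
    check_text_attribute(name, "NAME")
    check_text_attribute(description, "DESCRIPTION")


def outcome(name, description):
    """Helper for the demonstration: report which exception was raised."""
    try:
        check_metadata(name, description)
    except (ValueError, TypeError) as error:
        return type(error).__name__
    return "ok"


valid = outcome("PortalReader", "Reads portal data")
empty_name = outcome("", "Reads portal data")
wrong_type = outcome("PortalReader", 42)
```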

get_source()

Returns the source of an instance.

Returns

Source of the data (e.g., filename)

Return type

string

abstract get_source_name()

Returns an additional name (e.g., filename of the dataset) which describes the source. The SeriesImporter appends this name to the series name so that series of the same parameter don’t all end up with the same name.

Returns

Name

Return type

string

abstract read(source)

Reads the data and parses it to a standardized structure.

Parameters

source (string) – Source of the data

Returns

None

Return type

None

abstract static read_list(source, additional_source, additional_info)

Parses the given source and prepares it for the TransferHandler. The method is needed since the TransferHandler can’t interpret the responses from external servers. For example, if the TransferHandler checks for new flights (e.g., http://example.com/all-flights/?timestamp=2020-01-01), it will receive a list with the flights that were performed after the given timestamp. These flight IDs will be used to create the requests for the server (e.g., http://example.com/flight/data/?param={id}). In that case, the method will return a list with all created requests for each flight.

Important

Some TransferTypes don’t need this functionality (e.g., DirectoryTransfer). In that case, the method returns just the source.

Parameters
  • source (string) – Source of the data

  • additional_source (string) – Additional source to create requests

  • additional_info (string) – Additional information for parsing

Returns

Sources

Return type

List(string)
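
The flight example above can be sketched as follows. The response layout is an assumption: here source is taken to be the response body with one flight ID per line, additional_source a URL template containing the placeholder {id}, and additional_info is unused. A real server response will look different and needs its own parsing.

```python
def read_list(source, additional_source, additional_info):
    """Build one request per flight ID found in the response body."""
    flight_ids = [line.strip() for line in source.splitlines() if line.strip()]
    return [additional_source.format(id=flight_id) for flight_id in flight_ids]


requests = read_list(
    "F001\nF002\n",
    "http://example.com/flight/data/?param={id}",
    None,
)
```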

Interface - Importer

In this module, the base classes for all importers are implemented. Every importer has to inherit from one of these classes. With this approach, you can implement new importers without changing the existing code. Each importer has to provide its metadata. Furthermore, every importer has to implement the abstract methods run and rerun, which are used to run or rerun the import process.

class IAGOS.apps.workflow.imports.importer.base.BaseImporter

Every importer has to inherit from this abstract base class. This approach ensures that new importers can be integrated into the workflow without modifying the existing code. Each importer has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods run and rerun.

Warning

If you add additional attributes in the inherited classes, make sure that you reset them in the method run. Moreover, ensure that you call the super-constructor.

Parameters
  • NAME (string) – Name of the importer

  • DESCRIPTION (string) – Description of the importer

  • error_status (Status) – Error status that describes why the import failed

  • process_status (List(ProcessStatus)) – All process statuses of the import

  • imported_metadata (List(Metadata)) – All imported metadata

  • imported_series (List(DataSeries)) – All imported series

  • evaluation_method (EvaluationMethod) – Evaluation method

  • level (integer) – Level of the data

check_description()

Checks if the DESCRIPTION is set and contains at least 1 character. If the DESCRIPTION is not valid, the method will raise a ValueError or a TypeError.

Raises
  • ValueError – If DESCRIPTION is not set or empty

  • TypeError – If DESCRIPTION is not a string

Returns

None

Return type

None

check_metadata()

Checks if the metadata is correct.

Raises
  • ValueError – If NAME is not set or empty

  • TypeError – If NAME is not a string

  • ValueError – If DESCRIPTION is not set or empty

  • TypeError – If DESCRIPTION is not a string

Returns

None

Return type

None

check_name()

Checks if the NAME is set and contains at least 1 character. If the NAME is not valid, the method will raise a ValueError or a TypeError.

Raises
  • ValueError – If NAME is not set or empty

  • TypeError – If NAME is not a string

Returns

None

Return type

None

get_error_status()

Returns the error status of the import. If there is no error, None is returned. The task manager uses this method to inform the user that an error has occurred.

Important

Note that the error status means that the whole import failed. Don’t use it as a warning.

Returns

Status that describes why the import failed.

Return type

Status

get_evaluation_method()

Returns the evaluation method.

Returns

Evaluation method

Return type

EvaluationMethod

get_imported_metadata()

Returns the imported metadata.

Returns

Imported metadata

Return type

List(Metadata)

get_imported_series()

Returns all imported series.

Returns

Imported series

Return type

List(DataSeries)

get_level()

This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.

Returns

Returns the level of the data

Return type

integer
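
The notification rule described for get_level amounts to a single comparison. The function name should_notify_pi and its arguments are hypothetical; only the rule itself comes from the description above.

```python
def should_notify_pi(data_level, configured_level):
    """The PI is not notified when the data level is below the configured level."""
    return data_level >= configured_level


below_threshold = should_notify_pi(data_level=0, configured_level=1)
at_threshold = should_notify_pi(data_level=1, configured_level=1)
```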

get_process_status()

Returns all process statuses of the import.

Returns

All process statuses of the import

Return type

List(ProcessStatus)

abstract rerun(task, reader, parameters, flags, addition)

Reruns the import process.

Important

If your Importer processes data, make sure that a new task is generated during the rerun process. It’s important that the existing series aren’t deleted, since the data could no longer be reconstructed.

Parameters
  • task (Task) – Related task

  • reader (BaseReader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dict) – Additional parameters for the import.

Returns

None

Return type

None

reset_parameters()

Resets the parameters of the importer.

Returns

None

Return type

None

abstract run(reader, parameters, flags, addition)

Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. Some importers need additional parameters for the import process. For these, you can use the parameter addition.

Parameters
  • reader (BaseReader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dict) – Additional parameters for the import.

Returns

None

Return type

None
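
The run contract can be illustrated with a simplified importer. Everything in this sketch is an assumption made for illustration: SketchImporter is hypothetical, the reader is a plain object with a data dict, parameters is a list of names rather than a ManyToManyField, flags is a dict mapping a parameter name to the name of its flag series, and the key "scale" in addition is invented.

```python
from types import SimpleNamespace


class SketchImporter:
    """Hypothetical importer following the documented run() arguments."""

    NAME = "SketchImporter"
    DESCRIPTION = "Illustrative only"

    def __init__(self):
        self.reset_parameters()

    def reset_parameters(self):
        self.error_status = None
        self.process_status = []
        self.imported_series = []

    def run(self, reader, parameters, flags, addition):
        self.reset_parameters()  # do not carry state over from an earlier run
        factor = addition.get("scale", 1.0)
        for name in parameters:
            if name not in reader.data:
                # A missing series is recorded but does not abort the import.
                self.process_status.append(f"missing parameter: {name}")
                continue
            self.imported_series.append({
                "name": name,
                "values": [value * factor for value in reader.data[name]],
                "flags": reader.data.get(flags.get(name)),
            })


reader = SimpleNamespace(data={
    "ozone": [40.0, 41.0],
    "ozone_flag": [0, 2],
    "co": [110.0, 112.0],   # offered by the dataset but not requested
})
importer = SketchImporter()
importer.run(reader, ["ozone", "no2"], {"ozone": "ozone_flag"}, {"scale": 2.0})
```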

class IAGOS.apps.workflow.imports.importer.base.SeriesImporter

This class implements an abstract base class for a series importer. An instance of the class has the same properties as an instance of the class BaseImporter. In addition, the class provides methods to import time series.

Warning

Make sure that the following attributes are set:

  • additional_name: Additional name (default: empty)

  • SERIES_DESCRIPTION: Description of the series

  • timestamp: Import timestamp

  • deployment: Deployment

Parameters
  • NAME (string) – Name of the importer

  • DESCRIPTION (string) – Description of the importer

  • error_status (Status) – Error status that describes why the import failed

  • process_status (List(ProcessStatus)) – All process statuses of the import

  • imported_metadata (List(Metadata)) – All imported metadata

  • imported_series (List(DataSeries)) – All imported series

  • evaluation_method (EvaluationMethod) – Evaluation method

  • level (integer) – Level of the data

  • additional_name (string) – Additional name for better identifying

  • SERIES_DESCRIPTION (string) – Description for the series

  • deployment (Deployment) – Deployment in that the series was recorded

  • timestamp (datetime) – Timestamp that stores the import timestamp

create_series_relations(series, rel_series, rel_functions)

Creates a series relation for the given series. Series relations let the user relate series to each other and store the functions that were used during processing.

Parameters
Returns

None

Return type

None

get_data_description(name, description=None)

Returns the data description with the given name and description. If no entry exists that matches the given parameters, an entry will be created and then returned. If the DataDescription could not be created for an unknown reason, None will be returned. Furthermore, a process status will be created that explains the error.

Parameters
  • name (string) – Name of the data

  • description (string) – Description of the data

Returns

DataDescription with the given name and the description

Return type

DataDescription or None

get_raw_series(name)

Returns the raw series with the given name. If the series doesn’t exist, None will be returned.

Parameters

name (string) – Name of the series (without additional)

Returns

Raw series

Return type

DataSeries

get_series(name, level, revision)

Returns the series with the passed parameters. If no series exists that matches the parameters, None will be returned.

Parameters
  • name (string) – Name of the series (without additional)

  • level (integer) – Level of the series

  • revision (integer) – Revision of the series

Returns

Series with the given parameters

Return type

DataSeries

import_points_extensions(points, extensions, data_description)

Imports the given data point extensions. If the extensions could not be imported, a process status will be created that describes the error.

Parameters
  • points (QuerySet(DataPoint)) – Points of the series

  • extensions (pd.DataFrame) – Values of the extensions

  • data_description (DataDescription) – Description of the extensions

import_series(points, validities, series_name, comp_param, level, revision, extensions=None, extension_description=None, time_shift=0)

Imports the series with the given parameters. If errors occur, a process status will be created for each error. For more details, take a look at the methods import_series_metadata, import_series_points, and import_points_extensions. Keep in mind that an error for a single series does not abort the import process. If only one series of the housekeeping data is not valid, the data processing can usually still be done.

Parameters
  • points (pd.Series) – Data points of the series (for detecting start + end)

  • validities (pd.Series or DataValidity) – Validities of the data points

  • series_name (string) – Name of the series

  • component_parameter (ComponentParameter) – Component parameter of the series

  • level (DataLevel) – Data level of the series

  • revision (integer) – Revision of the series

  • extensions (pd.DataFrame) – Extension for the data points [optional]

  • extension_description (DataDescription) – Description for the extensions [optional]

  • time_shift (integer) – Time shift in seconds [optional]

Returns

Imported series

Return type

DataSeries or None

import_series_metadata(series, series_name, component_parameter, level, revision)

Imports the metadata of the series. If the series is already imported, the series will be returned, and a process status with the status “Series already exists” will be created. If the import could not be done for an unknown reason, a process status will be created to describe the error and None will be returned.

Parameters
  • series (pd.Series) – Data points of the series

  • name (string) – Name of the series

  • component_parameter (ComponentParameter) – Component parameter of the series

  • level (DataLevel) – Data level of the series

  • revision (integer) – Revision of the series

Returns

Imported series, True if series was created, False otherwise

Return type

(DataSeries or None, bool)
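
The (series, created) return value follows a get-or-create pattern, sketched here without a database. The function get_or_create_series is hypothetical: a dict stands in for the series table and a list for the process statuses, and the component parameter is left out for brevity.

```python
def get_or_create_series(store, statuses, series_name, level, revision):
    """Return (series, created) for the given identifying values."""
    key = (series_name, level, revision)
    if key in store:
        # Existing series are returned unchanged and the fact is recorded.
        statuses.append("Series already exists")
        return store[key], False
    store[key] = {"name": series_name, "level": level, "revision": revision}
    return store[key], True


store, statuses = {}, []
first, first_created = get_or_create_series(store, statuses, "ozone", 0, 1)
second, second_created = get_or_create_series(store, statuses, "ozone", 0, 1)
```

The boolean lets the caller decide whether the data points still have to be imported or whether the series was already present.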

import_series_points(series, points, validities)

Imports the data points of the series. If the points could not be imported, a process status will be created that describes the error.

Parameters
  • series (DataSeries) – Series that stores the metadata

  • points (pd.Series) – Points of the series

  • validities (List(DataValidity) or single DataValidity) – Validities of the points

Returns

The imported points, or None

Return type

QuerySet(DataPoint)

abstract rerun(task, reader, parameters, flags, addition)

Reruns the import process.

Important

During the rerun process, a new task with the same properties will be generated. It’s important that the already existing series aren’t deleted, since the data could no longer be reconstructed.

Note

To access the imported series, use the attribute task.import_info.imported_series.

Parameters
  • task (Task) – Related task

  • reader (BaseReader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dict) – Additional parameters for the import.

Returns

None

Return type

None

reset_parameters()

Resets the parameters of the importer.

Returns

None

Return type

None

abstract run(reader, parameters, flags, addition)

Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and take it into account during the import process. Some importers need additional parameters for the import process. For these, you can use the parameter addition.

Parameters
  • reader (BaseReader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dict) – Additional parameters for the import.

Returns

None

Return type

None

EvaluationMethod

class IAGOS.apps.workflow.imports.models.EvaluationMethod(*args, **kwargs)

Bases: django.db.models.base.Model

This class implements the table evaluation_method. An instance of the class represents an evaluation method to store which evaluation method was used to process the data. Moreover, the user has the opportunity to select another evaluation method.

Parameters
  • name (string) – Name of the method

  • description (string) – Description of the method

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

FlagRelation

class IAGOS.apps.workflow.imports.models.FlagRelation(*args, **kwargs)

Bases: django.db.models.base.Model

This class implements the database table flag_relation. An instance of the class is a relation between a flag parameter and the parameter it belongs to. This relation is needed if the data was flagged externally and the relation isn’t defined. For example, the Package 1 data from the Portal is already flagged, and the flags are stored as separate series. To let the system interpret this relation, you have to use an instance of this class.

Parameters
  • flag_parameter (Parameter) – Flag parameter

  • parameter (Parameter) – Related parameter

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned
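
How such a relation pairs values with externally provided flags can be sketched without Django. The dataclass below is a stand-in for the model that uses plain strings instead of Parameter instances, and the function attach_flags is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagRelation:
    """Stand-in for the model: names instead of Parameter instances."""

    flag_parameter: str
    parameter: str


def attach_flags(columns, relations):
    """Return {parameter: (values, flags)} for every related pair present."""
    paired = {}
    for relation in relations:
        if relation.parameter in columns and relation.flag_parameter in columns:
            paired[relation.parameter] = (
                columns[relation.parameter],
                columns[relation.flag_parameter],
            )
    return paired


columns = {"ozone": [40.5, 41.0], "ozone_val": [0, 2], "co": [110.0, 112.0]}
relations = [FlagRelation(flag_parameter="ozone_val", parameter="ozone")]
paired = attach_flags(columns, relations)
```

Without the relation, nothing in the data itself says that ozone_val carries the flags for ozone.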

ImportInfo

class IAGOS.apps.workflow.imports.models.ImportInfo(*args, **kwargs)

Bases: django.db.models.base.Model

This class implements the table import_info. An instance of the class stores all information about an import process.

Parameters
  • source (string) – Source of the data (e.g. filename, request-cmd)

  • imported_series (models.ManyToManyField) – Imported series

  • imported_metadata (models.ManyToManyField) – Imported metadata

  • evaluation_method (EvaluationMethod) – Evaluation method to process the data

  • process_status (models.ManyToManyField) – Several status of the process (e.g. missing data)

  • status (Status) – Status of the task (e.g. completed)

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

ImportManager

class IAGOS.apps.workflow.imports.models.ImportManager(*args, **kwargs)

Bases: django.db.models.base.Model

An instance of the class represents an import manager. The field transfer defines how the data can be accessed. The reader parses the data into a standard format that can be interpreted by the importer. With the field parameters, you can define which parameters should be imported. Optionally, you can define relationships between flags and parameters. To do so, create instances of the model FlagRelation and add them to the list.

Parameters
  • source (string) – Source to check if new data exists (e.g. Directory)

  • additional_source (string) – Additional information to parse the data

  • addition (string) – Additional parameters that are needed (JSON format)

  • transfer_handler (TransferHandler) – TransferHandler

  • importer (Importer) – Importer that imports the data

  • reader (BaseReader) – Interprets and parses the data to a standard format

  • parameters (List(components.Parameter)) – Parameters that should be imported

  • flag_relations (List(FlagRelation)) – Relation between parameters and flags
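
The field addition is stored as a JSON string, while the importer methods receive addition as a dict. A conversion along the following lines is therefore needed somewhere in between. The helper parse_addition and the key time_shift used in the demonstration are assumptions, not part of the documented API.

```python
import json


def parse_addition(addition):
    """Decode the JSON text stored in the field into a dict."""
    if not addition:
        return {}  # an empty field means no additional parameters
    parsed = json.loads(addition)
    if not isinstance(parsed, dict):
        raise ValueError("addition must contain a JSON object")
    return parsed


options = parse_addition('{"time_shift": 5}')
no_options = parse_addition("")
```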

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

close()

Closes the connection.

Returns

None

Return type

None

connect()

Connects with the host.

Returns

None

Return type

None

get_level()

This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.

Returns

Returns the level of the data

Return type

integer

get_sources(timestamp)

Returns all sources. For example, if the source of the import manager is an FTP directory, the method will return all files of the directory.

Parameters

timestamp (datetime or None) – Timestamp of the last execution

Returns

Returns all sources

Return type

List(string)
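
One plausible reading of the timestamp argument is that only sources newer than the last execution are returned, and that None returns everything. That filtering is an assumption of this sketch. The function list_sources and the mapping from names to modification times are hypothetical stand-ins for a directory listing.

```python
from datetime import datetime


def list_sources(entries, timestamp):
    """Return source names, optionally only those modified after timestamp."""
    if timestamp is None:
        return sorted(entries)
    return sorted(name for name, modified in entries.items() if modified > timestamp)


entries = {
    "flight_a.nc": datetime(2020, 1, 1),
    "flight_b.nc": datetime(2020, 2, 1),
}
all_sources = list_sources(entries, None)
new_sources = list_sources(entries, datetime(2020, 1, 15))
```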

property parameter_names

Returns the parameter names as a string.

Returns

Parameter names

Return type

string

rerun(task)

Reruns the import process.

Important

During the rerun process, a new task with the same properties will be generated. It’s important that the already existing series aren’t deleted, since the data could no longer be reconstructed.

Note

To access the imported series, use the attribute task.import_info.imported_series.

Parameters
  • task (Task) – Related task

Returns

None

Return type

None

run(task)

Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and take it into account during the import process. Some importers need additional parameters for the import process. For these, you can use the parameter addition.

Parameters

task (Task) – Task

Returns

None

Return type

None

Importer

class IAGOS.apps.workflow.imports.models.Importer(*args, **kwargs)

Bases: django.db.models.base.Model

This class implements the database table importer. An instance of the class imports the data that the reader parsed.

Parameters
  • name (string) – Name of the importer (e.g. PortalImporter)

  • description (string) – Description of the importer (optional)

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

get_error_status()

Returns the error status of the import. If there is no error, None is returned. The task manager uses this method to inform the user that an error has occurred.

Important

Note that the error status means that the whole import failed. Don’t use it as a warning.

Returns

Status that describes why the import failed.

Return type

Status

get_evaluation_method()

Returns the evaluation method.

Returns

Evaluation method

Return type

EvaluationMethod

get_imported_metadata()

Returns the imported metadata.

Returns

Imported metadata

Return type

List(Metadata)

get_imported_series()

Returns all imported series.

Returns

Imported series

Return type

List(DataSeries)

get_level()

This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.

Returns

Returns the level of the data

Return type

integer

get_process_status()

Returns all process statuses of the import.

Returns

All process statuses of the import

Return type

List(ProcessStatus)

load_instance()

Loads the importer with the stored name. If the importer is already loaded, the method doesn’t load the instance again.

Note

All importers are defined in the module importer.all_importer. Make sure that your importer can be loaded by the method get_importer_by_name.

Returns

None

Return type

None
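
The load-once behaviour can be sketched as follows. The registry ALL_IMPORTERS, the class EchoImporter and the class ImporterRecord are hypothetical. The get_importer_by_name defined here is a stand-in written for this example, and the signature and behaviour of the real function may differ.

```python
class EchoImporter:
    """Hypothetical importer used only to populate the registry."""

    NAME = "EchoImporter"


# Stand-in registry: maps importer names to their classes.
ALL_IMPORTERS = {EchoImporter.NAME: EchoImporter}


def get_importer_by_name(name):
    """Stand-in lookup: create a new instance of the named importer."""
    return ALL_IMPORTERS[name]()


class ImporterRecord:
    """Stand-in for the model: stores a name and loads the instance lazily."""

    def __init__(self, name):
        self.name = name
        self.instance = None

    def load_instance(self):
        if self.instance is None:  # load only once
            self.instance = get_importer_by_name(self.name)


record = ImporterRecord("EchoImporter")
record.load_instance()
first_instance = record.instance
record.load_instance()
second_instance = record.instance
```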

rerun(task, reader, parameters, flags, addition)

Reruns the import process.

Important

During the rerun process, a new task with the same properties will be generated. It’s important that the already existing series aren’t deleted, since the data could no longer be reconstructed.

Note

To access the imported series, use the attribute task.import_info.imported_series.

Parameters
  • task (Task) – Related task

  • reader (BaseReader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dict) – Additional parameters for the import.

Returns

None

Return type

None

run(reader, parameters, flags, addition)

Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and take it into account during the import process. Some importers need additional parameters for the import process. For these, you can use the parameter addition.

Parameters
  • reader (Reader) – Reader that already parsed the data to needed structure

  • parameters (models.ManyToManyField) – Parameters that should be imported

  • flags (models.ManyToManyField) – Relations between the flags and the parameters

  • addition (dictionary) – Additional parameters for the import.

Returns

None

Return type

None

Reader

class IAGOS.apps.workflow.imports.models.Reader(*args, **kwargs)

Bases: django.db.models.base.Model

This class implements the database table reader. An instance of the class can read the source’s data and parse it into the needed structure. Every reader has to inherit from the class BaseReader and has to implement its abstract methods.

Parameters
  • name (string) – Name of the reader (e.g. WaterVapourNetCDFReader)

  • description (string) – Description of the reader (optional)

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

get_source_name()

Returns a name that describes the source (e.g. flight-ID).

Returns

Description for the source

Return type

string

load_instance()

Loads the reader with the name. If the reader is already loaded, the method doesn’t load the instance again.

Note

All readers are defined in the module readers.all_readers. Make sure that your reader can be loaded by the method get_reader_by_name.

Returns

None

Return type

None

read(source)

Reads the source and parses it to an interpretable format.

Parameters

source (string) – Source of the data

Returns

None

Return type

None

read_list(source, addition_source, addition)

Reads the list and parses it to a format that can be used by the reader.

Parameters
  • source (string) – Source of the data

  • addition_source (string) – Additional source (e.g. other URL)

  • addition (string) – Additional information

Returns

List of sources

Return type

object