IAGOS.apps.workflow.imports
Overview
The ImportManager combines two components: a Reader and an Importer. The Reader reads the data and parses it into a standardized format that the Importer can interpret. The Importer then imports and processes the parsed data. Because the Importer only expects the standardized format, it doesn’t have to know the original format of the source. This allows you to implement different Readers for different formats and to reuse the same Importer. If you import processed or final data, you can ignore the data processing part. Within the IAGOS project, the raw data is read first and then processed. The data processing is part of the Importer because in some use cases the data needs to be synchronized. It’s recommended to synchronize the data first and then to import only the synchronized data.
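The separation between Reader and Importer can be illustrated with a minimal, self-contained sketch. The classes CsvReader, JsonReader, and CollectingImporter, as well as the dict of value lists used as the "standardized format", are assumptions made for illustration only; they are not part of the IAGOS code base.

```python
import csv
import io
import json


class CsvReader:
    """Hypothetical reader: parses CSV text into {parameter: [values]}."""

    def read(self, source):
        rows = list(csv.DictReader(io.StringIO(source)))
        self.data = {key: [float(row[key]) for row in rows] for key in rows[0]}


class JsonReader:
    """Hypothetical reader: parses JSON text into the same structure."""

    def read(self, source):
        self.data = {key: [float(value) for value in values]
                     for key, values in json.loads(source).items()}


class CollectingImporter:
    """Hypothetical importer: only ever sees the standardized structure."""

    def run(self, reader, parameters):
        # Import only the requested parameters and ignore everything else.
        self.imported = {name: reader.data[name] for name in parameters}


csv_reader = CsvReader()
csv_reader.read("h2o,temp\n1.5,210\n2.5,211\n")
json_reader = JsonReader()
json_reader.read('{"h2o": [1.5, 2.5], "temp": [210, 211]}')

importer = CollectingImporter()
importer.run(csv_reader, ["h2o"])
from_csv = importer.imported
importer.run(json_reader, ["h2o"])
from_json = importer.imported
```

Both readers feed the same importer, and the importer never needs to know whether the data originally came from CSV or JSON.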
Interfaces
Interface - Reader
Every reader has to inherit from the abstract base class BaseReader. This approach ensures that new readers can be integrated into the workflow without modifying the existing code. Each reader has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods read and read_list. These methods read and parse the data to a standardized format which can be interpreted by the importer.
See also
- class IAGOS.apps.workflow.imports.readers.base.BaseReader
Every reader has to inherit from this abstract base class. This approach ensures that new readers can be integrated into the workflow without modifying the existing code. Each reader has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods read and read_list. These methods read and parse the data to a standardized format which can be interpreted by the importer. Moreover, each reader has to provide an additional name (e.g., filename, flight-id, etc).
Warning
If you add additional attributes in the inherited classes, make sure that you reset them in the method read. Moreover, ensure that you call the super-constructor.
- Parameters
NAME (string) – Name of the reader
DESCRIPTION (string) – Description of the reader
- check_description()
Checks if the DESCRIPTION is set and contains at least 1 character. If the DESCRIPTION is not valid, the method will raise a ValueError or a TypeError.
- Raises
ValueError – If DESCRIPTION is not set or empty
TypeError – If DESCRIPTION is not a string
- Returns
None
- Return type
None
- check_metadata()
Checks if the metadata is correct.
- Raises
ValueError – If NAME is not set or empty
TypeError – If NAME is not a string
ValueError – If DESCRIPTION is not set or empty
TypeError – If DESCRIPTION is not a string
- Returns
None
- Return type
None
- check_name()
Checks if the NAME is set and contains at least 1 character. If the NAME is not valid, the method will raise a ValueError or a TypeError.
- Raises
ValueError – If NAME is not set or empty
TypeError – If NAME is not a string
- Returns
None
- Return type
None
- get_source()
Returns the source of an instance.
- Returns
Source of the data (e.g., filename)
- Return type
string
- abstract get_source_name()
Returns an additional name (e.g., filename of the dataset) which describes the source. The SeriesImporter adds this name to the series name so that series of the same parameter do not all share the same name.
- Returns
Name
- Return type
string
- abstract read(source)
Reads the data and parses it into a standardized structure.
- Parameters
source (string) – Source of the data
- Returns
None
- Return type
None
- abstract static read_list(source, additional_source, additional_info)
Parses the given source and prepares it for the TransferHandler. The method is needed since the TransferHandler can’t interpret the responses from external servers. For example, if the TransferHandler checks for new flights (e.g., http://example.com/all-flights/?timestamp=2020-01-01), it will receive a list with the flights that were performed after the given timestamp. These flight IDs will be used to create the requests for the server (e.g., http://example.com/flight/data/?param={id}). In that case, the method will return a list with all created requests for each flight.
Important
Some TransferTypes don’t need this functionality (e.g., DirectoryTransfer). In that case, the method returns just the source.
- Parameters
source (string) – Source of the data
additional_source (string) – Additional source to create requests
additional_info (string) – Additional information for parsing
- Returns
Sources
- Return type
List(string)
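The reader interface described above might be implemented roughly as follows. Because the real BaseReader cannot be imported here, the sketch uses a stand-in class, BaseReaderSketch, that mirrors only the documented methods. The subclass FlightValuesReader, the comma-separated input, and the interpretation of additional_source as a URL template with an {id} placeholder are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class BaseReaderSketch(ABC):
    """Stand-in for BaseReader that mirrors only the documented interface."""

    NAME = None
    DESCRIPTION = None

    def __init__(self):
        self.source = None

    def check_name(self):
        if self.NAME is None or self.NAME == "":
            raise ValueError("NAME is not set or empty")
        if not isinstance(self.NAME, str):
            raise TypeError("NAME is not a string")

    @abstractmethod
    def read(self, source): ...

    @abstractmethod
    def get_source_name(self): ...

    @staticmethod
    @abstractmethod
    def read_list(source, additional_source, additional_info): ...


class FlightValuesReader(BaseReaderSketch):
    NAME = "FlightValuesReader"
    DESCRIPTION = "Hypothetical reader for comma-separated values"

    def __init__(self):
        super().__init__()   # call the super-constructor
        self.values = []     # additional attribute of the subclass

    def read(self, source):
        self.values = []     # reset additional attributes on every read
        self.source = source
        self.values = [float(item) for item in source.split(",")]

    def get_source_name(self):
        return "flight-values"

    @staticmethod
    def read_list(source, additional_source, additional_info):
        # Assumption: 'source' holds one flight ID per line and
        # 'additional_source' is a URL template with an {id} placeholder.
        return [additional_source.format(id=flight_id)
                for flight_id in source.split()]


reader = FlightValuesReader()
reader.check_name()
reader.read("1.5,2.5")
reader.read("3.0")           # values from the first call are gone
requests = FlightValuesReader.read_list(
    "F001\nF002", "http://example.com/flight/data/?param={id}", "")
```

Resetting self.values at the start of read keeps data from a previous source from leaking into the next one when the same reader instance is reused.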
Interface - Importer
In this module, the base classes for all importers are implemented. Every importer has to inherit from one of these classes. With this approach, you can implement new importers without changing the existing code. Each importer has to provide its metadata. Furthermore, every importer has to implement the abstract methods run and rerun. These methods are used to run or rerun the import process.
- class IAGOS.apps.workflow.imports.importer.base.BaseImporter
Every importer has to inherit from this abstract base class. This approach ensures that new importers can be integrated into the workflow without modifying the existing code. Each importer has to provide its metadata (NAME, DESCRIPTION) and has to implement the abstract methods run and rerun.
Warning
If you add additional attributes in the inherited classes, make sure that you reset them in the method run. Moreover, ensure that you call the super-constructor.
- Parameters
NAME (string) – Name of the importer
DESCRIPTION (string) – Description of the importer
error_status (Status) – Error status that describes why the import failed
process_status (List(ProcessStatus)) – All process status of the import
imported_metadata (List(Metadata)) – All imported metadata
imported_series (List(DataSeries)) – All imported series
evaluation_method (EvaluationMethod) – Evaluation method
level (integer) – Level of the data
- check_description()
Checks if the DESCRIPTION is set and contains at least 1 character. If the DESCRIPTION is not valid, the method will raise a ValueError or a TypeError.
- Raises
ValueError – If DESCRIPTION is not set or empty
TypeError – If DESCRIPTION is not a string
- Returns
None
- Return type
None
- check_metadata()
Checks if the metadata is correct.
- Raises
ValueError – If NAME is not set or empty
TypeError – If NAME is not a string
ValueError – If DESCRIPTION is not set or empty
TypeError – If DESCRIPTION is not a string
- Returns
None
- Return type
None
- check_name()
Checks if the NAME is set and contains at least 1 character. If the NAME is not valid, the method will raise a ValueError or a TypeError.
- Raises
ValueError – If NAME is not set or empty
TypeError – If NAME is not a string
- Returns
None
- Return type
None
- get_error_status()
The method returns the error status of the import. If there is no error, None will be returned. This method will be used by the task manager to inform the user that an error has occurred.
Important
Note that the error status describes that the whole import failed. Don’t use it as a warning.
- Returns
Status that describes why the import failed.
- Return type
- get_evaluation_method()
Returns the evaluation method.
- Returns
Evaluation method
- Return type
- get_imported_metadata()
Returns the imported metadata.
- Returns
Imported metadata
- Return type
List(ProcessStatus)
- get_imported_series()
Returns all imported series.
- Returns
Imported series
- Return type
List(ProcessStatus)
- get_level()
This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.
- Returns
Returns the level of the data
- Return type
integer
- get_process_status()
Returns all process status of the import.
- Returns
All process status of the import
- Return type
List(ProcessStatus)
- abstract rerun(task, reader, parameters, flags, addition)
Reruns the import process.
Important
If your importer processes data, make sure that a new task is generated during the rerun process. It’s important that the existing series are not deleted, since the data could no longer be reconstructed.
- Parameters
task (Task) – Related task
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
- reset_parameters()
Resets the parameters of the importer.
- Returns
None
- Return type
None
- abstract run(reader, parameters, flags, addition)
Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. Some importers need additional parameters for the import process; you can pass them with the parameter addition.
- Parameters
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
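The difference between the error status (the whole import failed) and a process status (a single problem during the import) can be sketched as follows. BaseImporterSketch is a stand-in for BaseImporter that mirrors only a few documented members. DictImporter, the use of plain strings instead of Status and ProcessStatus objects, and the reader's data attribute are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class BaseImporterSketch(ABC):
    """Stand-in for BaseImporter with a subset of the documented members."""

    NAME = None
    DESCRIPTION = None

    def __init__(self):
        self.reset_parameters()

    def reset_parameters(self):
        self.error_status = None
        self.process_status = []
        self.imported_series = []

    def get_error_status(self):
        return self.error_status

    def get_process_status(self):
        return self.process_status

    @abstractmethod
    def run(self, reader, parameters, flags, addition): ...

    @abstractmethod
    def rerun(self, task, reader, parameters, flags, addition): ...


class DictImporter(BaseImporterSketch):
    NAME = "DictImporter"
    DESCRIPTION = "Hypothetical importer for a dict of value lists"

    def run(self, reader, parameters, flags, addition):
        self.reset_parameters()          # start every run with a clean state
        data = getattr(reader, "data", None)
        if data is None:
            # Nothing can be imported at all: an error, not a warning.
            self.error_status = "no data"
            return
        for name in parameters:
            if name not in data:
                # A single missing parameter is only a process status.
                self.process_status.append(f"missing parameter: {name}")
                continue
            self.imported_series.append((name, data[name]))

    def rerun(self, task, reader, parameters, flags, addition):
        # A real importer would generate a new task here; the sketch
        # simply repeats the import.
        self.run(reader, parameters, flags, addition)


partial = DictImporter()
partial.run(SimpleNamespace(data={"h2o": [1.0]}), ["h2o", "co"], [], {})

failed = DictImporter()
failed.run(SimpleNamespace(), ["h2o"], [], {})
```

In the first run the import succeeds with one recorded problem; in the second run the error status is set because no data was available at all.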
- class IAGOS.apps.workflow.imports.importer.base.SeriesImporter
This class implements an abstract base class for a series importer. An instance of the class has the same properties as an instance of the class BaseImporter. In addition, the class provides methods to import time series.
Warning
Make sure that the following attributes are set:
additional_name: Additional name (default: empty)
SERIES_DESCRIPTION: Description of the series
timestamp: Import timestamp
deployment: Deployment
- Parameters
NAME (string) – Name of the importer
DESCRIPTION (string) – Description of the importer
error_status (Status) – Error status that describes why the import failed
process_status (List(ProcessStatus)) – All process status of the import
imported_metadata (List(ProcessStatus)) – All imported metadata
imported_series (List(ProcessStatus)) – All imported series
evaluation_method (EvaluationMethod) – Evaluation method
level (integer) – Level of the data
additional_name (string) – Additional name for better identifying
SERIES_DESCRIPTION (string) – Description for the series
deployment (Deployment) – Deployment in that the series was recorded
timestamp (datetime) – Timestamp that stores the import timestamp
- create_series_relations(series, rel_series, rel_functions)
Creates a series relation for the given series. Series relations give the user the opportunity to set series in relations and store the functions that were used by the processing.
- Parameters
series (DataSeries) – Series
rel_series (List(DataSeries)) – Related series
- Returns
None
- Return type
None
- get_data_description(name, description=None)
Returns the data description with the given name and description. If no entry exists that matches the given parameters, an entry will be created and then returned. If the DataDescription could not be created for an unknown reason, None will be returned. Furthermore, a process status will be created that explains the error.
- Parameters
name (string) – Name of the data
description (string) – Description of the data
- Returns
DataDescription with the given name and the description
- Return type
DataDescription or None
- get_raw_series(name)
Returns the series with the given name. If the series doesn’t exist, None will be returned.
- Parameters
name (string) – Name of the series (without additional)
- Returns
Raw series
- Return type
- get_series(name, level, revision)
Returns the series with the passed parameters. If no series exists that matches the parameters, None will be returned.
- Parameters
name (string) – Name of the series (without additional)
level (integer) – Level of the series
revision (integer) – Revision of the series
- Returns
Series with the given parameters
- Return type
- import_points_extensions(points, extensions, data_description)
Imports the given data point extensions. If the extensions could not be imported, a process status will be created that describes the error.
- Parameters
points (QuerySet(DataPoint)) – Points of the series
extensions (pd.DataFrame) – Values of the extensions
data_description (DataDescription) – Description of the extensions
- import_series(points, validities, series_name, comp_param, level, revision, extensions=None, extension_description=None, time_shift=0)
Imports the series with the given parameters. If errors occur, a process status will be created for each error. For more details, take a look at the methods import_series_metadata, import_series_points, and import_points_extensions. Keep in mind that an error for a single series does not affect the rest of the import process. If only one series of the housekeeping data is not valid, the data processing can usually still be done.
- Parameters
points (pd.Series) – Data points of the series (for detecting start + end)
validities (pd.Series or DataValidity) – Validities of the data points
series_name (string) – Name of the series
component_parameter (ComponentParameter) – Component parameter of the series
level (DataLevel) – Data level of the series
revision (integer) – Revision of the series
extensions (pd.DataFrame) – Extension for the data points [optional]
extension_description (DataDescription) – Description for the extensions [optional]
time_shift (integer) – Time shift in seconds [optional]
- Returns
Imported series
- Return type
DataSeries or None
- import_series_metadata(series, series_name, component_parameter, level, revision)
Imports the metadata of the series. If the series is already imported, the series will be returned, and a process status with the status “Series already exists” will be created. If the import could not be done for an unknown reason, a process status will be created to describe the error and None will be returned.
- Parameters
series (pd.Series) – Data points of the series
name (string) – Name of the series
component_parameter (ComponentParameter) – Component parameter of the series
level (DataLevel) – Data level of the series
revision (integer) – Revision of the series
- Returns
Imported series, True if series was created, False otherwise
- Return type
(DataSeries or None, bool)
- import_series_points(series, points, validities)
Imports the data points of the series. If the points could not be imported, a process status will be created that describes the error.
- Parameters
series (DataSeries) – Series that stores the metadata
points (pd.Series) – Points of the series
validities (List(DataValidity) or single DataValidity) – Validities of the points
- Returns
Imported points or None
- Return type
QuerySet(DataPoint)
- abstract rerun(task, reader, parameters, flags, addition)
Reruns the import process.
Important
During the rerun process, a new task with the same properties is generated. It’s important that the already existing series are not deleted, since the data could no longer be reconstructed.
Note
To access the imported series, use the attribute task.import_info.imported_series.
- Parameters
task (Task) – Related task
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
- reset_parameters()
Resets the parameters of the importer.
- Returns
None
- Return type
None
- abstract run(reader, parameters, flags, addition)
Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags to define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and consider it during the import process. Some importers need additional parameters for the import process; you can pass them with the parameter addition.
- Parameters
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
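The behaviour documented for get_series and import_series_metadata above (return the existing series together with False, or create it and return True) resembles a get-or-create pattern. The following sketch shows that pattern with an in-memory dictionary. SeriesStoreSketch, the reduced argument list, and the dictionary that represents a series are assumptions for illustration; the real methods work on database models.

```python
class SeriesStoreSketch:
    """In-memory stand-in for the series table, keyed by (name, level, revision)."""

    def __init__(self):
        self.series = {}
        self.process_status = []

    def get_series(self, name, level, revision):
        # Returns None if no series matches the parameters.
        return self.series.get((name, level, revision))

    def import_series_metadata(self, name, level, revision):
        existing = self.get_series(name, level, revision)
        if existing is not None:
            # Already imported: report it and hand back the existing series.
            self.process_status.append("Series already exists")
            return existing, False
        created = {"name": name, "level": level, "revision": revision}
        self.series[(name, level, revision)] = created
        return created, True


store = SeriesStoreSketch()
first, first_created = store.import_series_metadata("h2o", 0, 1)
second, second_created = store.import_series_metadata("h2o", 0, 1)
missing = store.get_series("h2o", 1, 1)
```

The second call returns the same object as the first one and records a process status instead of creating a duplicate.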
EvaluationMethod
- class IAGOS.apps.workflow.imports.models.EvaluationMethod(*args, **kwargs)
Bases:
django.db.models.base.Model
This class implements the table evaluation_method. An instance of the class represents an evaluation method to store which evaluation method was used to process the data. Moreover, the user has the opportunity to select another evaluation method.
See also
IAGOS.apps.workflow.imports.models.ImportManager
- Parameters
name (string) – Name of the method
description (string) – Description of the method
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
FlagRelation
- class IAGOS.apps.workflow.imports.models.FlagRelation(*args, **kwargs)
Bases:
django.db.models.base.Model
This class implements the database table flag_relation. An instance of the class is a relation between a flag name and a parameter. This relation is needed if the data was flagged externally and the relation isn’t defined. For example, the Package 1 data from the Portal is already flagged and the flags are stored as separate series. To make this relation interpretable by the system, you have to use an instance of this class.
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
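How a relation between a flag name and a parameter could be used is sketched below. The fields of FlagRelation are not listed in this documentation, so the dataclass FlagRelationSketch with its fields flag_name and parameter_name, the helper attach_flags, and the series names are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagRelationSketch:
    flag_name: str        # series that holds the flags (assumed field)
    parameter_name: str   # parameter the flags belong to (assumed field)


def attach_flags(series, relations):
    """Pair each parameter's values with the flags named by its relation."""
    flag_for = {rel.parameter_name: rel.flag_name for rel in relations}
    flag_names = set(flag_for.values())
    paired = {}
    for name, values in series.items():
        if name in flag_names:
            continue  # flag series are not treated as parameters themselves
        flags = series.get(flag_for.get(name))
        if flags is None:
            paired[name] = [(value, None) for value in values]
        else:
            paired[name] = list(zip(values, flags))
    return paired


series = {"h2o": [1.5, 2.5], "h2o_val": [0, 2], "temp": [210.0]}
paired = attach_flags(series, [FlagRelationSketch("h2o_val", "h2o")])
```

Without the relation, nothing in the data itself would tell the importer that the series h2o_val contains the flags of h2o.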
ImportInfo
- class IAGOS.apps.workflow.imports.models.ImportInfo(*args, **kwargs)
Bases:
django.db.models.base.Model
This class implements the table import_info. An instance of the class stores all information about an import process.
See also
IAGOS.apps.workflow.imports.models.ImportManager
- Parameters
source (string) – Source of the data (e.g. filename, request-cmd)
imported_series (models.ManyToManyField) – Imported series
imported_metadata (models.ManyToManyField) – Imported metadata
evaluation_method (EvaluationMethod) – Evaluation method to process the data
process_status (models.ManyToManyField) – Several status of the process (e.g. missing data)
status (Status) – Status of the task (e.g. completed)
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
ImportManager
- class IAGOS.apps.workflow.imports.models.ImportManager(*args, **kwargs)
Bases:
django.db.models.base.Model
An instance of the class represents an import manager. The field transfer defines how the data can be accessed. The reader parses the data into a standard format that can be interpreted by the importer. With the field parameters, you can define which parameters should be imported. Optionally, you can define relationships between flags and parameters. To do so, you have to use the model FlagRelation and add the instance to the list.
- Parameters
source (string) – Source to check if new data exists (e.g. Directory)
additional_source (string) – Additional information to parse the data
addition (string) – Additional parameters that are needed (JSON format)
transfer_handler (TransferHandler) – TransferHandler
importer (Importer) – Importer that imports the data
reader (BaseReader) – Interprets and parses the data to a standard format
parameters (List(components.Parameter)) – Parameters that should be imported
flag_relations (List(FlagRelation)) – Relation between parameters and flags
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
- close()
Closes the connection.
- Returns
None
- Return type
None
- connect()
Connects with the host.
- Returns
None
- Return type
None
- get_level()
This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.
- Returns
Returns the level of the data
- Return type
integer
- get_sources(timestamp)
Returns all sources. For example, if the source of the import manager is an FTP directory, the method will return all files of the directory.
- Parameters
timestamp (datetime or None) – Timestamp of the last execution
- Returns
Returns all sources
- Return type
List(string)
- property parameter_names
Returns the parameter names as a string.
- Returns
Parameter names
- Return type
string
- rerun(task)
Reruns the import process.
Important
During the rerun process, a new task with the same properties is generated. It’s important that the already existing series are not deleted, since the data could no longer be reconstructed.
Note
To access the imported series, use the attribute task.import_info.imported_series.
- Parameters
task (Task) – Related task
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
- run(task)
Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags and define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and consider it during the import process. Some importers need additional parameters for the import process; you can pass them with the parameter addition.
- Parameters
task (Task) – Task
- Returns
None
- Return type
None
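One way the import manager's pieces could interact during a run is sketched below: every source is read by the reader and then handed to the importer. This is not the actual implementation of ImportManager.run. The function run_import, the classes EchoReader and RecordingImporter, and the conversion of the JSON-formatted field addition into a dict are assumptions for illustration.

```python
import json


class EchoReader:
    """Hypothetical reader: keeps the upper-cased source as its parsed data."""

    def read(self, source):
        self.data = source.upper()


class RecordingImporter:
    """Hypothetical importer: records what it was asked to import."""

    def __init__(self):
        self.calls = []

    def run(self, reader, parameters, flags, addition):
        self.calls.append((reader.data, list(parameters), addition))


def run_import(sources, reader, importer, parameters, flags, addition_json):
    # Assumption: 'addition' is stored as a JSON string and passed to the
    # importer as a dict.
    addition = json.loads(addition_json) if addition_json else {}
    for source in sources:
        reader.read(source)
        importer.run(reader, parameters, flags, addition)


importer = RecordingImporter()
run_import(["a.nc", "b.nc"], EchoReader(), importer,
           ["h2o"], [], '{"time_shift": 5}')
```

In the real workflow the list of sources would come from get_sources, and the parameters and flag relations from the corresponding fields of the import manager.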
Importer
- class IAGOS.apps.workflow.imports.models.Importer(*args, **kwargs)
Bases:
django.db.models.base.Model
This class implements the database table importer. An instance of the class imports the data that the reader parsed.
See also
IAGOS.apps.workflow.imports.importer.BaseImporter
IAGOS.apps.workflow.imports.readers
IAGOS.apps.workflow.imports.models.ImportManager
- Parameters
name (string) – Name of the importer (e.g. PortalImporter)
description (string) – Description of the importer (optional)
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
- get_error_status()
The method returns the error status of the import. If there is no error, None will be returned. This method will be used by the task manager to inform the user that an error has occurred.
Important
Note that the error status describes that the whole import failed. Don’t use it as a warning.
- Returns
Status that describes why the import failed.
- Return type
- get_evaluation_method()
Returns the evaluation method.
- Returns
Evaluation method
- Return type
- get_imported_metadata()
Returns the imported metadata.
- Returns
Imported metadata
- Return type
List(ProcessStatus)
- get_imported_series()
Returns all imported series.
- Returns
Imported series
- Return type
List(ProcessStatus)
- get_level()
This method returns the level of the data. Note that the method is used to detect whether the PI should be informed. If the level is smaller than the task manager’s configured level, the PI will not be notified. That allows exporting NRT series without checking the data.
- Returns
Returns the level of the data
- Return type
integer
- get_process_status()
Returns all process status of the import.
- Returns
All process status of the import
- Return type
List(ProcessStatus)
- load_instance()
Loads the importer with the name. If the importer is already loaded, the method does not load the instance twice.
Note
All importers are defined in the module importer.all_importer. Make sure that your importer can be loaded by the method get_importer_by_name.
- Returns
None
- Return type
None
- rerun(task, reader, parameters, flags, addition)
Reruns the import process.
Important
During the rerun process, a new task with the same properties is generated. It’s important that the already existing series are not deleted, since the data could no longer be reconstructed.
Note
To access the imported series, use the attribute task.import_info.imported_series.
- Parameters
task (Task) – Related task
reader (BaseReader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dict) – Additional parameters for the import.
- Returns
None
- Return type
None
- run(reader, parameters, flags, addition)
Runs the import process. The reader provides the data in an interpretable structure. With the parameter parameters, you can define which parameters should be imported. This is necessary because some datasets offer more parameters than needed. If you import series that are already flagged, you can use the parameter flags and define the relationship between the flags and the parameters. With this, the importer can recognize the connection between values and flags and consider it during the import process. Some importers need additional parameters for the import process; you can pass them with the parameter addition.
- Parameters
reader (Reader) – Reader that already parsed the data to needed structure
parameters (models.ManyToManyField) – Parameters that should be imported
flags (models.ManyToManyField) – Relations between the flags and the parameters
addition (dictionary) – Additional parameters for the import.
- Returns
None
- Return type
None
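The notification rule described for get_level above can be written as a single comparison. The function name should_notify_pi and the example levels are assumptions for illustration. The documentation only states that a smaller level suppresses the notification; treating an equal level as "notify" follows from that wording but is not stated explicitly.

```python
def should_notify_pi(data_level, configured_level):
    """Return True when the PI should be informed about an import.

    A data level smaller than the task manager's configured level
    suppresses the notification.
    """
    return data_level >= configured_level


# Hypothetical levels: NRT data at level 0, notification threshold at 1.
nrt_notified = should_notify_pi(0, 1)
validated_notified = should_notify_pi(2, 1)
```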
Reader
- class IAGOS.apps.workflow.imports.models.Reader(*args, **kwargs)
Bases:
django.db.models.base.Model
This class implements the database table reader. An instance of the class can read the source’s data and parse it into the needed structure. Every reader has to inherit from the class BaseReader and has to implement its abstract methods.
See also
IAGOS.apps.workflow.imports.models.ImportManager
- Parameters
name (string) – Name of the reader (e.g. WaterVapourNetCDFReader)
description (string) – Description of the reader (optional)
- exception DoesNotExist
Bases:
django.core.exceptions.ObjectDoesNotExist
- exception MultipleObjectsReturned
Bases:
django.core.exceptions.MultipleObjectsReturned
- get_source_name()
Returns a name that describes the source (e.g. flight-ID).
- Returns
Description for the source
- Return type
string
- load_instance()
Loads the reader with the name. If the reader is already loaded, the method doesn’t load the instance again.
Note
All readers are defined in the module readers.all_readers. Make sure that your reader can be loaded by the method get_reader_by_name.
- Returns
None
- Return type
None
- read(source)
Reads the source and parses it to an interpretable format.
- Parameters
source (string) – Source of the data
- Returns
None
- Return type
None
- read_list(source, addition_source, addition)
Reads the list and parses it into a format that can be used by the reader.
- Parameters
source (string) – Source of the data
addition_source (string) – Additional source (e.g. other URL)
addition (string) – Additional information
- Returns
List of sources
- Return type
object
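The lazy loading described for load_instance above (look the reader up by name, but only once) could look roughly like this. The real registry lives in readers.all_readers and its structure is not documented here, so the dictionary ALL_READERS, the signature of get_reader_by_name, and the classes WaterVapourReaderSketch and ReaderRecordSketch are assumptions for illustration.

```python
class WaterVapourReaderSketch:
    """Hypothetical reader class that can be found by its NAME."""

    NAME = "WaterVapourNetCDFReader"


# Hypothetical registry that maps reader names to reader classes.
ALL_READERS = {cls.NAME: cls for cls in (WaterVapourReaderSketch,)}


def get_reader_by_name(name):
    reader_class = ALL_READERS.get(name)
    if reader_class is None:
        raise ValueError(f"unknown reader: {name}")
    return reader_class()


class ReaderRecordSketch:
    """Stand-in for the Reader model: stores a name, loads lazily."""

    def __init__(self, name):
        self.name = name
        self.instance = None

    def load_instance(self):
        if self.instance is None:    # do not load the instance again
            self.instance = get_reader_by_name(self.name)


record = ReaderRecordSketch("WaterVapourNetCDFReader")
record.load_instance()
first_instance = record.instance
record.load_instance()
second_instance = record.instance
```

A reader that is missing from the registry cannot be loaded by name, which is why new readers have to be registered there.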