Data quality coefficients

TSCC.assessment.data_quality_coefficients.calculate_correctness_indicator(n, dataset, error_threshold, threshold_fct=<function threshold_fixed>)

Calculate the correctness indicator for a sensor network.

Parameters:

n (int): Number of nodes in the sensor network.
dataset (list): A list with one entry per node. Each entry consists of two tuples: the first holds the real values and the second holds the observed values, e.g. [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)], i.e. [(real values), (observed values)].
error_threshold (int or float): Maximum tolerated difference between a real value and its observed value.

Returns:

qt (float): Correctness indicator.

Examples

>>> from TSCC.assessment.data_quality_coefficients import calculate_correctness_indicator
>>> n_nodes = 5
>>> error_threshold = 1.0  # Error threshold for correctness
>>> # Example data for each node: [(real values), (observed values)]
>>> node_data_sequence = [
...     [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)],  # Example data for node 1
...     [(20, 21, 22, 20, 21), (21, 20, 22, 20, 21)],  # Example data for node 2
...     [(30, 32, 31, 30, 32), (32, 30, 31, 30, 32)],  # Example data for node 3
...     [(18, 19, 18, 19, 20), (19, 18, 18, 19, 20)],  # Example data for node 4
...     [(28, 29, 30, 28, 29), (29, 28, 30, 28, 29)],  # Example data for node 5
... ]
>>> qa_result = calculate_correctness_indicator(n_nodes, node_data_sequence, error_threshold)
>>> qa_result
0.91999999999
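The result above is consistent with reading the indicator as the share of observations whose absolute error does not exceed error_threshold, averaged over the nodes. The following is a minimal sketch under that assumption. The helper name correctness_sketch is hypothetical, and the sketch is inferred from the example values only, not taken from the TSCC source. It also ignores the threshold_fct parameter.

```python
def correctness_sketch(dataset, error_threshold):
    """Mean per-node share of observations within error_threshold (assumed reading)."""
    per_node = []
    for real, observed in dataset:
        # Count observations whose absolute error is at most the threshold.
        hits = sum(abs(r - o) <= error_threshold for r, o in zip(real, observed))
        per_node.append(hits / len(real))
    return sum(per_node) / len(per_node)


node_data_sequence = [
    [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)],
    [(20, 21, 22, 20, 21), (21, 20, 22, 20, 21)],
    [(30, 32, 31, 30, 32), (32, 30, 31, 30, 32)],  # two errors of 2 exceed 1.0
    [(18, 19, 18, 19, 20), (19, 18, 18, 19, 20)],
    [(28, 29, 30, 28, 29), (29, 28, 30, 28, 29)],
]
result = correctness_sketch(node_data_sequence, 1.0)
print(round(result, 2))  # 0.92
```

Only node 3 has observations outside the threshold (2 of 5), so 23 of 25 observations count as correct. Because every node has the same number of observations here, a per-node average and a pooled ratio give the same value, so the example cannot tell the two apart.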

TSCC.assessment.data_quality_coefficients.calculate_quality_coefficients(df_list, type='all_at_once', error_threshold_corr=0.0, threshold_fct=<function threshold_fixed>)

Calculate the quality criteria for a sensor network.

Parameters:

df_list (list of pandas DataFrames): Each DataFrame is indexed by timestamp (datetime format). The first column is the raw value and the second column is the ground-truth value.
type (str): One of "all_at_once" or "each_by_itself".

Returns:

q_coeff (list): The volume indicator, the correctness indicator, and a metadata dictionary, in that order.

Examples

>>> # df (a long-format DataFrame) and df_id_col (its sensor ID column name) are assumed to exist
>>> df_list = []
>>> for cur_ID in df[df_id_col].unique():
...     df_list.append(df[df[df_id_col] == cur_ID].set_index("timestamp")[["value_raw", "value_plaus"]])
...
>>> qv_result = calculate_quality_coefficients(df_list)
>>> qv_result[0]
0.989752
>>> qv_result[1]
0.99
>>> qv_result[2]
{0: {'monitoring_duration': Timedelta('329 days 05:05:00'),
'monitoring_duration_raw': Timedelta('329 days 05:05:00'),
'timestep': Timedelta('0 days 00:05:00'),
'timestep_raw': Timedelta('0 days 00:05:00')},
1: {'monitoring_duration': Timedelta('332 days 08:15:00'),
'monitoring_duration_raw': Timedelta('332 days 08:15:00'),
'timestep': Timedelta('0 days 00:05:00'),
'timestep_raw': Timedelta('0 days 00:05:00')}}
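The example above relies on a DataFrame df and a column name df_id_col that are not defined on this page. The sketch below shows one way such an input could be prepared with plain pandas. The sample values and the column name "sensor_id" are hypothetical; the column names "timestamp", "value_raw" and "value_plaus" are taken from the example.

```python
import pandas as pd

df_id_col = "sensor_id"  # hypothetical name of the sensor ID column

# Hypothetical long-format data: one row per sensor and timestamp.
df = pd.DataFrame({
    "sensor_id": ["a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 00:05",
        "2023-01-01 00:00", "2023-01-01 00:05",
    ]),
    "value_raw": [1.0, 1.2, 3.0, 3.1],    # raw value (first column)
    "value_plaus": [1.0, 1.1, 3.0, 3.0],  # ground-truth value (second column)
})

# One DataFrame per sensor, indexed by timestamp, raw column first.
df_list = [
    group.set_index("timestamp")[["value_raw", "value_plaus"]]
    for _, group in df.groupby(df_id_col, sort=False)
]

print(len(df_list))              # 2
print(list(df_list[0].columns))  # ['value_raw', 'value_plaus']
```

Using groupby with sort=False keeps the sensors in order of first appearance, which matches the order produced by looping over unique() in the example.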

TSCC.assessment.data_quality_coefficients.calculate_volume_indicator(n, T, delta_t, sampling_data)

Calculate the data volume quality indicator for a sensor network.

Parameters:

n (int): Number of sensors in the network.
T (int or float): Overall monitoring duration.
delta_t (int or float): Requested or usual time interval between consecutive observations of each sensor.
sampling_data (nested list): A list of lists holding the sampling data of each node.

Returns:

qv (float): Data volume quality indicator.

Examples

>>> import numpy as np
>>> from TSCC.assessment.data_quality_coefficients import calculate_volume_indicator
>>> n_nodes = 5
>>> monitoring_duration = 5  # in arbitrary units
>>> time_interval = 1  # in arbitrary units
>>> node_sampling_data = [
...     [1, 2, 3, None, 5],       # Example data for node 1
...     [1, 2, 3, 4, 5],          # Example data for node 2
...     [1, None, 3, np.nan, 5],  # Example data for node 3
...     [1, 2, 3, None, 5],       # Example data for node 4
...     [1, 2, "abc", 4],         # Example data for node 5
... ]
>>> qv_result = calculate_volume_indicator(n_nodes, monitoring_duration, time_interval, node_sampling_data)
>>> qv_result
0.8
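The value 0.8 is consistent with reading the indicator as the number of received (non-missing) samples divided by the number of samples expected, n * T / delta_t. The following minimal sketch works under that assumption. The helper name volume_sketch is hypothetical, and the rule that only None and NaN count as missing (so the string "abc" counts as received) is inferred from the example rather than taken from the TSCC source.

```python
import math


def volume_sketch(n, T, delta_t, sampling_data):
    """Received samples divided by expected samples (assumed reading)."""
    expected = n * (T / delta_t)  # expected samples across all sensors

    def is_present(value):
        # Assumption: only None and NaN are treated as missing.
        if value is None:
            return False
        return not (isinstance(value, float) and math.isnan(value))

    received = sum(is_present(v) for node in sampling_data for v in node)
    return received / expected


node_sampling_data = [
    [1, 2, 3, None, 5],
    [1, 2, 3, 4, 5],
    [1, None, 3, float("nan"), 5],
    [1, 2, 3, None, 5],
    [1, 2, "abc", 4],  # only four entries; the fifth sample was never recorded
]
result = volume_sketch(5, 5, 1, node_sampling_data)
print(result)  # 0.8
```

With 5 sensors and 5 expected samples each, 25 samples are expected in total and 20 are present under this reading, which gives 20 / 25 = 0.8.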