Data quality coefficients

TSCC.assessment.data_quality_coefficients.calculate_correctness_indicator(n, dataset, error_threshold, threshold_fct=<function threshold_fixed>)

Calculate the correctness indicator for a sensor network.

Parameters:
n : int

Number of nodes in the sensor network.

dataset : list

A list with one entry per node. Each entry consists of two tuples: the first holds the real values and the second holds the observed values, e.g. [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)]  # [(real values), (observed values)]

error_threshold : int or float

Error threshold for correctness, i.e. the tolerated difference between real and observed values.

Returns:
qt : float

Correctness indicator.

Examples

>>> from TSCC.assessment.data_quality_coefficients import calculate_correctness_indicator
>>> n_nodes = 5
>>> error_threshold = 1.0  # Error threshold for correctness
>>> # Example data for each node (val_real, val_observed)
>>> node_data_sequence = [
...     [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)],  # Example data for node 1
...     [(20, 21, 22, 20, 21), (21, 20, 22, 20, 21)],  # Example data for node 2
...     [(30, 32, 31, 30, 32), (32, 30, 31, 30, 32)],  # Example data for node 3
...     [(18, 19, 18, 19, 20), (19, 18, 18, 19, 20)],  # Example data for node 4
...     [(28, 29, 30, 28, 29), (29, 28, 30, 28, 29)],  # Example data for node 5
... ]
>>> qa_result = calculate_correctness_indicator(n_nodes, node_data_sequence, error_threshold)
>>> qa_result
0.91999999999
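
The result above can be reproduced by hand. The following is a minimal sketch, not the library's implementation: `correctness_sketch` is a hypothetical helper that assumes an observation counts as correct when its absolute error does not exceed the threshold, and that the indicator is the mean of the per-node shares of correct observations. Both assumptions are inferred from the documented example; the `threshold_fct` argument is not modeled.

```python
def correctness_sketch(dataset, error_threshold):
    """Hypothetical helper: mean per-node share of observations whose
    absolute error is within error_threshold (inclusive, by assumption)."""
    node_scores = []
    for real_values, observed_values in dataset:
        hits = sum(
            abs(real - observed) <= error_threshold
            for real, observed in zip(real_values, observed_values)
        )
        node_scores.append(hits / len(real_values))
    return sum(node_scores) / len(node_scores)


# Same data as in the documented example: (real values, observed values) per node.
node_data_sequence = [
    [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)],
    [(20, 21, 22, 20, 21), (21, 20, 22, 20, 21)],
    [(30, 32, 31, 30, 32), (32, 30, 31, 30, 32)],
    [(18, 19, 18, 19, 20), (19, 18, 18, 19, 20)],
    [(28, 29, 30, 28, 29), (29, 28, 30, 28, 29)],
]

qt = correctness_sketch(node_data_sequence, 1.0)
print(round(qt, 2))  # 0.92
```

Only node 3 has errors larger than the threshold (two of its five observations are off by 2), so four nodes score 1.0 and one scores 0.6.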
TSCC.assessment.data_quality_coefficients.calculate_quality_coefficients(df_list, type='all_at_once', error_threshold_corr=0.0, threshold_fct=<function threshold_fixed>)

Calculate the quality criteria for a sensor network.

Parameters:
df_list : list of pandas DataFrames

One data frame per sensor. Each data frame is indexed by timestamp (datetime format); its first column holds the raw values and its second column the ground truth values.

type : str

One of "all_at_once" or "each_by_itself".

Returns:
q_coeff : list

The quality indicators <volume> and <correctness>, followed by a metadata dictionary.

Examples

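>>> from TSCC.assessment.data_quality_coefficients import calculate_quality_coefficients
>>> # df and df_id_col are not defined here: df holds the data of all sensors,
>>> # df_id_col is the name of its sensor ID column.
>>> df_list = []
>>> for cur_ID in df[df_id_col].unique():
...     df_list.append(df[df[df_id_col] == cur_ID].set_index("timestamp")[["value_raw", "value_plaus"]])
...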
>>> qv_result = calculate_quality_coefficients(df_list)
>>> qv_result[0]
0.989752
>>> qv_result[1]
0.99
>>> qv_result[2]
{0: {'monitoring_duration': Timedelta('329 days 05:05:00'),
'monitoring_duration_raw': Timedelta('329 days 05:05:00'),
'timestep': Timedelta('0 days 00:05:00'),
'timestep_raw': Timedelta('0 days 00:05:00')},
1: {'monitoring_duration': Timedelta('332 days 08:15:00'),
'monitoring_duration_raw': Timedelta('332 days 08:15:00'),
'timestep': Timedelta('0 days 00:05:00'),
'timestep_raw': Timedelta('0 days 00:05:00')}}
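
The example above relies on an existing `df` and `df_id_col`. As a rough illustration of the expected input shape, the sketch below builds a comparable `df_list` from synthetic data. The column name `sensor_id` and all values are made up for illustration; only the `timestamp`, `value_raw` and `value_plaus` column names are taken from the example, and the TSCC function itself is not called.

```python
import pandas as pd

# Synthetic long-format data: two sensors sampled every 5 minutes (values made up).
timestamps = pd.date_range("2024-01-01 00:00", periods=4, freq="5min")
df = pd.DataFrame({
    "sensor_id": ["A"] * 4 + ["B"] * 4,  # hypothetical ID column
    "timestamp": list(timestamps) * 2,
    "value_raw": [1.0, 1.1, None, 1.3, 2.0, 2.1, 2.2, 2.3],
    "value_plaus": [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3],
})
df_id_col = "sensor_id"

# One data frame per sensor, indexed by timestamp, raw values first.
df_list = [
    group.set_index("timestamp")[["value_raw", "value_plaus"]]
    for _, group in df.groupby(df_id_col, sort=False)
]

print(len(df_list))
print(df_list[0].columns.tolist())
```

Using `groupby` with `sort=False` keeps the sensors in order of first appearance, which matches what the loop over `unique()` in the example produces.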
TSCC.assessment.data_quality_coefficients.calculate_volume_indicator(n, T, delta_t, sampling_data)

Calculate the data volume quality criterion for a sensor network.

Parameters:
n : int

Number of sensors in the network.

T : int or float

Overall monitoring duration.

delta_t : int or float

Requested or usual time interval between subsequent observations of each sensor.

sampling_data : nested list

A list of lists representing the sampling data for each node.

Returns:
qv : float

Data volume quality indicator.

Examples

>>> import numpy as np
>>> from TSCC.assessment.data_quality_coefficients import calculate_volume_indicator
>>> n_nodes = 5
>>> monitoring_duration = 5  # in arbitrary units
>>> time_interval = 1  # in arbitrary units
>>> node_sampling_data = [
...     [1, 2, 3, None, 5],       # Example data for node 1
...     [1, 2, 3, 4, 5],          # Example data for node 2
...     [1, None, 3, np.nan, 5],  # Example data for node 3
...     [1, 2, 3, None, 5],       # Example data for node 4
...     [1, 2, "abc", 4],         # Example data for node 5
... ]
>>> qv_result = calculate_volume_indicator(n_nodes, monitoring_duration, time_interval, node_sampling_data)
>>> qv_result
0.8
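
The value 0.8 is consistent with a simple ratio of available to expected observations. The sketch below is an assumption-based reconstruction, not the library's code: `volume_sketch` is a hypothetical helper that expects `T / delta_t` observations per sensor and counts every entry that is neither `None` nor NaN as available. Under that reading the string `"abc"` counts as available, which is what makes the documented example come out at 0.8.

```python
import math


def volume_sketch(n, T, delta_t, sampling_data):
    """Hypothetical helper: share of expected observations that are present."""
    expected_total = n * T / delta_t  # expected observations across all sensors

    def is_present(value):
        # Missing means None or a float NaN; anything else counts (assumption).
        return value is not None and not (
            isinstance(value, float) and math.isnan(value)
        )

    available = sum(
        is_present(value) for samples in sampling_data for value in samples
    )
    return available / expected_total


# Same data as in the documented example, with float("nan") standing in for np.nan.
node_sampling_data = [
    [1, 2, 3, None, 5],
    [1, 2, 3, 4, 5],
    [1, None, 3, float("nan"), 5],
    [1, 2, 3, None, 5],
    [1, 2, "abc", 4],
]

qv = volume_sketch(5, 5, 1, node_sampling_data)
print(qv)  # 0.8
```

Here 20 of the 25 expected observations are present (4 + 5 + 3 + 4 + 4), giving 20 / 25 = 0.8.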