Data quality coefficients
- TSCC.assessment.data_quality_coefficients.calculate_correctness_indicator(n, dataset, error_threshold, threshold_fct=<function threshold_fixed>)
Calculate the correctness indicator for a sensor network.
- Parameters:
- n : int
Number of nodes in the sensor network.
- dataset : list
A list with one entry per node. Each entry consists of two tuples: the first holds the real values and the second holds the observed values, e.g. [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)] # [(real values), (observed values)]
- error_threshold : int or float
Threshold on the difference between real and observed values, used to decide whether an observation counts as correct.
- Returns:
- qt : float
Correctness Indicator.
Examples
>>> n_nodes = 5
>>> error_threshold = 1.0  # Error threshold for correctness
>>> # Example data for each node (val_real, val_observed)
>>> node_data_sequence = [
...     [(25, 26, 24, 25, 26), (26, 25, 24, 25, 26)],  # Example data for node 1
...     [(20, 21, 22, 20, 21), (21, 20, 22, 20, 21)],  # Example data for node 2
...     [(30, 32, 31, 30, 32), (32, 30, 31, 30, 32)],  # Example data for node 3
...     [(18, 19, 18, 19, 20), (19, 18, 18, 19, 20)],  # Example data for node 4
...     [(28, 29, 30, 28, 29), (29, 28, 30, 28, 29)],  # Example data for node 5
... ]
>>> qa_result = calculate_correctness_indicator(n_nodes, node_data_sequence, error_threshold)
>>> qa_result
0.91999999999
- TSCC.assessment.data_quality_coefficients.calculate_quality_coefficients(df_list, type='all_at_once', error_threshold_corr=0.0, threshold_fct=<function threshold_fixed>)
Calculate the quality criteria for a sensor network.
- Parameters:
- df_list : list of pandas DataFrames
One data frame per sensor. The index of each data frame holds the timestamps in datetime format, the first column holds the raw values, and the second column holds the ground-truth values.
- type : str
One of "all_at_once" or "each_by_itself".
- Returns:
- q_coeff : list
The volume indicator, the correctness indicator, and a metadata dictionary, in that order.
Examples
>>> df_list = []
>>> for cur_ID in df[df_id_col].unique():
...     df_list.append(df[df[df_id_col] == cur_ID].set_index("timestamp")[["value_raw", "value_plaus"]])
...
>>> qv_result = calculate_quality_coefficients(df_list)
>>> qv_result[0]
0.989752
>>> qv_result[1]
0.99
>>> qv_result[2]
{0: {'monitoring_duration': Timedelta('329 days 05:05:00'),
     'monitoring_duration_raw': Timedelta('329 days 05:05:00'),
     'timestep': Timedelta('0 days 00:05:00'),
     'timestep_raw': Timedelta('0 days 00:05:00')},
 1: {'monitoring_duration': Timedelta('332 days 08:15:00'),
     'monitoring_duration_raw': Timedelta('332 days 08:15:00'),
     'timestep': Timedelta('0 days 00:05:00'),
     'timestep_raw': Timedelta('0 days 00:05:00')}}
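The example above relies on a data frame `df` and an ID column name `df_id_col` that are defined elsewhere. The following sketch shows one way such an input could be prepared. The column name `sensor_id` and all values are made up for illustration, and the sketch stops before calling `calculate_quality_coefficients`.

```python
import pandas as pd

# Hypothetical long-format table: one row per (sensor, timestamp) pair.
df_id_col = "sensor_id"  # assumed name of the ID column
df = pd.DataFrame(
    {
        "sensor_id": ["s1", "s1", "s1", "s2", "s2", "s2"],
        "timestamp": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:10"] * 2
        ),
        "value_raw": [1.0, 1.2, 9.9, 0.5, 0.6, 0.7],    # raw sensor readings
        "value_plaus": [1.0, 1.2, 1.3, 0.5, 0.6, 0.7],  # ground-truth values
    }
)

# One data frame per sensor: datetime index, raw values first, ground truth second.
df_list = [
    group.set_index("timestamp")[["value_raw", "value_plaus"]]
    for _, group in df.groupby(df_id_col, sort=False)
]

print(len(df_list))
print(df_list[0])
```

Using `groupby` with `sort=False` keeps the sensors in order of first appearance, which matches the order produced by the loop over `unique()` in the example.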
- TSCC.assessment.data_quality_coefficients.calculate_volume_indicator(n, T, delta_t, sampling_data)
Calculate the data volume quality criterion for a sensor network.
- Parameters:
- n : int
Number of sensors in the network.
- T : int or float
Overall monitoring duration.
- delta_t : int or float
Requested or usual time interval between subsequent observations of each sensor.
- sampling_data : nested list
A list of lists holding the sampling data of each node.
- Returns:
- qv : float
Data volume quality indicator.
Examples
>>> import numpy as np
>>> n_nodes = 5
>>> monitoring_duration = 5  # in arbitrary units
>>> time_interval = 1  # in arbitrary units
>>> node_sampling_data = [
...     [1, 2, 3, None, 5],  # Example data for node 1
...     [1, 2, 3, 4, 5],  # Example data for node 2
...     [1, None, 3, np.nan, 5],  # Example data for node 3
...     [1, 2, 3, None, 5],  # Example data for node 4
...     [1, 2, "abc", 4],  # Example data for node 5
... ]
>>> qv_result = calculate_volume_indicator(n_nodes, monitoring_duration, time_interval, node_sampling_data)
>>> qv_result
0.8
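The value 0.8 in the example above can be checked by hand. The following is a minimal sketch under stated assumptions, not the TSCC implementation: it takes the indicator to be the number of received samples divided by the expected number `n * T / delta_t`, and treats only `None` and NaN as missing. The documented result is reproduced only if the string "abc" counts as a received sample, which is an inference from the example and not confirmed behaviour. The names `is_missing` and `volume_indicator_sketch` are hypothetical.

```python
import math


def is_missing(sample):
    """Treat None and float NaN as missing; everything else counts as received."""
    return sample is None or (isinstance(sample, float) and math.isnan(sample))


def volume_indicator_sketch(n, T, delta_t, sampling_data):
    """Received samples divided by the expected number of samples (assumed formula)."""
    expected = n * (T / delta_t)
    received = sum(
        1 for node_samples in sampling_data
        for sample in node_samples
        if not is_missing(sample)
    )
    return received / expected


# Same data as in the documented example; float("nan") stands in for np.nan.
node_sampling_data = [
    [1, 2, 3, None, 5],                  # node 1: 4 received
    [1, 2, 3, 4, 5],                     # node 2: 5 received
    [1, None, 3, float("nan"), 5],       # node 3: 3 received
    [1, 2, 3, None, 5],                  # node 4: 4 received
    [1, 2, "abc", 4],                    # node 5: 4 received (assumption: "abc" counts)
]

sketch_qv = volume_indicator_sketch(5, 5, 1, node_sampling_data)
print(sketch_qv)  # 20 received out of 25 expected
```

With 5 nodes, a duration of 5 and an interval of 1, 25 samples are expected; 20 are received under the assumptions above, giving 20/25 = 0.8.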