Synthetic error generation

TSCC.preprocessing.synthetic_error_generation.generateSyntheticErrors(s, error_type, error_rate, timedelta=None, random_seed=1, lambda_arr=[4, 5])

Inject synthetic errors of the specified types into a time series.

Parameters:
s : pandas Series

Series with a timestamp index into which synthetic errors are injected.

error_type : list

List of error types to apply. Options include: “noise”, “bias”, “drift”, “constant value”, “outlier”, and “missing”.

error_rate : float

Overall error rate in the interval (0, 1], distributed equally across the error types.

timedelta : timedelta, optional

Time step of the series. If not provided, it is calculated from the index.

random_seed : int, optional

Seed for random number generation, used to make results reproducible. Default is 1.

lambda_arr : array, optional

Array specifying lambdas for certain error types:

- The first value is used for bias errors.
- The second value is used for constant value errors.

Default is [4, 5].
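The equal distribution of error_rate across the error types can be illustrated with a short sketch. The helper per_type_rates is hypothetical and not part of TSCC; how the library rounds the resulting counts or chooses the affected positions is not documented here, so this only shows the arithmetic.

```python
def per_type_rates(error_types, error_rate):
    """Split an overall error rate equally across the requested error types.

    Hypothetical helper for illustration only; not part of TSCC.
    """
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must lie in the interval (0, 1]")
    if not error_types:
        raise ValueError("error_types must not be empty")
    share = error_rate / len(error_types)
    return {etype: share for etype in error_types}


# Three error types sharing an overall rate of 0.3 get roughly 0.1 each.
rates = per_type_rates(["noise", "bias", "missing"], error_rate=0.3)

# Expected number of affected observations per type in a series of 1000 points.
n_obs = 1000
expected_counts = {etype: rate * n_obs for etype, rate in rates.items()}
```

Under this reading, adding more error types at a fixed error_rate reduces the share of each type rather than increasing the total amount of corrupted data.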

Returns:
s : pandas Series

The series with synthetic errors injected based on the specified error types.

s_etype : pandas Series

A series of the same length as s, indicating the error type for each observation.
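Because the two returned series are aligned with the input, a quick sanity check is to compare the original values with the returned ones. The sketch below uses only pandas and NumPy on made-up data; changed_mask is a hypothetical helper, not a TSCC function. It detects only positions whose value actually changed, so the returned error type series remains the authoritative label.

```python
import numpy as np
import pandas as pd


def changed_mask(original, modified):
    """Return a boolean Series that is True where the two series differ.

    NaN in both series counts as unchanged; a value replaced by NaN
    (for example by a "missing" error) counts as changed.
    """
    differs = ~np.isclose(original.to_numpy(), modified.to_numpy(), equal_nan=True)
    return pd.Series(differs, index=original.index)


# Made-up data standing in for an input series and a series with errors.
original = pd.Series([1.0, 2.0, 3.0, 4.0])
modified = pd.Series([1.0, 2.5, np.nan, 4.0])

mask = changed_mask(original, modified)
observed_rate = mask.mean()  # fraction of observations that changed
```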

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import TSCC
>>> num_samples = 10
>>> s = pd.Series(np.random.normal(0, 5, num_samples), name='initial')
>>> s_err, s_errtype = TSCC.preprocessing.generateSyntheticErrors(s, error_type=["noise"], error_rate=0.5)
>>> s_err.name = "with_errors"
>>> print(pd.DataFrame([s, s_err]).transpose())
     initial  with_errors
0   6.249285    -1.482565
1  -9.831159    -5.744147
2 -11.766656   -11.766656
3  -5.868419   -10.010263
4  -1.630816    -1.630816
5  -2.720739    -2.720739
6   3.339932     3.339932
7  -6.249174    -6.249174
8  -4.315455    -4.315455
9  -8.631155   -15.482079
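The example above uses the default integer index, while the parameter description asks for a timestamp index. A minimal sketch of building such an input with pandas follows. The start date and the 5-minute spacing are arbitrary assumptions, and taking the most common spacing is only one plausible way to derive a time step; it is not necessarily how the library computes it when timedelta is omitted.

```python
import numpy as np
import pandas as pd

# Seeded generator so the input values are reproducible.
rng = np.random.default_rng(1)

# Regular timestamp index: 10 observations, 5 minutes apart (assumed values).
index = pd.date_range("2024-01-01", periods=10, freq="5min")
s = pd.Series(rng.normal(0, 5, len(index)), index=index, name="initial")

# Derive the time step as the most common spacing between timestamps.
step = s.index.to_series().diff().dropna().mode().iloc[0]
```

A series built this way can be passed as s, and step could be supplied explicitly through the timedelta parameter.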