Synthetic error generation
- TSCC.preprocessing.synthetic_error_generation.generateSyntheticErrors(s, error_type, error_rate, timedelta=None, random_seed=1, lambda_arr=[4, 5])
Generate synthetic errors in a time series data set based on specified error types.
- Parameters:
- s : pandas Series
Series with a timestamp index where synthetic errors are to be generated.
- error_type : list
List of error types to apply. Options include: “noise”, “bias”, “drift”, “constant value”, “outlier”, and “missing”.
- error_rate : float
Overall error rate, a value in the interval (0, 1], distributed equally across the error types.
- timedelta : timedelta, optional
Time step of the series. If not provided, it is calculated from the index.
- random_seed : int, optional
Seed for random number generation, used for reproducibility. Default is 1.
- lambda_arr : array, optional
Array of lambda values used for certain error types: the first value applies to bias errors and the second to constant value errors. Default is [4, 5].
- Returns:
- s : pandas Series
The series with synthetic errors injected based on the specified error types.
- s_etype : pandas Series
A series of the same length as s, indicating the error type for each observation.
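The equal split of error_rate across the requested error types can be illustrated with a small standalone sketch. The helper name per_type_rates is hypothetical and not part of TSCC; it only mirrors the arithmetic described for the error_rate parameter and is not the library's implementation.

```python
def per_type_rates(error_types, error_rate):
    """Split an overall error rate equally across the requested error types.

    Hypothetical helper for illustration only; not part of TSCC.
    """
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must lie in the interval (0, 1]")
    if not error_types:
        raise ValueError("error_types must contain at least one entry")
    # Every error type receives the same share of the overall rate.
    share = error_rate / len(error_types)
    return {etype: share for etype in error_types}


rates = per_type_rates(["noise", "bias", "outlier", "missing"], 0.2)
print(rates)
```

Under this reading, an overall rate of 0.2 with four error types leaves roughly 5% of the observations for each type.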
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import TSCC
>>> num_samples = 10
>>> s = pd.Series(np.random.normal(0, 5, num_samples), name='initial')
>>> s_err, s_errtype = TSCC.preprocessing.generateSyntheticErrors(s, error_type=["noise"], error_rate=0.5)
>>> s_err.name = "with_errors"
>>> print(pd.DataFrame([s, s_err]).transpose())
     initial  with_errors
0   6.249285    -1.482565
1  -9.831159    -5.744147
2 -11.766656   -11.766656
3  -5.868419   -10.010263
4  -1.630816    -1.630816
5  -2.720739    -2.720739
6   3.339932     3.339932
7  -6.249174    -6.249174
8  -4.315455    -4.315455
9  -8.631155   -15.482079
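To see which observations were altered, the original and returned series can be compared directly. The sketch below does not call TSCC; it rebuilds the two columns from the printed output above and uses only pandas. The variable name changed is an illustrative choice, and the label values stored in the returned error-type series are not documented here, so the sketch relies on value comparison alone.

```python
import pandas as pd

# Values copied from the printed example output above.
initial = [6.249285, -9.831159, -11.766656, -5.868419, -1.630816,
           -2.720739, 3.339932, -6.249174, -4.315455, -8.631155]
with_errors = [-1.482565, -5.744147, -11.766656, -10.010263, -1.630816,
               -2.720739, 3.339932, -6.249174, -4.315455, -15.482079]

s = pd.Series(initial, name="initial")
s_err = pd.Series(with_errors, name="with_errors")

# An observation counts as altered when the two values differ.
# NaN != NaN evaluates to True, so positions missing in both series are excluded.
changed = s.ne(s_err) & ~(s.isna() & s_err.isna())

print(changed[changed].index.tolist())  # positions of altered observations
print(changed.mean())                   # share of altered observations
```

For the values shown above, the altered positions are 0, 1, 3 and 9. A value removed by the "missing" error type would also be flagged by this comparison, since it is NaN only in the returned series.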