Baseline error correction#

Function overview

NA by mode#

With this function you can replace the missing values in your dataframe with the most common number (mode) in it. Here’s an example of the ‘BASE_NA_byMode’ function:

Replacing with mode#
>>> d = {'col1': [1, 9, None, 9, 20, None]}
>>> df = pd.DataFrame(data=d)
>>> config = TSCC.preprocessing.Config(colname_raw = 'col1')
>>> test = TSCC.correction.BASE_NA_byMode(df, 10,config)
>>> test
0     1.0
1     9.0
2     9.0
3     9.0
4    20.0
5     9.0
Name: col1, dtype: float64

As expected the missing values are being filled with 9, because it is the most common mode with two times.

NA in space#

With this function you can impute the missing values with the mean of other columns in your dataframe. Here’s an example of how to use the ‘BASE_NA_inSpace’ function:

Replacing with mean#
>>> data = {
>>>     "val_raw": [1, None, np.nan, 1, None],
>>>     "isError": [False, True, True, False, True],
>>>     'N': [2, 2, 2, 2, 2],
>>>     'E': [3, 3, None, 3, 3],
>>> }
>>> df_fea = pd.DataFrame(data)
>>> config = TSCC.preprocessing.Config(colname_raw = 'val_raw')
>>> BASE_NA_inSpace(df_fea, None, config)
0    1.000000
1    2.500000
2    2.000000
3    1.000000
4    2.500000
Name: val_raw, dtype: float64

As expected the missing values of ‘val_raw’ are being imputed with the mean of ‘N’ and ‘E’.