Baseline error correction

TSCC.correction.baseline.BASE_NA_byMode(df_fea, df_tar, config)

Replace the None values in the configured column with the most common value (mode) of that column.

Parameters:
df_fea : pandas dataframe

The dataframe containing the column with None values.

df_tar : None

Not used yet.

config : object

Configuration object holding the name of the column to be changed (colname_raw).

Returns:
pandas series

The selected column with its None values replaced.

Examples

>>> import pandas as pd
>>> import TSCC
>>> d = {'col1': [1, 9, None, 6, 20, None, 3, 1, 30, None]}
>>> df = pd.DataFrame(data=d)
>>> config = TSCC.preprocessing.Config(colname_raw='col1')
>>> test = TSCC.correction.BASE_NA_byMode(df, None, config)
>>> test
0     1.0
1     9.0
2     1.0
3     6.0
4    20.0
5     1.0
6     3.0
7     1.0
8    30.0
9     1.0
Name: col1, dtype: float64
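
For cross-checking results, the documented behaviour can be approximated in plain pandas. This is a minimal sketch, not the TSCC implementation: the helper name fill_na_by_mode is hypothetical, and it assumes that ties between several modes are resolved by taking the smallest one.

```python
import pandas as pd


def fill_na_by_mode(df: pd.DataFrame, colname: str) -> pd.Series:
    """Hypothetical helper: fill missing values in `colname` with its mode."""
    # Series.mode() returns the most frequent value(s) in sorted order;
    # the result is empty when the column holds only missing values.
    modes = df[colname].mode(dropna=True)
    if modes.empty:
        return df[colname].copy()  # nothing to fill with
    return df[colname].fillna(modes.iloc[0])


df = pd.DataFrame({"col1": [1, 9, None, 6, 20, None, 3, 1, 30, None]})
filled = fill_na_by_mode(df, "col1")
print(filled)
```

On the example data above, the value 1 occurs twice and is therefore used for all three gaps.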
TSCC.correction.baseline.BASE_NA_inSpace(df_fea, df_tar, config, cols_in_space=['N', 'E', 'S', 'W'])

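Replace the None values in the configured column with the row-wise mean of the columns listed in cols_in_space, by default “N”, “E”, “S” and “W”. Missing values in those columns are ignored when the mean is computed.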

Parameters:
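df_fea : pandas dataframe

The dataframe containing the column to be changed and the columns listed in cols_in_space (by default “N”, “E”, “S”, “W”). These columns must be present.

df_tar : None

Not used for now.

config : object

Configuration object holding the name of the column to be changed (colname_raw).

cols_in_space : list of str

Names of the columns to average. Defaults to ['N', 'E', 'S', 'W'].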

Returns:
pandas series

The selected column with its None values replaced.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import TSCC
>>> data = {
...     "val_raw": [1, None, np.nan, 1, None],
...     "isError": [False, True, True, False, True],
...     'N': [2, 2, 2, 2, 2],
...     'E': [3, 3, None, 3, 3],
...     'S': [4, None, 4, 4, 4],
...     'W': [None, 5, 5, 5, 5]
... }
>>> df_fea = pd.DataFrame(data)
>>> config = TSCC.preprocessing.Config(colname_raw='val_raw')
>>> TSCC.correction.BASE_NA_inSpace(df_fea, None, config)
0    1.000000
1    3.333333
2    3.666667
3    1.000000
4    3.500000
Name: val_raw, dtype: float64
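
The row-wise averaging shown in the output above can be reproduced in plain pandas. The following is a sketch under stated assumptions rather than the TSCC implementation: the helper name fill_na_from_neighbours is hypothetical, and it assumes that missing neighbour values are skipped when averaging, which matches the example output.

```python
import numpy as np
import pandas as pd


def fill_na_from_neighbours(df, colname, cols_in_space=("N", "E", "S", "W")):
    """Hypothetical helper: fill gaps with the row-wise mean of other columns."""
    # DataFrame.mean(axis=1) skips NaN by default, so a row with one
    # missing neighbour is averaged over the remaining three.
    neighbour_mean = df[list(cols_in_space)].mean(axis=1)
    # fillna with a Series aligns on the index, so each gap receives
    # the mean computed for its own row.
    return df[colname].fillna(neighbour_mean)


df_fea = pd.DataFrame({
    "val_raw": [1, None, np.nan, 1, None],
    "N": [2, 2, 2, 2, 2],
    "E": [3, 3, None, 3, 3],
    "S": [4, None, 4, 4, 4],
    "W": [None, 5, 5, 5, 5],
})
filled = fill_na_from_neighbours(df_fea, "val_raw")
print(filled)
```

Row 1, for instance, has neighbours 2, 3 and 5 (S is missing), which gives 10 / 3 ≈ 3.333333.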
TSCC.correction.baseline.BASE_NA_withSpecFeature(df_fea, df_tar, config, feature)

Fill the missing values of a specific column using values from another column in your dataframe.

Parameters:
df_fea : pandas dataframe

The dataframe containing the column with missing values and the feature column.

df_tar : None

Not used for now.

config : object

Contains the name of your column with missing values.

feature : str or list of str

The name of the column to use for replacing the missing values.

Returns:
pandas series

Your new series with replaced values if the name of your feature column exists.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import TSCC
>>> data = {
...     "val_raw": [1, None, np.nan, 1, None],
...     "isError": [False, True, True, False, True],
...     'N': [2, 2, 2, 2, 2],
...     'E': [3, 3, None, 3, 3],
...     'S': [4, None, 4, 4, 4],
...     'W': [None, 5, 5, 5, 5]
... }
>>> df_fea = pd.DataFrame(data)
>>> config = TSCC.preprocessing.Config(colname_raw='val_raw')
>>> TSCC.correction.BASE_NA_withSpecFeature(df_fea, None, config, "N")
0    1.0
1    2.0
2    2.0
3    1.0
4    2.0
Name: val_raw, dtype: float64
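
Filling one column from another can be sketched in plain pandas as follows. This is not the TSCC implementation: the helper name fill_na_with_feature is hypothetical, it handles a single feature name only, and returning the column unchanged when the feature is missing is an assumption, since the documentation does not state what happens in that case.

```python
import numpy as np
import pandas as pd


def fill_na_with_feature(df, colname, feature):
    """Hypothetical helper: fill gaps in `colname` with values from `feature`."""
    if feature not in df.columns:
        # Assumption: leave the column untouched if the feature is absent.
        return df[colname].copy()
    # fillna with a Series aligns on the index, so each gap is filled
    # with the feature value from the same row.
    return df[colname].fillna(df[feature])


df_fea = pd.DataFrame({
    "val_raw": [1, None, np.nan, 1, None],
    "N": [2, 2, 2, 2, 2],
})
filled = fill_na_with_feature(df_fea, "val_raw", "N")
print(filled)
```

Existing values in val_raw are kept; only the three gaps take the value 2 from column N.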
TSCC.correction.baseline.BASE_useDifferentFeature_NA_ByMode(df_fea, df_tar, config, feature)

Fills missing values (NaN) in a specified feature column using the most frequent value (mode). If the feature column does not exist, it returns a placeholder series filled with NaN.

Parameters:
df_fea : pandas dataframe

The DataFrame containing the feature columns.

df_tar : None

Not used for now.

config : object

Contains the name of your column with missing values.

feature : str or list of str

The name (or list of names) of the column(s) to fill missing values in.

Returns:
pandas series

The column(s) from df_fea with missing values filled with the mode; any values still missing afterwards are filled with 0. If the feature column does not exist, a series of NaN values with the same index as df_fea is returned.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import TSCC
>>> data = {
...     "val_raw": [1, None, np.nan, 1, None],
...     "isError": [False, True, True, False, True],
...     'N': [2, 2, 2, 2, 2],
...     'E': [3, 3, None, 3, 3],
...     'S': [4, None, 4, 4, 4],
...     'W': [None, 1, 2, 3, 4]
... }
>>> df_fea = pd.DataFrame(data)
>>> config = TSCC.preprocessing.Config(colname_raw='val_raw')
>>> TSCC.correction.BASE_useDifferentFeature_NA_ByMode(df_fea, None, config, "W")
0    1.0
1    1.0
2    2.0
3    3.0
4    4.0
Name: W, dtype: float64
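
The mode fill with its NaN placeholder fallback can be sketched in plain pandas. This is an illustration under assumptions, not the TSCC implementation: the helper name fill_feature_by_mode is hypothetical, it handles a single feature name, ties between modes are resolved by taking the smallest value, and the final fill with 0 reflects one reading of the return description.

```python
import numpy as np
import pandas as pd


def fill_feature_by_mode(df, feature):
    """Hypothetical helper: fill gaps in `feature` with its mode, then 0."""
    if feature not in df.columns:
        # Placeholder series of NaN sharing the index of the input frame.
        return pd.Series(np.nan, index=df.index, name=feature)
    col = df[feature]
    modes = col.mode(dropna=True)  # sorted; empty if all values are missing
    if not modes.empty:
        col = col.fillna(modes.iloc[0])
    # Assumption: anything still missing (all-NaN column) becomes 0.
    return col.fillna(0)


df_fea = pd.DataFrame({"W": [None, 1, 2, 3, 4]})
filled = fill_feature_by_mode(df_fea, "W")
placeholder = fill_feature_by_mode(df_fea, "does_not_exist")
print(filled)
```

In the example data every value in W occurs exactly once, so all four values are modes and the smallest, 1, fills the gap in row 0.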