Baseline error correction#
- TSCC.correction.baseline.BASE_NA_byMode(df_fea, df_tar, config)[source]#
Replace your None values with the most common number in your dataframe.
- Parameters:
- df_feapandas dataframe
Your selected dataframe with None values.
- df_tarNone
Not used yet.
- configobject
Name of the column to be changed.
- Returns:
- pandas series
Your selected dataframe without None values.
Examples
>>> d = {'col1': [1, 9, None, 6, 20, None, 3, 1, 30, None]} >>> df = pd.DataFrame(data=d) >>> config = TSCC.preprocessing.Config(colname_raw = 'col1') >>> test = TSCC.correction.BASE_NA_byMode(df, 10,config) >>> test 0 1.0 1 9.0 2 1.0 3 6.0 4 20.0 5 1.0 6 3.0 7 1.0 8 30.0 9 1.0 Name: col1, dtype: float64
- TSCC.correction.baseline.BASE_NA_inSpace(df_fea, df_tar, config, cols_in_space=['N', 'E', 'S', 'W'])[source]#
Replace your None values with the mean value of your columns cols_in_space, e.g. “N, E, S, W”.
- Parameters:
- df_feapandas dataframe
Your selected dataframe with columns “N, E, S, W”. The names of these four columns HAVE to be there.
- df_tarNone
Not used for now.
- configobject
Name of the column to be changed.
- Returns:
- pandas series
Your selected dataframe without None values in the selected column.
Examples
>>> data = { >>> "val_raw": [1, None, np.nan, 1, None], >>> "isError": [False, True, True, False, True], >>> 'N': [2, 2, 2, 2, 2], >>> 'E': [3, 3, None, 3, 3], >>> 'S': [4, None, 4, 4, 4], >>> 'W': [None, 5, 5, 5, 5] >>> } >>> df_fea = pd.DataFrame(data) >>> config = TSCC.preprocessing.Config(colname_raw = 'val_raw') >>> TSCC.correction.BASE_NA_withSpecFeature(df_fea, None, config, "N") 0 1.000000 1 3.333333 2 3.666667 3 1.000000 4 3.500000 Name: val_raw, dtype: float64
- TSCC.correction.baseline.BASE_NA_withSpecFeature(df_fea, df_tar, config, feature)[source]#
Fill your missing values of a specific column using values form another column in your dataframe.
- Parameters:
- df_feapandas dataframe
Your specific column with missing values.
- df_tarNone
Not used for now.
- configobject
Contains the name of your column with missing values.
- featurestr or list of str
The name of the column to use for replacing the missing values.
- Returns:
- pandas series
Your new series with replaced values if the name of your feature column exists.
- Examples
>>> data = { ..
>>> "val_raw": [1, None, np.nan, 1, None], ..
>>> "isError": [False, True, True, False, True], ..
>>> 'N': [2, 2, 2, 2, 2], ..
>>> 'E': [3, 3, None, 3, 3], ..
>>> 'S': [4, None, 4, 4, 4], ..
>>> 'W': [None, 5, 5, 5, 5] ..
>>> } ..
>>> df_fea = pd.DataFrame(data) ..
>>> config = TSCC.preprocessing.Config(colname_raw = 'val_raw') ..
>>> TSCC.correction.BASE_NA_withSpecFeature(df_fea, None, config, "N") ..
- 0 1.0
- 1 2.0
- 2 2.0
- 3 1.0
- 4 2.0
- Name: val_raw, dtype: float64
- TSCC.correction.baseline.BASE_useDifferentFeature_NA_ByMode(df_fea, df_tar, config, feature)[source]#
Fills missing values (NaN) in a specified feature column using the most frequent value (mode). If the feature column does not exist, it returns a placeholder series filled with NaN.
- Parameters:
- df_feapandas dataframe
The DataFrame containing the feature columns.
- df_tarNone
Not used for now.
- configobject
Contains the name of your column with missing values.
- featurestr or list of str
The name (or list of names) of the column(s) to fill missing values in.
- Returns:
- pandas series
The column(s) from df_fea with missing values filled by mode and then 0. If the feature column does not exist, returns a series of NaN values with the same index as df_fea.
Examples
>>> data = { >>> "val_raw": [1, None, np.nan, 1, None], >>> "isError": [False, True, True, False, True], >>> 'N': [2, 2, 2, 2, 2], >>> 'E': [3, 3, None, 3, 3], >>> 'S': [4, None, 4, 4, 4], >>> 'W': [None, 1, 2, 3, 4] >>> } >>> df_fea = pd.DataFrame(data) >>> config = TSCC.preprocessing.Config(colname_raw = 'val_raw') >>> TSCC.correction.BASE_useDifferentFeature_NA_ByMode(df_fea, None, config, "W") 0 1.0 1 1.0 2 2.0 3 3.0 4 4.0 Name: W, dtype: float64