Machine learning error correction

class TSCC.correction.ml.ML_byGAIN_DW

This class implements a model for data imputation based on the GAIN (Generative Adversarial Imputation Nets) framework, following the jsyoon0823/GAIN implementation. The model fills in missing values in datasets using a deep learning approach.

Currently only the fit_transform path is implemented, so a new model is also trained when a test dataset is imputed.

Parameters:
gain_parameters: dict, optional

A dictionary containing the parameters for the GAIN model, including ‘batch_size’, ‘hint_rate’, ‘alpha’, and ‘iterations’. Default parameters will be used if not provided.

alpha: float, default=1.0

The weight for the dense loss function, influencing the training dynamics.

exclude_cols: list, optional

A list of columns to exclude from the training process. Not implemented in the current version.
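A minimal sketch of how a gain_parameters dictionary with the four keys named above might be combined with defaults when some keys are omitted. The helper name resolve_gain_parameters and the default values are hypothetical and chosen for illustration only; the defaults actually used by TSCC may differ.

```python
# Hypothetical defaults for illustration only; TSCC's real defaults may differ.
DEFAULT_GAIN_PARAMETERS = {
    "batch_size": 128,
    "hint_rate": 0.9,
    "alpha": 100,
    "iterations": 10000,
}


def resolve_gain_parameters(gain_parameters=None):
    """Return a complete parameter dict, filling omitted keys with defaults."""
    supplied = dict(gain_parameters or {})
    # Reject misspelled keys early instead of silently ignoring them.
    unknown = set(supplied) - set(DEFAULT_GAIN_PARAMETERS)
    if unknown:
        raise KeyError(f"Unknown GAIN parameters: {sorted(unknown)}")
    # User-supplied values take precedence over the defaults.
    return {**DEFAULT_GAIN_PARAMETERS, **supplied}


# Override only the number of training iterations.
params = resolve_gain_parameters({"iterations": 2000})
```

The resulting dictionary could then be passed as gain_parameters to the class constructor, whose signature is listed further down.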

Methods

load_model(path)

Load the model from a file.

save_model(path)

Save the model to a file.

fit

predict

__init__(gain_parameters=None, alpha=1.0, exclude_cols=None)

load_model(path)

Load the model from a file.

Parameters:
path: str

The path to the model file.

save_model(path)

Save the model to a file.

Parameters:
path: str

The path to save the model file.

class TSCC.correction.ml.ML_byMissForest

This class implements the MissForest algorithm for imputation of missing values using Random Forests. It is designed to handle datasets with missing entries, providing a mechanism to fill in these gaps effectively.

Parameters:
max_features: int, default=100

The maximum number of features to consider when looking for the best split at each node in the forest. This can help in controlling overfitting and improving computational efficiency.
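To illustrate the idea behind MissForest-style imputation, the sketch below uses scikit-learn's IterativeImputer with a RandomForestRegressor as the per-column estimator. This is a commonly used approximation of the algorithm and an assumption made for illustration; it is not the implementation inside ML_byMissForest, and the synthetic data and parameter values are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic data: column 2 is a function of columns 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1]

# Remove some entries to simulate missing values.
X_missing = X.copy()
X_missing[::7, 2] = np.nan
X_missing[3::11, 0] = np.nan

# Each column with gaps is modelled by a random forest on the other columns;
# max_features limits how many features are tried at each split.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, max_features=2, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```

Observed entries pass through unchanged; only the NaN positions are replaced by forest predictions.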

Methods

fit(df_fea, df_tar, df_flag_erroneous, config)

Fits the MissForest model to the training data for imputation of missing values.

load_model(path)

Load the model from a file.

predict(df_fea, df_flag_erroneous, config)

Imputes missing values in the input feature DataFrame using the fitted MissForest model.

save_model(path)

Save the model to a file.

__init__(max_features=100)

Initializes the ML_byMissForest class.

Parameters:
max_features: int, default=100

The maximum number of features to consider when looking for the best split at each node in the forest.

fit(df_fea, df_tar, df_flag_erroneous, config)

Fits the MissForest model to the training data for imputation of missing values.

This method prepares the feature DataFrame for training the MissForest model. It replaces missing values based on the specified criteria and fits the model to the non-empty feature columns.

Parameters:
df_fea: pandas DataFrame

The input features with missing values that need to be imputed.

df_tar: pandas DataFrame

The target values (not used in MissForest but can be included for consistency).

df_flag_erroneous: pandas Series

A boolean Series indicating which rows contain erroneous or missing values.

config: object

Configuration object containing information about the dataset, including the name of the column to be predicted.

Returns:
None

load_model(path)

Load the model from a file.

Parameters:
path: str

The path to the model file.

predict(df_fea, df_flag_erroneous, config)

Imputes missing values in the input feature DataFrame using the fitted MissForest model.

This method predicts values for the specified column in the DataFrame where the original values are missing (indicated by the df_flag_erroneous). It retains the original values where valid and fills in NaN where the values are flagged as erroneous.

Parameters:
df_fea: pandas DataFrame

The input features with missing values to be imputed.

df_flag_erroneous: str

The name of a boolean column in df_fea that indicates which rows contain erroneous or missing values.

config: object

Configuration object that contains information about the dataset, including the name of the column to be predicted.

Returns:
pandas Series

A Series containing the predicted values for the specified column, with NaN where the original values were not valid.

save_model(path)

Save the model to a file.

Parameters:
path: str

The path to save the model file.

class TSCC.correction.ml.ML_byRF

This class implements a Random Forest model for regression tasks using the RandomForestRegressor from scikit-learn.

Parameters:
n_estimators: int, default=100

The number of trees in the forest, influencing the model’s complexity and performance.

random_state: int, RandomState instance or None, default=None

Controls the randomness of the estimator for reproducibility.

max_depth: int, default=15

The maximum depth of the trees. It helps prevent overfitting by limiting how deep the trees can grow.
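The sketch below shows how a scikit-learn RandomForestRegressor with the parameters listed above could be fitted and applied while keeping NaN values away from the model, since fit and predict of this class are documented as not supporting NaN. It calls scikit-learn directly rather than ML_byRF, and the column names and synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features and target (hypothetical column names).
rng = np.random.default_rng(0)
df_fea = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df_tar = pd.Series(2.0 * df_fea["x1"] - df_fea["x2"], name="y")
df_fea.loc[4, "x1"] = np.nan  # one gap in the features
df_tar.loc[9] = np.nan        # one gap in the target

# Train only on rows where every feature and the target are present.
complete = df_fea.notna().all(axis=1) & df_tar.notna()
model = RandomForestRegressor(n_estimators=100, random_state=0, max_depth=15)
model.fit(df_fea[complete], df_tar[complete])

# Predict only where all features are present; other rows stay NaN.
predictable = df_fea.notna().all(axis=1)
y_pred = pd.Series(np.nan, index=df_fea.index)
y_pred.loc[predictable] = model.predict(df_fea[predictable])
```

Filtering rows before both calls is one simple option; imputing the feature gaps first would be an alternative when too many rows would otherwise be dropped.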

Methods

fit(df_fea, df_tar, df_flag_erroneous, config)

Fits the random forest model to the data.

load_model(path)

Load the model from a file.

predict(df_fea, df_flag_erroneous, config)

Predicts values for the input samples.

save_model(path)

Save the model to a file.

__init__(n_estimators=100, random_state=None, max_depth=15)

Initializes the ML_byRF class with its underlying random forest model.

Parameters:
n_estimators: int, default=100

The number of trees in the forest.

random_state: int, RandomState instance or None, default=None

Controls the randomness of the estimator.

max_depth: int, default=15

The maximum depth of the trees.

fit(df_fea, df_tar, df_flag_erroneous, config)

Fits the random forest model to the data. This method does not support NaN values.

Parameters:
df_fea: pandas DataFrame

The training input samples.

df_tar: pandas DataFrame

The target values.

Returns:
decision_clf: the fitted random forest model object

load_model(path)

Load the model from a file.

Parameters:
path: str

The path to the model file.

predict(df_fea, df_flag_erroneous, config)

Predicts values for the input samples. This method does not support NaN values.

Parameters:
df_fea: pandas DataFrame

The input samples.

Returns:
y_pred: array of shape (n_samples,)

The predicted values.

save_model(path)

Save the model to a file.

Parameters:
path: str

The path to save the model file.