Feature generation methods

TSCC.preprocessing.feature_generation.feature_generation_heavyRainDWD(df, selected_columns=None)

Generates heavy rain classification features based on rainfall thresholds defined by the DWD (German Weather Service).

Parameters:
df : pandas DataFrame

The input dataframe containing rainfall data.

selected_columns : list, optional

List of columns to apply the heavy rain feature generation on. If None, all columns will be considered.

Returns:
df : pandas DataFrame

The dataframe with additional boolean columns indicating different levels of heavy rainfall:

- "{col}_isHeavyRain": True if the rainfall is between 15 and 25 mm in the past 1 hour, or between 20 and 35 mm in the past 6 hours.
- "{col}_isHeavyHeavyRain": True if the rainfall is between 25 and 40 mm in the past 1 hour, or between 35 and 60 mm in the past 6 hours.
- "{col}_isExtremeHeavyRain": True if the rainfall exceeds 40 mm in the past 1 hour, or 60 mm in the past 6 hours.
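As a rough illustration of the thresholds above, the following sketch derives the three flags from trailing 1-hour and 6-hour rainfall sums. It is not the TSCC implementation: the function name `heavy_rain_flags_sketch`, the time-based rolling windows, the need for a sorted DatetimeIndex, and the treatment of values that fall exactly on a threshold are all assumptions made for this example.

```python
import pandas as pd


def heavy_rain_flags_sketch(df, selected_columns=None):
    """Hypothetical stand-in that mirrors the documented thresholds."""
    out = df.copy()
    cols = list(df.columns) if selected_columns is None else list(selected_columns)
    for col in cols:
        # Trailing rainfall totals; requires a sorted DatetimeIndex.
        sum_1h = df[col].rolling(pd.Timedelta(hours=1)).sum()
        sum_6h = df[col].rolling(pd.Timedelta(hours=6)).sum()

        # Assumption: the lowest band includes both of its bounds,
        # the middle band excludes its lower bound.
        out[f"{col}_isHeavyRain"] = (
            ((sum_1h >= 15) & (sum_1h <= 25)) | ((sum_6h >= 20) & (sum_6h <= 35))
        )
        out[f"{col}_isHeavyHeavyRain"] = (
            ((sum_1h > 25) & (sum_1h <= 40)) | ((sum_6h > 35) & (sum_6h <= 60))
        )
        out[f"{col}_isExtremeHeavyRain"] = (sum_1h > 40) | (sum_6h > 60)
    return out


# Example: 5-minute rainfall readings in mm (made-up values).
index = pd.date_range("2024-01-01", periods=24, freq="5min")
values = [0.0] * 24
values[3] = 16.0
values[10] = 12.0
flags = heavy_rain_flags_sketch(pd.DataFrame({"rain": values}, index=index))
```

Because the 1-hour and 6-hour criteria are combined with a logical OR, a timestamp can satisfy more than one flag at once, for example when the hourly sum falls in the middle band while the 6-hour sum is still in the lowest one.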

TSCC.preprocessing.feature_generation.feature_generation_prevObs(df, selected_columns=None, timestep_name=5, agg_numobs=12, agg_name='h')

Generates features based on previous observations and rolling statistics.

Parameters:
df : pandas DataFrame

The input dataframe where previous observation and aggregation features will be added.

selected_columns : list, optional

List of specific columns for which features will be generated. If None, all columns in the dataframe are used.

timestep_name : int, default=5

Time interval (in terms of index steps) used for creating previous observation features.

agg_numobs : int, default=12

The window size for calculating rolling statistics such as min, max, mean, and sum.

agg_name : str, default='h'

Suffix used in the naming of the generated aggregation columns.

Returns:
df_defragmented : pandas DataFrame

The dataframe with additional features generated based on previous observations and rolling window statistics.
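The kind of features described here can be sketched with plain pandas. This is a minimal illustration under assumptions, not the library's code: the function name `prev_obs_features_sketch` and the generated column names are hypothetical, the previous observation is taken as a one-step shift, and `timestep_name` is left out of the sketch.

```python
import pandas as pd


def prev_obs_features_sketch(df, selected_columns=None, agg_numobs=12, agg_name="h"):
    """Hypothetical stand-in: lagged values, deltas and rolling statistics."""
    cols = list(df.columns) if selected_columns is None else list(selected_columns)
    new_cols = {}
    for col in cols:
        # Previous observation and change since the previous observation.
        new_cols[f"{col}_prev"] = df[col].shift(1)
        new_cols[f"{col}_delta"] = df[col].diff()
        # Rolling statistics over the last `agg_numobs` observations.
        window = df[col].rolling(agg_numobs)
        for stat in ("min", "max", "mean", "sum"):
            new_cols[f"{col}_{stat}_{agg_name}"] = getattr(window, stat)()
    # Adding all columns in one concat avoids a fragmented dataframe.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Example with a window of two observations (made-up values).
features = prev_obs_features_sketch(pd.DataFrame({"x": [1.0, 2.0, 4.0, 7.0]}), agg_numobs=2)
```

With 5-minute data, a window of 12 observations spans one hour, which would be consistent with the default suffix 'h'; that reading of the defaults is an inference, not something the documentation states.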

TSCC.preprocessing.feature_generation.feature_generation_time(df)

Generates time-based features from the index of the dataframe.

Parameters:
df : pandas DataFrame

The input dataframe with a datetime index from which time-related features will be generated.

Returns:
df : pandas DataFrame

The dataframe with additional time-based features:

- "month": Extracted month from the datetime index.
- "weekday": Extracted day of the week (0 = Monday, 6 = Sunday) from the datetime index.
- "hour": Extracted hour of the day from the datetime index.
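The three columns correspond directly to attributes of a pandas DatetimeIndex. A minimal sketch, assuming the index is a DatetimeIndex (the function name `time_features_sketch` is hypothetical and not part of TSCC):

```python
import pandas as pd


def time_features_sketch(df):
    """Hypothetical stand-in: derive month, weekday and hour from the index."""
    out = df.copy()
    out["month"] = out.index.month      # 1 = January ... 12 = December
    out["weekday"] = out.index.weekday  # 0 = Monday ... 6 = Sunday
    out["hour"] = out.index.hour        # 0 ... 23
    return out


# Example: a Monday afternoon and a Sunday just after midnight.
index = pd.DatetimeIndex(["2024-01-01 13:00", "2024-03-10 00:30"])
timed = time_features_sketch(pd.DataFrame({"rain": [0.2, 0.0]}, index=index))
```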

TSCC.preprocessing.feature_generation.feature_generation_uncertainty(df, uncertainty_dict)

Generates uncertainty-related features and adds them to the provided dataframe.

Parameters:
df : pandas DataFrame

The input dataframe to which uncertainty features will be added.

uncertainty_dict : dict

A dictionary where keys are feature names and values are the corresponding sensor uncertainties to be added to the dataframe.

Returns:
df : pandas DataFrame

The dataframe with added uncertainty features based on the provided uncertainty dictionary.
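One plausible reading of this description is that each dictionary entry becomes a constant column. The sketch below illustrates that reading only; the function name `uncertainty_features_sketch`, the example key, and the assumption that each value is a single scalar per sensor are not taken from the library.

```python
import pandas as pd


def uncertainty_features_sketch(df, uncertainty_dict):
    """Hypothetical stand-in: add one constant column per dictionary entry."""
    out = df.copy()
    for feature_name, uncertainty in uncertainty_dict.items():
        # Assumption: a scalar uncertainty is broadcast to every row.
        out[feature_name] = uncertainty
    return out


# Example with a made-up sensor uncertainty.
readings = pd.DataFrame({"rain": [0.0, 1.0, 2.0]})
with_uncertainty = uncertainty_features_sketch(readings, {"rain_uncertainty": 0.1})
```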

TSCC.preprocessing.feature_generation.feature_generation_wrapper(df, selected_columns=None)

Combines various feature generation functions to augment the input dataframe with new features related to previous observations, seasonality, and heavy rainfall.

Parameters:
df : pandas DataFrame

The input dataframe containing time series or sensor data.

selected_columns : list, optional

List of columns to apply the feature generation on. If None, all columns will be considered.

Returns:
df : pandas DataFrame

The dataframe augmented with additional features, including:

- Previous observations and deltas using feature_generation_prevObs.
- Seasonality-related features using feature_generation_seasonality.
- Heavy rainfall indicators using feature_generation_heavyRainDWD.
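A wrapper of this kind can be thought of as a chain of dataframe-to-dataframe steps. The sketch below shows one possible composition pattern using `DataFrame.pipe`; the step functions `add_prev_sketch` and `add_time_sketch` are simplified, hypothetical stand-ins rather than the TSCC functions, and the order of the steps is an assumption.

```python
import pandas as pd


def add_prev_sketch(df, selected_columns):
    # Hypothetical step: previous observation for each selected column.
    return df.assign(**{f"{c}_prev": df[c].shift(1) for c in selected_columns})


def add_time_sketch(df):
    # Hypothetical step: hour of day taken from the datetime index.
    return df.assign(hour=df.index.hour)


def wrapper_sketch(df, selected_columns=None):
    # Resolve the column list once, before any step adds derived columns,
    # so later steps do not treat generated features as inputs.
    cols = list(df.columns) if selected_columns is None else list(selected_columns)
    return df.pipe(add_prev_sketch, selected_columns=cols).pipe(add_time_sketch)


# Example with made-up hourly readings.
index = pd.date_range("2024-01-01 10:00", periods=3, freq="60min")
augmented = wrapper_sketch(pd.DataFrame({"rain": [1.0, 2.0, 3.0]}, index=index))
```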