Feature generation methods
- TSCC.preprocessing.feature_generation.feature_generation_heavyRainDWD(df, selected_columns=None)
Generates heavy rain classification features based on rainfall thresholds defined by the DWD (German Weather Service).
- Parameters:
- df : pandas DataFrame
The input dataframe containing rainfall data.
- selected_columns : list, optional
List of columns to apply the heavy rain feature generation to. If None, all columns are considered.
- Returns:
- df : pandas DataFrame
The dataframe with additional boolean columns indicating different levels of heavy rainfall:
  - "{col}_isHeavyRain": True if the rainfall in the past 1 hour is between 15 and 25 mm, or between 20 and 35 mm in the past 6 hours.
  - "{col}_isHeavyHeavyRain": True if the rainfall in the past 1 hour is between 25 and 40 mm, or between 35 and 60 mm in the past 6 hours.
  - "{col}_isExtremeHeavyRain": True if the rainfall exceeds 40 mm in the past 1 hour, or 60 mm in the past 6 hours.
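The thresholds above can be illustrated with a small pandas sketch. This is not the TSCC implementation: the helper name `heavy_rain_flags`, the use of time-based rolling sums, and the boundary handling (lower bound inclusive, upper bound exclusive) are all assumptions made for the example.

```python
import pandas as pd


def heavy_rain_flags(rain: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: flag heavy rain levels for one rainfall series (mm per step)."""
    # Rainfall accumulated over the past 1 hour and the past 6 hours.
    sum_1h = rain.rolling("1h").sum()
    sum_6h = rain.rolling("6h").sum()

    col = rain.name
    out = pd.DataFrame(index=rain.index)
    # Assumption: each band includes its lower bound and excludes its upper bound.
    out[f"{col}_isHeavyRain"] = (
        (sum_1h.ge(15) & sum_1h.lt(25)) | (sum_6h.ge(20) & sum_6h.lt(35))
    )
    out[f"{col}_isHeavyHeavyRain"] = (
        (sum_1h.ge(25) & sum_1h.lt(40)) | (sum_6h.ge(35) & sum_6h.lt(60))
    )
    out[f"{col}_isExtremeHeavyRain"] = sum_1h.ge(40) | sum_6h.ge(60)
    return out


# Four half-hourly readings in mm; rolling time windows need a datetime index.
idx = pd.date_range("2024-06-01 00:00", periods=4, freq="30min")
rain = pd.Series([0.0, 10.0, 8.0, 30.0], index=idx, name="rain")
flags = heavy_rain_flags(rain)
print(flags)
```

In this toy series the 1-hour sum reaches 18 mm at 01:00 (heavy rain) and 38 mm at 01:30 (the second level), while no reading crosses the extreme threshold.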
- TSCC.preprocessing.feature_generation.feature_generation_prevObs(df, selected_columns=None, timestep_name=5, agg_numobs=12, agg_name='h')
Generates features based on previous observations and rolling statistics.
- Parameters:
- df : pandas DataFrame
The input dataframe to which previous-observation and aggregation features will be added.
- selected_columns : list, optional
List of specific columns for which features will be generated. If None, all columns in the dataframe are used.
- timestep_name : int, default=5
Time interval (in terms of index steps) used for creating previous observation features.
- agg_numobs : int, default=12
The window size for calculating rolling statistics such as min, max, mean, and sum.
- agg_name : str, default='h'
Suffix used in the naming of the generated aggregation columns.
- Returns:
- df_defragmented : pandas DataFrame
The dataframe with additional features generated based on previous observations and rolling window statistics.
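As a rough illustration of lag and rolling-window features, the sketch below uses plain pandas `shift` and `rolling`. The helper `prev_obs_features`, its `n_prev` parameter, and the generated column names are hypothetical; the names TSCC actually produces, and the exact role of `timestep_name`, are not specified here.

```python
import pandas as pd


def prev_obs_features(df, column, n_prev=2, agg_numobs=3, agg_name="h"):
    """Hypothetical helper: add lagged values and rolling statistics for one column."""
    out = df.copy()
    # Previous observations: the value 1, 2, ... n_prev steps back.
    for lag in range(1, n_prev + 1):
        out[f"{column}_prev{lag}"] = df[column].shift(lag)
    # Rolling statistics over the last `agg_numobs` observations.
    window = df[column].rolling(agg_numobs)
    for stat in ("min", "max", "mean", "sum"):
        out[f"{column}_{stat}_{agg_name}"] = getattr(window, stat)()
    return out


df = pd.DataFrame({"level": [1.0, 2.0, 4.0, 8.0]})
features = prev_obs_features(df, "level")
print(features)
```

Rows that do not yet have a full window, or enough history for a given lag, contain NaN, so downstream models may need to drop or impute the leading rows.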
- TSCC.preprocessing.feature_generation.feature_generation_time(df)
Generates time-based features from the index of the dataframe.
- Parameters:
- df : pandas DataFrame
The input dataframe with a datetime index from which time-related features will be generated.
- Returns:
- df : pandas DataFrame
The dataframe with additional time-based features:
  - "month": Extracted month from the datetime index.
  - "weekday": Extracted day of the week (0 = Monday, 6 = Sunday) from the datetime index.
  - "hour": Extracted hour of the day from the datetime index.
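The three columns correspond directly to attributes of a pandas `DatetimeIndex`. A minimal sketch of the same idea, written with plain pandas rather than the TSCC function itself (the `value` column and the sample timestamps are made up for the example):

```python
import pandas as pd

# Three hourly timestamps that cross midnight from Monday into Tuesday.
idx = pd.date_range("2024-01-01 22:00", periods=3, freq="60min")
df = pd.DataFrame({"value": [0.1, 0.2, 0.3]}, index=idx)

# DatetimeIndex exposes these components as integer arrays.
df["month"] = df.index.month      # 1 to 12
df["weekday"] = df.index.weekday  # 0 = Monday, 6 = Sunday
df["hour"] = df.index.hour        # 0 to 23
print(df)
```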
- TSCC.preprocessing.feature_generation.feature_generation_uncertainty(df, uncertainty_dict)
Generates uncertainty-related features and adds them to the provided dataframe.
- Parameters:
- df : pandas DataFrame
The input dataframe to which uncertainty features will be added.
- uncertainty_dict : dict
A dictionary mapping feature names to the corresponding sensor uncertainty values to be added to the dataframe.
- Returns:
- df : pandas DataFrame
The dataframe with added uncertainty features based on the provided uncertainty dictionary.
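One plausible reading of this behavior is that each dictionary entry becomes a constant column. The sketch below assumes exactly that; the helper `add_uncertainty` and the `_uncertainty` column suffix are hypothetical and may differ from what TSCC generates.

```python
import pandas as pd


def add_uncertainty(df, uncertainty_dict):
    """Hypothetical helper: add one constant uncertainty column per dictionary entry."""
    out = df.copy()
    for name, value in uncertainty_dict.items():
        # Assumption: the scalar uncertainty is broadcast to every row.
        out[f"{name}_uncertainty"] = value
    return out


df = pd.DataFrame({"rain": [0.0, 1.2], "level": [3.4, 3.5]})
with_unc = add_uncertainty(df, {"rain": 0.1, "level": 0.05})
print(with_unc)
```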
- TSCC.preprocessing.feature_generation.feature_generation_wrapper(df, selected_columns=None)
Combines various feature generation functions to augment the input dataframe with new features related to previous observations, seasonality, and heavy rainfall.
- Parameters:
- df : pandas DataFrame
The input dataframe containing time series or sensor data.
- selected_columns : list, optional
List of columns to apply the feature generation to. If None, all columns are considered.
- Returns:
- df : pandas DataFrame
The dataframe augmented with additional features, including:
  - Previous observations and deltas using feature_generation_prevObs.
  - Seasonality-related features using feature_generation_seasonality.
  - Heavy rainfall indicators using feature_generation_heavyRainDWD.