
Experiment

Bases: BaseExperiment

Concrete implementation for performing ML experiments and evaluation.

This class extends BaseExperiment, providing methods for evaluating machine learning models using holdout or cross-validation strategies. It performs hyperparameter tuning, final model training, and evaluation based on specified tuning and optimization methods.

Inherits

BaseExperiment: Provides core functionality for validation, resampling, training, and tuning configurations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The preloaded data for the experiment. | *required* |
| `task` | `str` | The task name used to determine the classification type. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. | *required* |
| `learner` | `str` | Specifies the model or algorithm to evaluate: 'xgb', 'rf', 'lr', or 'mlp'. | *required* |
| `criterion` | `str` | Criterion for optimization ('f1', 'macro_f1', or 'brier_score'). | *required* |
| `encoding` | `str` | Encoding type for categorical features ('one_hot' or 'binary'). | *required* |
| `tuning` | `Optional[str]` | Tuning method to apply ('holdout' or 'cv'). Can be None. | *required* |
| `hpo` | `Optional[str]` | Hyperparameter optimization method ('rs' or 'hebo'). Can be None. | *required* |
| `sampling` | `Optional[str]` | Resampling strategy to apply: None, 'upsampling', 'downsampling', or 'smote'. | `None` |
| `factor` | `Optional[float]` | Resampling factor. | `None` |
| `n_configs` | `int` | Number of configurations for hyperparameter tuning. | `10` |
| `racing_folds` | `Optional[int]` | Number of racing folds for random search ('rs'). | `None` |
| `n_jobs` | `int` | Number of parallel jobs to run for evaluation. | `1` |
| `cv_folds` | `Optional[int]` | Number of folds for cross-validation. | `10` |
| `test_seed` | `int` | Random seed for test splitting. | `0` |
| `test_size` | `float` | Proportion of data used for testing. | `0.2` |
| `val_size` | `Optional[float]` | Size of the validation set in holdout tuning. | `0.2` |
| `cv_seed` | `Optional[int]` | Random seed for cross-validation. | `0` |
| `mlp_flag` | `Optional[bool]` | Flag to enable MLP training with early stopping. | `None` |
| `threshold_tuning` | `Optional[bool]` | If True, performs threshold tuning for binary classification when the criterion is 'f1'. | `None` |
| `verbose` | `bool` | Enables verbose output if set to True. | `True` |
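
For contrast with the cross-validation example shown further below, here is a minimal sketch of a holdout-tuned configuration. The loader calls mirror the Example section; the task, learner, sampling, and numeric values are purely illustrative assumptions, not recommended settings.

```
from periomod.benchmarking import Experiment
from periomod.data import ProcessedDataLoader

# Prepare the data as in the Example section (binary encoding in this sketch).
dataloader = ProcessedDataLoader(task="improvement", encoding="binary")
df = dataloader.load_data(path="data/processed/processed_data.csv")
df = dataloader.transform_data(df=df)

# Holdout tuning with HEBO; sampling, factor, and n_configs are illustrative.
experiment = Experiment(
    df=df,
    task="improvement",
    learner="xgb",
    criterion="brier_score",
    encoding="binary",
    tuning="holdout",   # tune on an internal validation split instead of CV folds
    hpo="hebo",         # Bayesian optimization instead of random search
    sampling="smote",
    factor=2.0,
    n_configs=15,
    val_size=0.2,
)

final_metrics = experiment.perform_evaluation()
```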

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `df` | `DataFrame` | Dataset used for training and evaluation. |
| `task` | `str` | Name of the task used to determine the classification type. |
| `learner` | `str` | Model or algorithm name for the experiment. |
| `criterion` | `str` | Criterion for performance evaluation. |
| `encoding` | `str` | Encoding type for categorical features. |
| `sampling` | `str` | Resampling method used in training. |
| `factor` | `float` | Factor applied during resampling. |
| `n_configs` | `int` | Number of configurations evaluated in hyperparameter tuning. |
| `racing_folds` | `int` | Number of racing folds for random search. |
| `n_jobs` | `int` | Number of parallel jobs used during processing. |
| `cv_folds` | `int` | Number of cross-validation folds. |
| `test_seed` | `int` | Seed for reproducible test splitting. |
| `test_size` | `float` | Proportion of data reserved for testing. |
| `val_size` | `float` | Size of the validation set in holdout tuning. |
| `cv_seed` | `int` | Seed for reproducible cross-validation splits. |
| `mlp_flag` | `bool` | Indicates whether MLP training with early stopping is enabled. |
| `threshold_tuning` | `bool` | Enables threshold tuning for binary classification. |
| `verbose` | `bool` | Controls detailed output during the experiment. |
| `resampler` | `Resampler` | Resampler instance for data handling. |
| `trainer` | `Trainer` | Trainer instance for model training and evaluation. |
| `tuner` | `Tuner` | Initialized tuner for hyperparameter optimization. |

Methods:

| Name | Description |
| --- | --- |
| `perform_evaluation` | Conducts evaluation based on the tuning method. |

Example
from periomod.benchmarking import Experiment
from periomod.data import ProcessedDataLoader

# Load a dataframe with the correct target and encoding selected
dataloader = ProcessedDataLoader(task="pocketclosure", encoding="one_hot")
df = dataloader.load_data(path="data/processed/processed_data.csv")
df = dataloader.transform_data(df=df)

experiment = Experiment(
    df=df,
    task="pocketclosure",
    learner="rf",
    criterion="f1",
    encoding="one_hot",
    tuning="cv",
    hpo="rs",
    sampling="upsample",
    factor=1.5,
    n_configs=20,
    racing_folds=5,
)

# Perform the evaluation based on cross-validation
final_metrics = experiment.perform_evaluation()
print(final_metrics)
Source code in periomod/benchmarking/_benchmark.py
class Experiment(BaseExperiment):
    """Concrete implementation for performing ML experiments and evaluation.

    This class extends `BaseExperiment`, providing methods for evaluating machine
    learning models using holdout or cross-validation strategies. It performs
    hyperparameter tuning, final model training, and evaluation based on
    specified tuning and optimization methods.

    Inherits:
        `BaseExperiment`: Provides core functionality for validation, resampling,
            training, and tuning configurations.

    Args:
        df (pd.DataFrame): The preloaded data for the experiment.
        task (str): The task name used to determine classification type.
            Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
            'pdgrouprevaluation'.
        learner (str): Specifies the model or algorithm to evaluate.
            Includes 'xgb', 'rf', 'lr' or 'mlp'.
        criterion (str): Criterion for optimization ('f1', 'macro_f1' or 'brier_score').
        encoding (str): Encoding type for categorical features ('one_hot' or 'binary').
        tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv'). Can be None.
        hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
            Can be None.
        sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
            Includes None, 'upsampling', 'downsampling', and 'smote'.
        factor (Optional[float]): Resampling factor. Defaults to None.
        n_configs (int): Number of configurations for hyperparameter tuning.
            Defaults to 10.
        racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
            Defaults to None.
        n_jobs (int): Number of parallel jobs to run for evaluation.
            Defaults to 1.
        cv_folds (Optional[int]): Number of folds for cross-validation;
            Defaults to 10.
        test_seed (int): Random seed for test splitting. Defaults to 0.
        test_size (float): Proportion of data used for testing. Defaults to
            0.2.
        val_size (Optional[float]): Size of validation set in holdout tuning.
            Defaults to 0.2.
        cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
        mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
            Defaults to None.
        threshold_tuning (Optional[bool]): If True, performs threshold tuning for binary
            classification if the criterion is "f1". Defaults to None.
        verbose (bool): Enables verbose output if set to True.

    Attributes:
        df (pd.DataFrame): Dataset used for training and evaluation.
        task (str): Name of the task used to determine the classification type.
        learner (str): Model or algorithm name for the experiment.
        criterion (str): Criterion for performance evaluation.
        encoding (str): Encoding type for categorical features.
        sampling (str): Resampling method used in training.
        factor (float): Factor applied during resampling.
        n_configs (int): Number of configurations evaluated in hyperparameter tuning.
        racing_folds (int): Number of racing folds for random search.
        n_jobs (int): Number of parallel jobs used during processing.
        cv_folds (int): Number of cross-validation folds.
        test_seed (int): Seed for reproducible test splitting.
        test_size (float): Proportion of data reserved for testing.
        val_size (float): Size of the validation set in holdout tuning.
        cv_seed (int): Seed for reproducible cross-validation splits.
        mlp_flag (bool): Indicates if MLP training with early stopping is enabled.
        threshold_tuning (bool): Enables threshold tuning for binary classification.
        verbose (bool): Controls detailed output during the experiment.
        resampler (Resampler): Resampler instance for data handling.
        trainer (Trainer): Trainer instance for model training and evaluation.
        tuner (Tuner): Initialized tuner for hyperparameter optimization.

    Methods:
        perform_evaluation: Conducts evaluation based on the tuning method.

    Example:
        ```
        from periomod.benchmarking import Experiment
        from periomod.data import ProcessedDataLoader

        # Load a dataframe with the correct target and encoding selected
        dataloader = ProcessedDataLoader(task="pocketclosure", encoding="one_hot")
        df = dataloader.load_data(path="data/processed/processed_data.csv")
        df = dataloader.transform_data(df=df)

        experiment = Experiment(
            df=df,
            task="pocketclosure",
            learner="rf",
            criterion="f1",
            encoding="one_hot",
            tuning="cv",
            hpo="rs",
            sampling="upsample",
            factor=1.5,
            n_configs=20,
            racing_folds=5,
        )

        # Perform the evaluation based on cross-validation
        final_metrics = experiment.perform_evaluation()
        print(final_metrics)
        ```
    """

    def __init__(
        self,
        df: pd.DataFrame,
        task: str,
        learner: str,
        criterion: str,
        encoding: str,
        tuning: Optional[str],
        hpo: Optional[str],
        sampling: Optional[str] = None,
        factor: Optional[float] = None,
        n_configs: int = 10,
        racing_folds: Optional[int] = None,
        n_jobs: int = 1,
        cv_folds: Optional[int] = 10,
        test_seed: int = 0,
        test_size: float = 0.2,
        val_size: Optional[float] = 0.2,
        cv_seed: Optional[int] = 0,
        mlp_flag: Optional[bool] = None,
        threshold_tuning: Optional[bool] = None,
        verbose: bool = True,
    ) -> None:
        """Initialize the Experiment class with tuning parameters.

        Args:
            df (pd.DataFrame): The preloaded data for the experiment.
            task (str): The task name used to determine classification type.
                Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
                'pdgrouprevaluation'.
            learner (str): Specifies the model or algorithm to evaluate.
                Includes 'xgb', 'rf', 'lr' or 'mlp'.
            criterion (str): Criterion for optimization ('f1', 'macro_f1' or
                'brier_score').
            encoding (str): Encoding type for categorical features ('one_hot' or
                'binary').
            tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv').
                Can be None.
            hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
                Can be None.
            sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
                Includes None, 'upsampling', 'downsampling', and 'smote'.
            factor (Optional[float]): Resampling factor. Defaults to None.
            n_configs (int): Number of configurations for hyperparameter tuning.
                Defaults to 10.
            racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
                Defaults to None.
            n_jobs (int): Number of parallel jobs to run for evaluation.
                Defaults to 1.
            cv_folds (Optional[int]): Number of folds for cross-validation;
                Defaults to 10.
            test_seed (int): Random seed for test splitting. Defaults to 0.
            test_size (float): Proportion of data used for testing. Defaults to
                0.2.
            val_size (Optional[float]): Size of validation set in holdout tuning.
                Defaults to 0.2.
            cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
            mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
                Defaults to None.
            threshold_tuning (Optional[bool]): If True, performs threshold tuning for
                binary classification if the criterion is "f1". Defaults to None.
            verbose (bool): Enables verbose output if set to True.
        """
        super().__init__(
            df=df,
            task=task,
            learner=learner,
            criterion=criterion,
            encoding=encoding,
            tuning=tuning,
            hpo=hpo,
            sampling=sampling,
            factor=factor,
            n_configs=n_configs,
            racing_folds=racing_folds,
            n_jobs=n_jobs,
            cv_folds=cv_folds,
            test_seed=test_seed,
            test_size=test_size,
            val_size=val_size,
            cv_seed=cv_seed,
            mlp_flag=mlp_flag,
            threshold_tuning=threshold_tuning,
            verbose=verbose,
        )

    def perform_evaluation(self) -> dict:
        """Perform model evaluation and return final metrics.

        Returns:
            dict: A dictionary containing the trained model and its evaluation metrics.
        """
        train_df, _ = self.resampler.split_train_test_df(
            df=self.df, seed=self.test_seed, test_size=self.test_size
        )

        if self.tuning == "holdout":
            return self._evaluate_holdout(train_df=train_df)
        elif self.tuning == "cv":
            return self._evaluate_cv()
        else:
            raise ValueError(f"Unsupported tuning method: {self.tuning}")

    def _evaluate_holdout(self, train_df: pd.DataFrame) -> dict:
        """Perform holdout validation and return the final model metrics.

        Args:
            train_df (pd.DataFrame): train df for holdout tuning.

        Returns:
            dict: A dictionary of evaluation metrics for the final model.
        """
        train_df_h, test_df_h = self.resampler.split_train_test_df(
            df=train_df, seed=self.test_seed, test_size=self.val_size
        )
        X_train_h, y_train_h, X_val, y_val = self.resampler.split_x_y(
            train_df=train_df_h,
            test_df=test_df_h,
            sampling=self.sampling,
            factor=self.factor,
        )
        best_params, best_threshold = self.tuner.holdout(
            learner=self.learner,
            X_train=X_train_h,
            y_train=y_train_h,
            X_val=X_val,
            y_val=y_val,
        )
        final_model = (self.learner, best_params, best_threshold)

        return self._train_final_model(final_model)

    def _evaluate_cv(self) -> dict:
        """Perform cross-validation and return the final model metrics.

        Returns:
            dict: A dictionary of evaluation metrics for the final model.
        """
        outer_splits, _ = self.resampler.cv_folds(
            df=self.df,
            sampling=self.sampling,
            factor=self.factor,
            seed=self.cv_seed,
            n_folds=self.cv_folds,
        )
        best_params, best_threshold = self.tuner.cv(
            learner=self.learner,
            outer_splits=outer_splits,
            racing_folds=self.racing_folds,
        )
        final_model = (self.learner, best_params, best_threshold)

        return self._train_final_model(final_model_tuple=final_model)

__init__(df, task, learner, criterion, encoding, tuning, hpo, sampling=None, factor=None, n_configs=10, racing_folds=None, n_jobs=1, cv_folds=10, test_seed=0, test_size=0.2, val_size=0.2, cv_seed=0, mlp_flag=None, threshold_tuning=None, verbose=True)

Initialize the Experiment class with tuning parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The preloaded data for the experiment. | *required* |
| `task` | `str` | The task name used to determine the classification type. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. | *required* |
| `learner` | `str` | Specifies the model or algorithm to evaluate: 'xgb', 'rf', 'lr', or 'mlp'. | *required* |
| `criterion` | `str` | Criterion for optimization ('f1', 'macro_f1', or 'brier_score'). | *required* |
| `encoding` | `str` | Encoding type for categorical features ('one_hot' or 'binary'). | *required* |
| `tuning` | `Optional[str]` | Tuning method to apply ('holdout' or 'cv'). Can be None. | *required* |
| `hpo` | `Optional[str]` | Hyperparameter optimization method ('rs' or 'hebo'). Can be None. | *required* |
| `sampling` | `Optional[str]` | Resampling strategy to apply: None, 'upsampling', 'downsampling', or 'smote'. | `None` |
| `factor` | `Optional[float]` | Resampling factor. | `None` |
| `n_configs` | `int` | Number of configurations for hyperparameter tuning. | `10` |
| `racing_folds` | `Optional[int]` | Number of racing folds for random search ('rs'). | `None` |
| `n_jobs` | `int` | Number of parallel jobs to run for evaluation. | `1` |
| `cv_folds` | `Optional[int]` | Number of folds for cross-validation. | `10` |
| `test_seed` | `int` | Random seed for test splitting. | `0` |
| `test_size` | `float` | Proportion of data used for testing. | `0.2` |
| `val_size` | `Optional[float]` | Size of the validation set in holdout tuning. | `0.2` |
| `cv_seed` | `Optional[int]` | Random seed for cross-validation. | `0` |
| `mlp_flag` | `Optional[bool]` | Flag to enable MLP training with early stopping. | `None` |
| `threshold_tuning` | `Optional[bool]` | If True, performs threshold tuning for binary classification when the criterion is 'f1'. | `None` |
| `verbose` | `bool` | Enables verbose output if set to True. | `True` |
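
As a minimal sketch, only the required arguments need to be supplied; every remaining parameter keeps the default listed above. Here `df` is assumed to be a dataframe prepared with `ProcessedDataLoader` as in the Example section, and the argument values are illustrative.

```
experiment = Experiment(
    df=df,                 # dataframe prepared via ProcessedDataLoader
    task="pocketclosure",
    learner="lr",
    criterion="f1",
    encoding="one_hot",
    tuning="cv",
    hpo="rs",
)
# sampling, factor, racing_folds, cv_folds, etc. keep their documented defaults.
```
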
Source code in periomod/benchmarking/_benchmark.py
def __init__(
    self,
    df: pd.DataFrame,
    task: str,
    learner: str,
    criterion: str,
    encoding: str,
    tuning: Optional[str],
    hpo: Optional[str],
    sampling: Optional[str] = None,
    factor: Optional[float] = None,
    n_configs: int = 10,
    racing_folds: Optional[int] = None,
    n_jobs: int = 1,
    cv_folds: Optional[int] = 10,
    test_seed: int = 0,
    test_size: float = 0.2,
    val_size: Optional[float] = 0.2,
    cv_seed: Optional[int] = 0,
    mlp_flag: Optional[bool] = None,
    threshold_tuning: Optional[bool] = None,
    verbose: bool = True,
) -> None:
    """Initialize the Experiment class with tuning parameters.

    Args:
        df (pd.DataFrame): The preloaded data for the experiment.
        task (str): The task name used to determine classification type.
            Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
            'pdgrouprevaluation'.
        learner (str): Specifies the model or algorithm to evaluate.
            Includes 'xgb', 'rf', 'lr' or 'mlp'.
        criterion (str): Criterion for optimization ('f1', 'macro_f1' or
            'brier_score').
        encoding (str): Encoding type for categorical features ('one_hot' or
            'binary').
        tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv').
            Can be None.
        hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
            Can be None.
        sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
            Includes None, 'upsampling', 'downsampling', and 'smote'.
        factor (Optional[float]): Resampling factor. Defaults to None.
        n_configs (int): Number of configurations for hyperparameter tuning.
            Defaults to 10.
        racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
            Defaults to None.
        n_jobs (int): Number of parallel jobs to run for evaluation.
            Defaults to 1.
        cv_folds (Optional[int]): Number of folds for cross-validation;
            Defaults to 10.
        test_seed (int): Random seed for test splitting. Defaults to 0.
        test_size (float): Proportion of data used for testing. Defaults to
            0.2.
        val_size (Optional[float]): Size of validation set in holdout tuning.
            Defaults to 0.2.
        cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
        mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
            Defaults to None.
        threshold_tuning (Optional[bool]): If True, performs threshold tuning for
            binary classification if the criterion is "f1". Defaults to None.
        verbose (bool): Enables verbose output if set to True.
    """
    super().__init__(
        df=df,
        task=task,
        learner=learner,
        criterion=criterion,
        encoding=encoding,
        tuning=tuning,
        hpo=hpo,
        sampling=sampling,
        factor=factor,
        n_configs=n_configs,
        racing_folds=racing_folds,
        n_jobs=n_jobs,
        cv_folds=cv_folds,
        test_seed=test_seed,
        test_size=test_size,
        val_size=val_size,
        cv_seed=cv_seed,
        mlp_flag=mlp_flag,
        threshold_tuning=threshold_tuning,
        verbose=verbose,
    )

perform_evaluation()

Perform model evaluation and return final metrics.

Returns:

| Type | Description |
| --- | --- |
| `dict` | A dictionary containing the trained model and its evaluation metrics. |
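
The concrete keys of the returned dictionary are not enumerated here; a hedged way to inspect the result without assuming specific key names:

```
final_metrics = experiment.perform_evaluation()

# Print whatever the evaluation produced; the exact keys depend on the task,
# criterion, and trainer and are not assumed here.
for key, value in final_metrics.items():
    print(f"{key}: {value}")
```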

Source code in periomod/benchmarking/_benchmark.py
def perform_evaluation(self) -> dict:
    """Perform model evaluation and return final metrics.

    Returns:
        dict: A dictionary containing the trained model and its evaluation metrics.
    """
    train_df, _ = self.resampler.split_train_test_df(
        df=self.df, seed=self.test_seed, test_size=self.test_size
    )

    if self.tuning == "holdout":
        return self._evaluate_holdout(train_df=train_df)
    elif self.tuning == "cv":
        return self._evaluate_cv()
    else:
        raise ValueError(f"Unsupported tuning method: {self.tuning}")