BaseExperiment

Bases: BaseValidator, ABC

Base class for experiment workflows with model benchmarking.

This class provides a shared framework for setting up and running experiments with model training, resampling, tuning, and evaluation. It supports configurations for task-specific classification, tuning methods, hyperparameter optimization, and sampling strategies, providing core methods to set up tuning, training, and evaluation for different machine learning tasks.

Inherits
  • BaseValidator: Validates instance-level variables and parameters.
  • ABC: Specifies abstract methods for subclasses to implement.

Parameters:

Name Type Description Default
df DataFrame

The preloaded dataset used for training and evaluation.

required
task str

Task name, used to determine the classification type. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'.

required
learner str

Specifies the machine learning model or algorithm to use for evaluation, including 'xgb', 'rf', 'lr' or 'mlp'.

required
criterion str

Evaluation criterion for model performance. Options are 'f1' and 'macro_f1' for F1 score and 'brier_score' for Brier Score.

required
encoding str

Encoding type for categorical features. Choose between 'one_hot' or 'target' encoding based on model requirements.

required
tuning Optional[str]

The tuning method to apply during model training, either 'holdout' or 'cv' for cross-validation.

required
hpo Optional[str]

Hyperparameter optimization strategy. Options include 'rs' (Random Search) and 'hebo'.

required
sampling Optional[str]

Sampling strategy to address class imbalance in the dataset. Includes None, 'upsampling', 'downsampling', and 'smote'.

required
factor Optional[float]

Factor used during resampling, specifying the amount of class balancing to apply.

required
n_configs int

Number of configurations to evaluate during hyperparameter tuning, used to limit the search space.

required
racing_folds Optional[int]

Number of racing folds used during random search for efficient hyperparameter optimization.

required
n_jobs int

Number of parallel jobs to use for processing. Set to -1 to use all available cores.

required
cv_folds int

Number of folds for cross-validation.

required
test_seed int

Seed for random train-test split for reproducibility.

required
test_size float

Proportion of data to use for testing.

required
val_size float

Proportion of data to use for validation in a holdout strategy.

required
cv_seed int

Seed for cross-validation splits for reproducibility.

required
mlp_flag Optional[bool]

If True, enables training with a Multi-Layer Perceptron (MLP) with early stopping. Defaults to self.mlp_training.

required
threshold_tuning bool

If True, tunes the decision threshold in binary classification to optimize for f1 score.

required
verbose bool

If True, enables detailed logging of the model training, tuning, and evaluation processes for better traceability.

required

Attributes:

Name Type Description
task str

The task name, used to set the evaluation objective.

classification str

Classification type derived from the task ('binary' or 'multiclass') for configuring the evaluation.

df DataFrame

DataFrame containing the dataset for training, validation, and testing purposes.

learner str

The chosen machine learning model or algorithm for evaluation.

encoding str

Encoding type applied to categorical features, either 'one_hot' or 'target'.

sampling str

Resampling strategy used to address class imbalance in the dataset.

factor float

Resampling factor applied to balance classes as per the chosen sampling strategy.

n_configs int

Number of configurations evaluated during hyperparameter tuning.

racing_folds int

Number of racing folds applied during random search for efficient tuning.

n_jobs int

Number of parallel jobs used for model training and evaluation.

cv_folds int

Number of folds used for cross-validation.

test_seed int

Seed for splitting data into training and test sets, ensuring reproducibility.

test_size float

Proportion of the dataset assigned to the test split.

val_size float

Proportion of the dataset assigned to validation split in holdout validation.

cv_seed int

Seed for cross-validation splits to ensure consistency across runs.

mlp_flag bool

Enables training with a Multi-Layer Perceptron (MLP) and early stopping.

threshold_tuning bool

Enables tuning of the classification threshold in binary classification for optimizing the F1 score.

verbose bool

Controls the verbosity level of the output for detailed logs during training and evaluation.

resampler Resampler

Instance of the Resampler class for handling dataset resampling based on the specified strategy.

trainer Trainer

Instance of the Trainer class for managing the model training process.

tuner Tuner

Instance of the Tuner class used for performing hyperparameter optimization.

Abstract Method
  • perform_evaluation: Abstract method to handle the model evaluation process.
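Example: Since the class is abstract, a concrete experiment must implement `perform_evaluation` together with the `_evaluate_holdout` and `_evaluate_cv` helpers. The sketch below is illustrative only: the subclass name, the CSV path, and the trivial method bodies are hypothetical, and the import path simply mirrors the source location shown below.

import pandas as pd

from periomod.benchmarking._basebenchmark import BaseExperiment


class DemoExperiment(BaseExperiment):
    """Hypothetical subclass used only to illustrate the interface."""

    def perform_evaluation(self) -> dict:
        return {}  # placeholder: run tuning and final-model evaluation here

    def _evaluate_holdout(self, train_df: pd.DataFrame) -> dict:
        return {}  # placeholder: tune on the holdout split of train_df

    def _evaluate_cv(self) -> dict:
        return {}  # placeholder: tune with cross-validation


experiment = DemoExperiment(
    df=pd.read_csv("periodontal_data.csv"),  # hypothetical preprocessed dataset
    task="pocketclosure",  # binary task
    learner="xgb",
    criterion="f1",
    encoding="one_hot",
    tuning="holdout",
    hpo="rs",
    sampling="upsampling",
    factor=2.0,
    n_configs=20,
    racing_folds=5,
    n_jobs=-1,
    cv_folds=5,
    test_seed=42,
    test_size=0.2,
    val_size=0.2,
    cv_seed=42,
    mlp_flag=False,
    threshold_tuning=True,
    verbose=True,
)
metrics = experiment.perform_evaluation()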
Source code in periomod/benchmarking/_basebenchmark.py
class BaseExperiment(BaseValidator, ABC):
    """Base class for experiment workflows with model benchmarking.

    This class provides a shared framework for setting up and running
    experiments with model training, resampling, tuning, and evaluation. It
    supports configurations for task-specific classification, tuning methods,
    hyperparameter optimization, and sampling strategies, providing core methods
    to set up tuning, training, and evaluation for different machine learning
    tasks.

    Inherits:
        - `BaseValidator`: Validates instance-level variables and parameters.
        - `ABC`: Specifies abstract methods for subclasses to implement.

    Args:
        df (pd.DataFrame): The preloaded dataset used for training and evaluation.
        task (str): Task name, used to determine the classification type.
            Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
            'pdgrouprevaluation'.
        learner (str): Specifies the machine learning model or algorithm to use for
            evaluation, including 'xgb', 'rf', 'lr' or 'mlp'.
        criterion (str): Evaluation criterion for model performance. Options are
            'f1' and 'macro_f1' for F1 score and 'brier_score' for Brier Score.
        encoding (str): Encoding type for categorical features. Choose between
            'one_hot' or 'target' encoding based on model requirements.
        tuning (Optional[str]): The tuning method to apply during model training,
            either 'holdout' or 'cv' for cross-validation.
        hpo (Optional[str]): Hyperparameter optimization strategy. Options include
            'rs' (Random Search) and 'hebo'.
        sampling (Optional[str]): Sampling strategy to address class imbalance in
            the dataset. Includes None, 'upsampling', 'downsampling', and 'smote'.
        factor (Optional[float]): Factor used during resampling, specifying the
            amount of class balancing to apply.
        n_configs (int): Number of configurations to evaluate during hyperparameter
            tuning, used to limit the search space.
        racing_folds (Optional[int]): Number of racing folds used during random
            search for efficient hyperparameter optimization.
        n_jobs (int): Number of parallel jobs to use for processing.
            Set to -1 to use all available cores.
        cv_folds (int): Number of folds for cross-validation.
        test_seed (int): Seed for random train-test split for reproducibility.
        test_size (float): Proportion of data to use for testing.
        val_size (float): Proportion of data to use for validation in a
            holdout strategy.
        cv_seed (int): Seed for cross-validation splits for reproducibility.
        mlp_flag (Optional[bool]): If True, enables training with a Multi-Layer
            Perceptron (MLP) with early stopping. Defaults to `self.mlp_training`.
        threshold_tuning (bool): If True, tunes the decision threshold in binary
            classification to optimize for `f1` score.
        verbose (bool): If True, enables detailed logging of the model training,
            tuning, and evaluation processes for better traceability.

    Attributes:
        task (str): The task name, used to set the evaluation objective.
        classification (str): Classification type derived from the task ('binary'
            or 'multiclass') for configuring the evaluation.
        df (pd.DataFrame): DataFrame containing the dataset for training, validation,
            and testing purposes.
        learner (str): The chosen machine learning model or algorithm for evaluation.
        encoding (str): Encoding type applied to categorical features, either
            'one_hot' or 'target'.
        sampling (str): Resampling strategy used to address class imbalance in
            the dataset.
        factor (float): Resampling factor applied to balance classes as per
            the chosen sampling strategy.
        n_configs (int): Number of configurations evaluated during hyperparameter
            tuning.
        racing_folds (int): Number of racing folds applied during random search for
            efficient tuning.
        n_jobs (int): Number of parallel jobs used for model training and evaluation.
        cv_folds (int): Number of folds used for cross-validation.
        test_seed (int): Seed for splitting data into training and test sets,
            ensuring reproducibility.
        test_size (float): Proportion of the dataset assigned to the test split.
        val_size (float): Proportion of the dataset assigned to validation split in
            holdout validation.
        cv_seed (int): Seed for cross-validation splits to ensure consistency across
            runs.
        mlp_flag (bool): Enables training with a Multi-Layer Perceptron (MLP) and
            early stopping.
        threshold_tuning (bool): Enables tuning of the classification threshold
            in binary classification for optimizing the F1 score.
        verbose (bool): Controls the verbosity level of the output for detailed
            logs during training and evaluation.
        resampler (Resampler): Instance of the `Resampler` class for handling
            dataset resampling based on the specified strategy.
        trainer (Trainer): Instance of the `Trainer` class for managing the model
            training process.
        tuner (Tuner): Instance of the `Tuner` class used for performing
            hyperparameter optimization.


    Abstract Method:
        - `perform_evaluation`: Abstract method to handle the model evaluation process.
    """

    def __init__(
        self,
        df: pd.DataFrame,
        task: str,
        learner: str,
        criterion: str,
        encoding: str,
        tuning: Optional[str],
        hpo: Optional[str],
        sampling: Optional[str],
        factor: Optional[float],
        n_configs: int,
        racing_folds: Optional[int],
        n_jobs: int,
        cv_folds: Optional[int],
        test_seed: int,
        test_size: float,
        val_size: Optional[float],
        cv_seed: Optional[int],
        mlp_flag: Optional[bool],
        threshold_tuning: Optional[bool],
        verbose: bool,
    ) -> None:
        """Initialize the Experiment class with tuning parameters."""
        self.task = task
        classification = self._determine_classification()
        super().__init__(
            classification=classification, criterion=criterion, tuning=tuning, hpo=hpo
        )
        self.df = df
        self.learner = learner
        self.encoding = encoding
        self.sampling = sampling
        self.factor = factor
        self.n_configs = n_configs
        self.racing_folds = racing_folds
        self.n_jobs = n_jobs
        self.cv_folds = cv_folds
        self.test_seed = test_seed
        self.test_size = test_size
        self.val_size = val_size
        self.cv_seed = cv_seed
        self.mlp_flag = mlp_flag
        self.threshold_tuning = threshold_tuning
        self.verbose = verbose
        self.resampler = Resampler(self.classification, self.encoding)
        self.trainer = Trainer(
            self.classification,
            self.criterion,
            tuning=self.tuning,
            hpo=self.hpo,
            mlp_training=self.mlp_flag,
            threshold_tuning=self.threshold_tuning,
        )
        self.tuner = self._initialize_tuner()

    def _determine_classification(self) -> str:
        """Determine classification type based on the task name.

        Returns:
            str: The classification type ('binary' or 'multiclass').
        """
        if self.task in ["pocketclosure", "pocketclosureinf", "improvement"]:
            return "binary"
        elif self.task == "pdgrouprevaluation":
            return "multiclass"
        else:
            raise ValueError(
                f"Unknown task: {self.task}. Unable to determine classification."
            )

    def _initialize_tuner(self):
        """Initialize the appropriate tuner based on the hpo method."""
        if self.hpo == "rs":
            return RandomSearchTuner(
                classification=self.classification,
                criterion=self.criterion,
                tuning=self.tuning,
                hpo=self.hpo,
                n_configs=self.n_configs,
                n_jobs=self.n_jobs,
                verbose=self.verbose,
                trainer=self.trainer,
                mlp_training=self.mlp_flag,
                threshold_tuning=self.threshold_tuning,
            )
        elif self.hpo == "hebo":
            return HEBOTuner(
                classification=self.classification,
                criterion=self.criterion,
                tuning=self.tuning,
                hpo=self.hpo,
                n_configs=self.n_configs,
                n_jobs=self.n_jobs,
                verbose=self.verbose,
                trainer=self.trainer,
                mlp_training=self.mlp_flag,
                threshold_tuning=self.threshold_tuning,
            )
        else:
            raise ValueError(f"Unsupported HPO method: {self.hpo}")

    def _train_final_model(
        self, final_model_tuple: Tuple[str, Dict, Optional[float]]
    ) -> dict:
        """Helper method to train the final model with best parameters.

        Args:
            final_model_tuple (Tuple[str, Dict, Optional[float]]): A tuple containing
                the learner name, best hyperparameters, and an optional best threshold.

        Returns:
            dict: A dictionary containing the trained model and its evaluation metrics.
        """
        return self.trainer.train_final_model(
            df=self.df,
            resampler=self.resampler,
            model=final_model_tuple,
            sampling=self.sampling,
            factor=self.factor,
            n_jobs=self.n_jobs,
            seed=self.test_seed,
            test_size=self.test_size,
            verbose=self.verbose,
        )

    @abstractmethod
    def perform_evaluation(self) -> dict:
        """Perform model evaluation and return final metrics."""

    @abstractmethod
    def _evaluate_holdout(self, train_df: pd.DataFrame) -> dict:
        """Perform holdout validation and return the final model metrics.

        Args:
            train_df (pd.DataFrame): train df for holdout tuning.
        """

    @abstractmethod
    def _evaluate_cv(self) -> dict:
        """Perform cross-validation and return the final model metrics."""

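The `final_model_tuple` expected by `_train_final_model` (shown above) bundles the tuning result. A minimal sketch of its shape, with purely illustrative hyperparameter values:

# (learner name, best hyperparameters found during tuning, optional decision threshold)
final_model_tuple = ("xgb", {"n_estimators": 300, "learning_rate": 0.05}, 0.42)
results = experiment._train_final_model(final_model_tuple)  # experiment: a concrete subclass instance
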
__init__(df, task, learner, criterion, encoding, tuning, hpo, sampling, factor, n_configs, racing_folds, n_jobs, cv_folds, test_seed, test_size, val_size, cv_seed, mlp_flag, threshold_tuning, verbose)

Initialize the Experiment class with tuning parameters.

Source code in periomod/benchmarking/_basebenchmark.py
def __init__(
    self,
    df: pd.DataFrame,
    task: str,
    learner: str,
    criterion: str,
    encoding: str,
    tuning: Optional[str],
    hpo: Optional[str],
    sampling: Optional[str],
    factor: Optional[float],
    n_configs: int,
    racing_folds: Optional[int],
    n_jobs: int,
    cv_folds: Optional[int],
    test_seed: int,
    test_size: float,
    val_size: Optional[float],
    cv_seed: Optional[int],
    mlp_flag: Optional[bool],
    threshold_tuning: Optional[bool],
    verbose: bool,
) -> None:
    """Initialize the Experiment class with tuning parameters."""
    self.task = task
    classification = self._determine_classification()
    super().__init__(
        classification=classification, criterion=criterion, tuning=tuning, hpo=hpo
    )
    self.df = df
    self.learner = learner
    self.encoding = encoding
    self.sampling = sampling
    self.factor = factor
    self.n_configs = n_configs
    self.racing_folds = racing_folds
    self.n_jobs = n_jobs
    self.cv_folds = cv_folds
    self.test_seed = test_seed
    self.test_size = test_size
    self.val_size = val_size
    self.cv_seed = cv_seed
    self.mlp_flag = mlp_flag
    self.threshold_tuning = threshold_tuning
    self.verbose = verbose
    self.resampler = Resampler(self.classification, self.encoding)
    self.trainer = Trainer(
        self.classification,
        self.criterion,
        tuning=self.tuning,
        hpo=self.hpo,
        mlp_training=self.mlp_flag,
        threshold_tuning=self.threshold_tuning,
    )
    self.tuner = self._initialize_tuner()

perform_evaluation() abstractmethod

Perform model evaluation and return final metrics.

Source code in periomod/benchmarking/_basebenchmark.py
@abstractmethod
def perform_evaluation(self) -> dict:
    """Perform model evaluation and return final metrics."""