
HEBOTuner

Bases: BaseTuner

HEBO (Bayesian Optimization) hyperparameter tuning class.

This class performs hyperparameter tuning for machine learning models using Bayesian Optimization with the HEBO library, supporting both holdout and cross-validation (CV) tuning methods.
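
The suggest/observe cycle that drives this kind of tuning can be sketched without the HEBO dependency. In the minimal sketch below, `RandomSuggestOptimizer`, `toy_objective`, and `tune` are hypothetical names introduced for illustration only. The stand-in optimizer samples uniformly at random, whereas a Bayesian optimizer such as HEBO would choose each suggestion using a surrogate model of the scores observed so far.

```python
import random


class RandomSuggestOptimizer:
    """Hypothetical stand-in for a Bayesian optimizer (not the HEBO API)."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds            # {"name": (low, high)}
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.X = []                     # evaluated configurations
        self.y = []                     # observed scores, lower is better

    def suggest(self):
        # A real Bayesian optimizer would consult its surrogate model here.
        return {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}

    def observe(self, params, score):
        # Record the configuration together with its score.
        self.X.append(params)
        self.y.append(score)

    def best(self):
        # Index of the smallest observed score (minimization).
        idx = min(range(len(self.y)), key=self.y.__getitem__)
        return self.X[idx], self.y[idx]


def toy_objective(params):
    # Toy loss with its minimum at x = 2.
    return (params["x"] - 2.0) ** 2


def tune(n_configs=10):
    # Evaluate n_configs suggestions and return the best one seen.
    opt = RandomSuggestOptimizer({"x": (0.0, 4.0)})
    for _ in range(n_configs):
        params = opt.suggest()
        opt.observe(params, toy_objective(params))
    return opt.best()


best_params, best_score = tune(n_configs=10)
```

The loop evaluates exactly `n_configs` configurations and returns the one with the lowest observed score, which mirrors the role of `n_configs` in the parameter list below.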

Inherits
  • BaseTuner: Provides a framework for implementing HPO strategies, including shared evaluation and logging functions.

Parameters:

Name Type Description Default
classification str

The type of classification ('binary' or 'multiclass').

required
criterion str

The evaluation criterion (e.g., 'f1', 'brier_score').

required
tuning str

The type of tuning ('holdout' or 'cv').

required
hpo str

The hyperparameter optimization method. Defaults to 'hebo'.

'hebo'
n_configs int

Number of configurations to evaluate. Defaults to 10.

10
n_jobs int

Number of parallel jobs for model training. Defaults to 1.

1
verbose bool

Whether to print detailed logs during HEBO optimization. Defaults to True.

True
trainer Optional[Trainer]

Trainer instance for model training.

None
mlp_training bool

Enables MLP-specific training with early stopping.

True
threshold_tuning bool

Enables threshold tuning for binary classification when the criterion is "f1".

True
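
Threshold tuning for the "f1" criterion can be pictured as a sweep over candidate decision thresholds on the validation probabilities. The sketch below is a generic illustration, not the method used by `Trainer`: `f1_score_binary`, `best_f1_threshold`, the threshold grid, and the sample data are all assumptions made for the example.

```python
def f1_score_binary(y_true, y_pred):
    """F1 score for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, probs, thresholds=None):
    """Return (threshold, f1) for the threshold with the highest F1 score."""
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99.
        thresholds = [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score_binary(y_true, preds)
        if score > best_f1:  # keep the first threshold reaching the maximum
            best_t, best_f1 = t, score
    return best_t, best_f1


# Made-up validation labels and positive-class probabilities.
y_val = [0, 0, 1, 1, 1, 0]
probs = [0.10, 0.35, 0.40, 0.80, 0.65, 0.55]
threshold, f1 = best_f1_threshold(y_val, probs)
```

On this toy data the best threshold falls between 0.35 and 0.40, because that range keeps all three positives while rejecting two of the three negatives.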

Attributes:

Name Type Description
classification str

Specifies the classification type ('binary' or 'multiclass').

criterion str

The tuning criterion to optimize ('f1', 'brier_score' or 'macro_f1').

tuning str

Indicates the tuning approach ('holdout' or 'cv').

hpo str

Hyperparameter optimization method. Defaults to 'hebo'.

n_configs int

Number of configurations for HPO.

n_jobs int

Number of parallel jobs for model evaluation.

verbose bool

Enables logging during tuning if set to True.

mlp_training bool

Flag to enable MLP training with early stopping.

threshold_tuning bool

Enables threshold tuning for binary classification.

trainer Trainer

Trainer instance for managing model training and evaluation.
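
The `criterion` attribute also fixes the direction of optimization. According to the source further down, the optimizer minimizes its objective, so 'f1' and 'macro_f1' scores are negated while 'brier_score' is passed through unchanged. A minimal sketch of that convention follows; the helper names `to_minimization_score` and `index_of_best` are hypothetical.

```python
# Criteria where a higher score is better and the sign must be flipped.
MAXIMIZED_CRITERIA = ("f1", "macro_f1")


def to_minimization_score(score, criterion):
    """Convert a raw metric into a value where lower is always better."""
    return -score if criterion in MAXIMIZED_CRITERIA else score


def index_of_best(losses):
    """Index of the smallest loss, analogous to an argmin over observations."""
    return min(range(len(losses)), key=losses.__getitem__)


# Made-up scores for three configurations.
f1_scores = [0.61, 0.74, 0.69]
f1_losses = [to_minimization_score(s, "f1") for s in f1_scores]
best_f1_config = index_of_best(f1_losses)

brier_scores = [0.21, 0.18, 0.25]
brier_losses = [to_minimization_score(s, "brier_score") for s in brier_scores]
best_brier_config = index_of_best(brier_losses)
```

In both cases the second configuration wins: it has the highest F1 score and the lowest Brier score.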

Methods:

Name Description
holdout

Optimizes hyperparameters using HEBO for holdout validation.

cv

Optimizes hyperparameters using HEBO with cross-validation.

Example
trainer = Trainer(
    classification="binary",
    criterion="f1",
    tuning="holdout",
    hpo="hebo",
    mlp_training=True,
    threshold_tuning=True,
)

tuner = HEBOTuner(
    classification="binary",
    criterion="f1",
    tuning="holdout",
    hpo="hebo",
    n_configs=10,
    n_jobs=-1,
    verbose=True,
    trainer=trainer,
    mlp_training=True,
    threshold_tuning=True,
)

best_params, best_threshold = tuner.holdout(
    learner="rf",
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val
)

# Using cross-validation
best_params, best_threshold = tuner.cv(
    learner="rf",
    outer_splits=cross_val_splits
)
Source code in periomod/tuning/_hebo.py
class HEBOTuner(BaseTuner):
    """HEBO (Bayesian Optimization) hyperparameter tuning class.

    This class performs hyperparameter tuning for machine learning models
    using Bayesian Optimization with the HEBO library, supporting both holdout
    and cross-validation (CV) tuning methods.

    Inherits:
        - `BaseTuner`: Provides a framework for implementing HPO strategies,
          including shared evaluation and logging functions.

    Args:
        classification (str): The type of classification ('binary' or 'multiclass').
        criterion (str): The evaluation criterion (e.g., 'f1', 'brier_score').
        tuning (str): The type of tuning ('holdout' or 'cv').
        hpo (str): The hyperparameter optimization method. Defaults to 'hebo'.
        n_configs (int): Number of configurations to evaluate. Defaults to 10.
        n_jobs (int): Number of parallel jobs for model training.
            Defaults to 1.
        verbose (bool): Whether to print detailed logs during HEBO optimization.
            Defaults to True.
        trainer (Optional[Trainer]): Trainer instance for model training.
        mlp_training (bool): Enables MLP-specific training with early stopping.
        threshold_tuning (bool): Enables threshold tuning for binary classification
            when the criterion is "f1".

    Attributes:
        classification (str): Specifies the classification type
            ('binary' or 'multiclass').
        criterion (str): The tuning criterion to optimize
            ('f1', 'brier_score' or 'macro_f1').
        tuning (str): Indicates the tuning approach ('holdout' or 'cv').
        hpo (str): Hyperparameter optimization method. Defaults to 'hebo'.
        n_configs (int): Number of configurations for HPO.
        n_jobs (int): Number of parallel jobs for model evaluation.
        verbose (bool): Enables logging during tuning if set to True.
        mlp_training (bool): Flag to enable MLP training with early stopping.
        threshold_tuning (bool): Enables threshold tuning for binary classification.
        trainer (Trainer): Trainer instance for managing model training and evaluation.

    Methods:
        holdout: Optimizes hyperparameters using HEBO for holdout validation.
        cv: Optimizes hyperparameters using HEBO with cross-validation.

    Example:
        ```
        trainer = Trainer(
            classification="binary",
            criterion="f1",
            tuning="holdout",
            hpo="hebo",
            mlp_training=True,
            threshold_tuning=True,
        )

        tuner = HEBOTuner(
            classification="binary",
            criterion="f1",
            tuning="holdout",
            hpo="hebo",
            n_configs=10,
            n_jobs=-1,
            verbose=True,
            trainer=trainer,
            mlp_training=True,
            threshold_tuning=True,
        )

        best_params, best_threshold = tuner.holdout(
            learner="rf",
            X_train=X_train,
            y_train=y_train,
            X_val=X_val,
            y_val=y_val
        )

        # Using cross-validation
        best_params, best_threshold = tuner.cv(
            learner="rf",
            outer_splits=cross_val_splits
        )
        ```
    """

    def __init__(
        self,
        classification: str,
        criterion: str,
        tuning: str,
        hpo: str = "hebo",
        n_configs: int = 10,
        n_jobs: int = 1,
        verbose: bool = True,
        trainer: Optional[Trainer] = None,
        mlp_training: bool = True,
        threshold_tuning: bool = True,
    ) -> None:
        """Initialize HEBOTuner."""
        super().__init__(
            classification=classification,
            criterion=criterion,
            tuning=tuning,
            hpo=hpo,
            n_configs=n_configs,
            n_jobs=n_jobs,
            verbose=verbose,
            trainer=trainer,
            mlp_training=mlp_training,
            threshold_tuning=threshold_tuning,
        )

    def holdout(
        self,
        learner: str,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame,
        y_val: pd.Series,
    ) -> Tuple[Dict[str, Union[float, int]], Optional[float]]:
        """Perform Bayesian Optimization using HEBO for holdout validation.

        Args:
            learner (str): The machine learning model to evaluate.
            X_train (pd.DataFrame): The training features for the holdout set.
            y_train (pd.Series): The training labels for the holdout set.
            X_val (pd.DataFrame): The validation features for the holdout set.
            y_val (pd.Series): The validation labels for the holdout set.

        Returns:
            Tuple: The best hyperparameters and the best threshold.
        """
        return self._run_optimization(
            learner=learner,
            X_train=X_train,
            y_train=y_train,
            X_val=X_val,
            y_val=y_val,
            outer_splits=None,
        )

    def cv(
        self,
        learner: str,
        outer_splits: List[Tuple[pd.DataFrame, pd.DataFrame]],
        racing_folds: Optional[int] = None,
    ) -> Tuple[Dict[str, Union[float, int]], Optional[float]]:
        """Perform Bayesian Optimization using HEBO with cross-validation.

        Args:
            learner (str): The machine learning model to evaluate.
            outer_splits (List[Tuple[pd.DataFrame, pd.DataFrame]]):
                List of cross-validation folds.
            racing_folds (Optional[int]): Number of racing folds; if None, regular
                cross-validation is performed.

        Returns:
            Tuple: The best hyperparameters and the best threshold.
        """
        return self._run_optimization(
            learner=learner,
            X_train=None,
            y_train=None,
            X_val=None,
            y_val=None,
            outer_splits=outer_splits,
        )

    def _run_optimization(
        self,
        learner: str,
        X_train: Optional[pd.DataFrame],
        y_train: Optional[pd.Series],
        X_val: Optional[pd.DataFrame],
        y_val: Optional[pd.Series],
        outer_splits: Optional[List[Tuple[pd.DataFrame, pd.DataFrame]]],
    ) -> Tuple[Dict[str, Union[float, int]], Optional[float]]:
        """Perform Bayesian Optimization using HEBO for holdout and cross-validation.

        Args:
            learner (str): The machine learning model to evaluate.
            X_train (Optional[pd.DataFrame]): Training features for the holdout set
                (None if using CV).
            y_train (Optional[pd.Series]): Training labels for the holdout set
                (None if using CV).
            X_val (Optional[pd.DataFrame]): Validation features for the holdout set
                (None if using CV).
            y_val (Optional[pd.Series]): Validation labels for the holdout set
                (None if using CV).
            outer_splits (Optional[List[Tuple[pd.DataFrame, pd.DataFrame]]]):
                Cross-validation folds (None if using holdout).

        Returns:
            Tuple: The best hyperparameters and the best threshold.
        """
        model, search_space, params_func = Model.get(
            learner=learner, classification=self.classification, hpo=self.hpo
        )
        space = DesignSpace().parse(search_space)
        optimizer = HEBO(space)

        for i in range(self.n_configs):
            params_suggestion = optimizer.suggest(n_suggestions=1).iloc[0]
            params_dict = params_func(params_suggestion)

            score = self._objective(
                model=model,
                params_dict=params_dict,
                X_train=X_train,
                y_train=y_train,
                X_val=X_val,
                y_val=y_val,
                outer_splits=outer_splits,
            )
            optimizer.observe(pd.DataFrame([params_suggestion]), np.array([score]))

            if self.verbose:
                self._print_iteration_info(
                    iteration=i, model=model, params_dict=params_dict, score=score
                )

        best_params_idx = optimizer.y.argmin()
        best_params_df = optimizer.X.iloc[best_params_idx]
        best_params = params_func(best_params_df)
        best_threshold = None
        if self.classification == "binary" and self.threshold_tuning:
            model_clone = clone(model).set_params(**best_params)
            if self.criterion == "f1":
                if self.tuning == "holdout":
                    model_clone.fit(X_train, y_train)
                    probs = model_clone.predict_proba(X_val)[:, 1]
                    _, best_threshold = self.trainer.evaluate(
                        y_val, probs, self.threshold_tuning
                    )

                elif self.tuning == "cv":
                    best_threshold = self.trainer.optimize_threshold(
                        model=model_clone, outer_splits=outer_splits, n_jobs=self.n_jobs
                    )

        return best_params, best_threshold

    def _objective(
        self,
        model: Any,
        params_dict: Dict[str, Union[float, int]],
        X_train: Optional[pd.DataFrame],
        y_train: Optional[pd.Series],
        X_val: Optional[pd.DataFrame],
        y_val: Optional[pd.Series],
        outer_splits: Optional[List[Tuple[pd.DataFrame, pd.DataFrame]]],
    ) -> float:
        """Evaluate the model performance for both holdout and cross-validation.

        Args:
            model (Any): The machine learning model to evaluate.
            params_dict (Dict[str, Union[float, int]]): The suggested hyperparameters
                as a dictionary.
            X_train (Optional[pd.DataFrame]): Training features for the holdout set
                (None for CV).
            y_train (Optional[pd.Series]): Training labels for the holdout set
                (None for CV).
            X_val (Optional[pd.DataFrame]): Validation features for the holdout set
                (None for CV).
            y_val (Optional[pd.Series]): Validation labels for the holdout set
                (None for CV).
            outer_splits (Optional[List[Tuple[pd.DataFrame, pd.DataFrame]]]):
                Cross-validation folds (None for holdout).

        Returns:
            float: The evaluation score to be minimized by HEBO.
        """
        model_clone = clone(model)
        model_clone.set_params(**params_dict)

        if "n_jobs" in model_clone.get_params():
            model_clone.set_params(n_jobs=self.n_jobs)

        score = self._evaluate_objective(
            model=model_clone,
            X_train=X_train,
            y_train=y_train,
            X_val=X_val,
            y_val=y_val,
            outer_splits=outer_splits,
        )

        return -score if self.criterion in ["f1", "macro_f1"] else score

    def _evaluate_objective(
        self,
        model: Any,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame,
        y_val: pd.Series,
        outer_splits: Optional[List[Tuple[pd.DataFrame, pd.Series]]],
    ) -> float:
        """Evaluates the model's performance based on the tuning strategy.

        The tuning strategy can be either 'holdout' or 'cv' (cross-validation).

        Args:
            model (Any): The cloned machine learning model to be
                evaluated.
            X_train (pd.DataFrame): Training features for the holdout set.
            y_train (pd.Series): Training labels for the holdout set.
            X_val (pd.DataFrame): Validation features for the holdout set (used for
                'holdout' tuning).
            y_val (pd.Series): Validation labels for the holdout set (used for
                'holdout' tuning).
            outer_splits (List[tuple]): List of cross-validation folds, each a tuple
                containing (X_train_fold, y_train_fold).

        Returns:
            float: The model's performance metric based on tuning strategy.
        """
        if self.tuning == "holdout":
            score, _, _ = self.trainer.train(
                model=model, X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val
            )
            return score

        elif self.tuning == "cv":
            if outer_splits is None:
                raise ValueError(
                    "outer_splits cannot be None when using cross-validation."
                )
            scores = Parallel(n_jobs=self.n_jobs)(
                delayed(self.trainer.evaluate_cv)(deepcopy(model), fold)
                for fold in outer_splits
            )
            return np.mean(scores)

        raise ValueError(f"Unsupported tuning method: {self.tuning}")

__init__(classification, criterion, tuning, hpo='hebo', n_configs=10, n_jobs=1, verbose=True, trainer=None, mlp_training=True, threshold_tuning=True)

Initialize HEBOTuner.

Source code in periomod/tuning/_hebo.py
def __init__(
    self,
    classification: str,
    criterion: str,
    tuning: str,
    hpo: str = "hebo",
    n_configs: int = 10,
    n_jobs: int = 1,
    verbose: bool = True,
    trainer: Optional[Trainer] = None,
    mlp_training: bool = True,
    threshold_tuning: bool = True,
) -> None:
    """Initialize HEBOTuner."""
    super().__init__(
        classification=classification,
        criterion=criterion,
        tuning=tuning,
        hpo=hpo,
        n_configs=n_configs,
        n_jobs=n_jobs,
        verbose=verbose,
        trainer=trainer,
        mlp_training=mlp_training,
        threshold_tuning=threshold_tuning,
    )

cv(learner, outer_splits, racing_folds=None)

Perform Bayesian Optimization using HEBO with cross-validation.

Parameters:

Name Type Description Default
learner str

The machine learning model to evaluate.

required
outer_splits List[Tuple[DataFrame, DataFrame]]

List of cross-validation folds.

required
racing_folds Optional[int]

Number of racing folds; if None, regular cross-validation is performed. In the source shown below, this argument is accepted but not passed on to `_run_optimization`.

None

Returns:

Name Type Description
Tuple Tuple[Dict[str, Union[float, int]], Optional[float]]

The best hyperparameters and the best threshold.
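
In cross-validation mode, the score of one configuration is the mean of its per-fold scores, and a missing `outer_splits` raises a `ValueError`. The sketch below illustrates that aggregation in a simplified, sequential form. `evaluate_fold` and `cv_score` are hypothetical, and each fold here is reduced to a pair of label lists rather than the DataFrame tuples used by the library.

```python
from statistics import mean


def evaluate_fold(fold):
    """Hypothetical fold evaluation.

    Predicts the training base rate for every validation sample and
    returns the Brier score (lower is better).
    """
    y_train, y_val = fold
    base_rate = mean(y_train)
    return mean((base_rate - y) ** 2 for y in y_val)


def cv_score(outer_splits):
    """Average the per-fold scores for one configuration."""
    if outer_splits is None:
        raise ValueError("outer_splits cannot be None when using cross-validation.")
    return mean(evaluate_fold(fold) for fold in outer_splits)


# Made-up folds: (training labels, validation labels).
folds = [
    ([0, 1, 1, 0], [1, 0]),
    ([1, 1, 0, 0], [0, 0]),
    ([0, 0, 1, 1], [1, 1]),
]
score = cv_score(folds)
```

The source evaluates folds in parallel with `n_jobs` workers; the sequential loop above only shows the aggregation step.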

Source code in periomod/tuning/_hebo.py
def cv(
    self,
    learner: str,
    outer_splits: List[Tuple[pd.DataFrame, pd.DataFrame]],
    racing_folds: Optional[int] = None,
) -> Tuple[Dict[str, Union[float, int]], Optional[float]]:
    """Perform Bayesian Optimization using HEBO with cross-validation.

    Args:
        learner (str): The machine learning model to evaluate.
        outer_splits (List[Tuple[pd.DataFrame, pd.DataFrame]]):
            List of cross-validation folds.
        racing_folds (Optional[int]): Number of racing folds; if None, regular
            cross-validation is performed.

    Returns:
        Tuple: The best hyperparameters and the best threshold.
    """
    return self._run_optimization(
        learner=learner,
        X_train=None,
        y_train=None,
        X_val=None,
        y_val=None,
        outer_splits=outer_splits,
    )

holdout(learner, X_train, y_train, X_val, y_val)

Perform Bayesian Optimization using HEBO for holdout validation.

Parameters:

Name Type Description Default
learner str

The machine learning model to evaluate.

required
X_train DataFrame

The training features for the holdout set.

required
y_train Series

The training labels for the holdout set.

required
X_val DataFrame

The validation features for the holdout set.

required
y_val Series

The validation labels for the holdout set.

required

Returns:

Name Type Description
Tuple Tuple[Dict[str, Union[float, int]], Optional[float]]

The best hyperparameters and the best threshold.
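
Because the second element of the returned tuple is `Optional[float]`, calling code has to handle a threshold of None, which occurs, for example, when threshold tuning is disabled or the task is multiclass. A minimal sketch of one way to do that for binary predictions is shown here; `predict_labels` and the 0.5 fallback are assumptions for illustration and are not part of the library.

```python
def predict_labels(probs, best_threshold=None, default_threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels.

    Falls back to default_threshold (an assumed value) when no tuned
    threshold is available.
    """
    threshold = default_threshold if best_threshold is None else best_threshold
    return [1 if p >= threshold else 0 for p in probs]


# Made-up probabilities for three samples.
probs = [0.30, 0.45, 0.70]
with_tuned = predict_labels(probs, best_threshold=0.4)
with_default = predict_labels(probs, best_threshold=None)
```

With the tuned threshold of 0.4 the second sample is labeled positive, while the fallback of 0.5 labels it negative.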

Source code in periomod/tuning/_hebo.py
def holdout(
    self,
    learner: str,
    X_train: pd.DataFrame,
    y_train: pd.Series,
    X_val: pd.DataFrame,
    y_val: pd.Series,
) -> Tuple[Dict[str, Union[float, int]], Optional[float]]:
    """Perform Bayesian Optimization using HEBO for holdout validation.

    Args:
        learner (str): The machine learning model to evaluate.
        X_train (pd.DataFrame): The training features for the holdout set.
        y_train (pd.Series): The training labels for the holdout set.
        X_val (pd.DataFrame): The validation features for the holdout set.
        y_val (pd.Series): The validation labels for the holdout set.

    Returns:
        Tuple: The best hyperparameters and the best threshold.
    """
    return self._run_optimization(
        learner=learner,
        X_train=X_train,
        y_train=y_train,
        X_val=X_val,
        y_val=y_val,
        outer_splits=None,
    )