RandomSearchTuner

Bases: BaseTuner

Random Search hyperparameter tuning class.

This class performs hyperparameter tuning using random search, supporting both holdout and cross-validation (CV) tuning methods.

Inherits
  • BaseTuner: Provides a framework for implementing HPO strategies, including shared evaluation and logging functions.

Parameters:

  • classification (str): The type of classification ('binary' or 'multiclass'). Required.
  • criterion (str): The evaluation criterion (e.g., 'f1', 'brier_score'). Required.
  • tuning (str): The type of tuning ('holdout' or 'cv'). Required.
  • hpo (str): The hyperparameter optimization method. Defaults to 'rs'.
  • n_configs (int): Number of configurations to evaluate. Defaults to 10.
  • n_jobs (int): Number of parallel jobs for model training. Defaults to 1.
  • verbose (bool): Whether to print detailed logs during optimization. Defaults to True.
  • trainer (Optional[Trainer]): Trainer instance for model training. Defaults to None.
  • mlp_training (bool): Enables MLP-specific training with early stopping. Defaults to True.
  • threshold_tuning (bool): Enables threshold tuning for binary classification when the criterion is "f1". Defaults to True.

Attributes:

  • classification (str): Type of classification ('binary' or 'multiclass').
  • criterion (str): Performance criterion for optimization ('f1', 'brier_score', or 'macro_f1').
  • tuning (str): Tuning approach ('holdout' or 'cv').
  • hpo (str): Hyperparameter optimization method (default is 'rs').
  • n_configs (int): Number of configurations to evaluate.
  • n_jobs (int): Number of parallel jobs for training.
  • verbose (bool): Flag to enable detailed logs during optimization.
  • mlp_training (bool): Enables MLP training with early stopping.
  • threshold_tuning (bool): Enables threshold tuning if criterion is 'f1'.
  • trainer (Trainer): Trainer instance for model evaluation.

Methods:

  • holdout: Optimizes hyperparameters using random search for holdout validation.
  • cv: Optimizes hyperparameters using random search with cross-validation.

Example
trainer = Trainer(
    classification="binary",
    criterion="f1",
    tuning="cv",
    hpo="rs",
    mlp_training=True,
    threshold_tuning=True,
)

tuner = RandomSearchTuner(
    classification="binary",
    criterion="f1",
    tuning="cv",
    hpo="rs",
    n_configs=15,
    n_jobs=4,
    verbose=True,
    trainer=trainer,
    mlp_training=True,
    threshold_tuning=True,
)

# Running holdout-based tuning
best_params, best_threshold = tuner.holdout(
    learner="rf",
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val
)

# Running cross-validation tuning
best_params, best_threshold = tuner.cv(
    learner="rf",
    outer_splits=cross_val_splits,
    racing_folds=None,
)
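
In the example above, `cross_val_splits` must match the `outer_splits` type, `List[Tuple[pd.DataFrame, pd.DataFrame]]`, i.e. a list of (train, validation) fold pairs. A minimal sketch of one way to build it with scikit-learn, assuming a DataFrame `df` whose label column is named "y" (both names are assumptions, not part of periomod):

from sklearn.model_selection import StratifiedKFold

# Assumed: df is a pd.DataFrame whose label column is named "y".
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cross_val_splits = [
    (df.iloc[train_idx], df.iloc[val_idx])
    for train_idx, val_idx in skf.split(df, df["y"])
]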
Source code in periomod/tuning/_randomsearch.py
class RandomSearchTuner(BaseTuner):
    """Random Search hyperparameter tuning class.

    This class performs hyperparameter tuning using random search, supporting
    both holdout and cross-validation (CV) tuning methods.

    Inherits:
        - `BaseTuner`: Provides a framework for implementing HPO strategies,
          including shared evaluation and logging functions.

    Args:
        classification (str): The type of classification ('binary' or 'multiclass').
        criterion (str): The evaluation criterion (e.g., 'f1', 'brier_score').
        tuning (str): The type of tuning ('holdout' or 'cv').
        hpo (str): The hyperparameter optimization method, default is 'rs'.
        n_configs (int): Number of configurations to evaluate. Defaults to 10.
        n_jobs (int): Number of parallel jobs for model training.
            Defaults to 1.
        verbose (bool): Whether to print detailed logs during optimization.
            Defaults to True.
        trainer (Optional[Trainer]): Trainer instance for model training.
        mlp_training (bool): Enables MLP-specific training with early stopping.
        threshold_tuning (bool): Enables threshold tuning for binary classification
            when the criterion is "f1".

    Attributes:
        classification (str): Type of classification ('binary' or 'multiclass').
        criterion (str): Performance criterion for optimization
            ('f1', 'brier_score' or 'macro_f1').
        tuning (str): Tuning approach ('holdout' or 'cv').
        hpo (str): Hyperparameter optimization method (default is 'rs').
        n_configs (int): Number of configurations to evaluate.
        n_jobs (int): Number of parallel jobs for training.
        verbose (bool): Flag to enable detailed logs during optimization.
        mlp_training (bool): Enables MLP training with early stopping.
        threshold_tuning (bool): Enables threshold tuning if criterion is 'f1'.
        trainer (Trainer): Trainer instance for model evaluation.

    Methods:
        holdout: Optimizes hyperparameters using random search for
            holdout validation.
        cv: Optimizes hyperparameters using random search with
            cross-validation.

    Example:
        ```
        trainer = Trainer(
            classification="binary",
            criterion="f1",
            tuning="cv",
            hpo="rs",
            mlp_training=True,
            threshold_tuning=True,
        )

        tuner = RandomSearchTuner(
            classification="binary",
            criterion="f1",
            tuning="cv",
            hpo="rs",
            n_configs=15,
            n_jobs=4,
            verbose=True,
            trainer=trainer,
            mlp_training=True,
            threshold_tuning=True,
        )

        # Running holdout-based tuning
        best_params, best_threshold = tuner.holdout(
            learner="rf",
            X_train=X_train,
            y_train=y_train,
            X_val=X_val,
            y_val=y_val
        )

        # Running cross-validation tuning
        best_params, best_threshold = tuner.cv(
            learner="rf",
            outer_splits=cross_val_splits,
            racing_folds=None,
        )
        ```
    """

    def __init__(
        self,
        classification: str,
        criterion: str,
        tuning: str,
        hpo: str = "rs",
        n_configs: int = 10,
        n_jobs: int = 1,
        verbose: bool = True,
        trainer: Optional[Trainer] = None,
        mlp_training: bool = True,
        threshold_tuning: bool = True,
    ) -> None:
        """Initialize RandomSearchTuner."""
        super().__init__(
            classification=classification,
            criterion=criterion,
            tuning=tuning,
            hpo=hpo,
            n_configs=n_configs,
            n_jobs=n_jobs,
            verbose=verbose,
            trainer=trainer,
            mlp_training=mlp_training,
            threshold_tuning=threshold_tuning,
        )

    def holdout(
        self,
        learner: str,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame,
        y_val: pd.Series,
    ) -> Tuple[Dict[str, Union[float, int]], Union[float, None]]:
        """Perform random search on the holdout set for binary and multiclass .

        Args:
            learner (str): The machine learning model used for evaluation.
            X_train (pd.DataFrame): Training features for the holdout set.
            y_train (pd.Series): Training labels for the holdout set.
            X_val (pd.DataFrame): Validation features for the holdout set.
            y_val (pd.Series): Validation labels for the holdout set.

        Returns:
            tuple:
                - Best hyperparameters (dict).
                - Best threshold (float or None; applicable for binary
                  classification).
        """
        (
            best_score,
            best_threshold,
            best_params,
            param_grid,
            model,
        ) = self._initialize_search(learner=learner, random_state=self.rs_state)

        for i in range(self.n_configs):
            params = self._sample_params(
                param_grid=param_grid, iteration=i, random_state=self.rs_state
            )
            model_clone = clone(model).set_params(**params)
            if "n_jobs" in model_clone.get_params():
                model_clone.set_params(n_jobs=self.n_jobs)

            score, model_clone, threshold = self.trainer.train(
                model_clone, X_train, y_train, X_val, y_val
            )

            best_score, best_params, best_threshold = self._update_best(
                current_score=score,
                params=params,
                threshold=threshold,
                best_score=best_score,
                best_params=best_params,
                best_threshold=best_threshold,
            )

            if self.verbose:
                self._print_iteration_info(
                    iteration=i,
                    model=model_clone,
                    params_dict=params,
                    score=score,
                    threshold=best_threshold,
                )

        return best_params, best_threshold

    def cv(
        self,
        learner: str,
        outer_splits: List[Tuple[pd.DataFrame, pd.DataFrame]],
        racing_folds: Union[int, None],
    ) -> Tuple[dict, Union[float, None]]:
        """Perform cross-validation with optional racing and hyperparameter tuning.

        Args:
            learner (str): The machine learning model to evaluate.
            outer_splits (List[Tuple[pd.DataFrame, pd.DataFrame]]): List of
                training and validation splits.
            racing_folds (int or None): Number of folds for racing; None uses all folds.

        Returns:
            tuple: Best hyperparameters and optimal threshold (if applicable).
        """
        best_score, _, best_params, param_grid, model = self._initialize_search(
            learner=learner, random_state=self.rs_state
        )

        for i in range(self.n_configs):
            params = self._sample_params(param_grid, random_state=self.rs_state)
            model_clone = clone(model).set_params(**params)
            if "n_jobs" in model_clone.get_params():
                model_clone.set_params(n_jobs=self.n_jobs)
            scores = self._evaluate_folds(
                model=model_clone,
                best_score=best_score,
                outer_splits=outer_splits,
                racing_folds=racing_folds,
            )
            avg_score = np.mean(scores)
            best_score, best_params, _ = self._update_best(
                current_score=avg_score,
                params=params,
                threshold=None,
                best_score=best_score,
                best_params=best_params,
                best_threshold=None,
            )

            if self.verbose:
                self._print_iteration_info(
                    iteration=i, model=model_clone, params_dict=params, score=avg_score
                )

        if (
            self.classification == "binary"
            and self.criterion == "f1"
            and self.threshold_tuning
        ):
            optimal_threshold = self.trainer.optimize_threshold(
                model=model_clone, outer_splits=outer_splits, n_jobs=self.n_jobs
            )
        else:
            optimal_threshold = None

        return best_params, optimal_threshold

    def _evaluate_folds(
        self,
        model: Any,
        best_score: float,
        outer_splits: List[Tuple[pd.DataFrame, pd.DataFrame]],
        racing_folds: Union[int, None],
    ) -> list:
        """Evaluate the model across folds using cross-validation or racing strategy.

        Args:
            model (Any): The cloned model to evaluate.
            best_score (float): The best score recorded so far.
            outer_splits (list of tuples): List of training/validation folds.
            racing_folds (int or None): Number of folds to use for the racing strategy.

        Returns:
            scores: Scores from each fold evaluation.
        """
        num_folds = len(outer_splits)
        if racing_folds is None or racing_folds >= num_folds:
            scores = Parallel(n_jobs=self.n_jobs)(
                delayed(self.trainer.evaluate_cv)(deepcopy(model), fold)
                for fold in outer_splits
            )
        else:
            selected_indices = random.sample(range(num_folds), racing_folds)
            selected_folds = [outer_splits[i] for i in selected_indices]
            initial_scores = Parallel(n_jobs=self.n_jobs)(
                delayed(self.trainer.evaluate_cv)(deepcopy(model), fold)
                for fold in selected_folds
            )
            avg_initial_score = np.mean(initial_scores)

            if (
                self.criterion in ["f1", "macro_f1"] and avg_initial_score > best_score
            ) or (self.criterion == "brier_score" and avg_initial_score < best_score):
                remaining_folds = [
                    outer_splits[i]
                    for i in range(num_folds)
                    if i not in selected_indices
                ]
                continued_scores = Parallel(n_jobs=self.n_jobs)(
                    delayed(self.trainer.evaluate_cv)(deepcopy(model), fold)
                    for fold in remaining_folds
                )
                scores = initial_scores + continued_scores
            else:
                scores = initial_scores

        return scores

    def _initialize_search(
        self, learner: str, random_state: int
    ) -> Tuple[float, Union[float, None], Dict[str, Union[float, int]], dict, object]:
        """Initialize search with random seed, best score, parameters, and model.

        Args:
            learner (str): The learner type to be used for training the model.
            random_state (int): Random state.

        Returns:
            Tuple:
                - best_score: Initialized best score based on the criterion.
                - best_threshold: None or a placeholder threshold for binary
                    classification.
                - best_params: Empty dict for the best parameters found so far.
                - param_grid: The parameter grid for the specified model.
                - model: The model instance.
        """
        random.seed(random_state)
        best_score = (
            -float("inf") if self.criterion in ["f1", "macro_f1"] else float("inf")
        )
        best_threshold = None
        best_params: Dict[str, Union[float, int]] = {}
        model, param_grid = Model.get(
            learner=learner, classification=self.classification, hpo=self.hpo
        )

        return best_score, best_threshold, best_params, param_grid, model

    def _update_best(
        self,
        current_score: float,
        params: dict,
        threshold: Union[float, None],
        best_score: float,
        best_params: dict,
        best_threshold: Union[float, None],
    ) -> Tuple[float, dict, Union[float, None]]:
        """Update best score, parameters, and threshold if current score is better.

        Args:
            current_score (float): The current score obtained.
            params (dict): The parameters associated with the current score.
            threshold (float or None): The threshold associated with the current score.
            best_score (float): The best score recorded so far.
            best_params (dict): The best parameters recorded so far.
            best_threshold (float or None): The best threshold recorded so far.

        Returns:
            tuple: Updated best score, best parameters, and best threshold (optional).
        """
        if (self.criterion in ["f1", "macro_f1"] and current_score > best_score) or (
            self.criterion == "brier_score" and current_score < best_score
        ):
            best_score = current_score
            best_params = params
            best_threshold = threshold if self.classification == "binary" else None

        return best_score, best_params, best_threshold

    def _sample_params(
        self,
        param_grid: Dict[str, Union[list, object]],
        iteration: Optional[int] = None,
        random_state: Optional[int] = None,
    ) -> Dict[str, Union[float, int]]:
        """Sample a set of hyperparameters from the provided grid.

        Args:
            param_grid (dict): Hyperparameter grid.
            iteration (Optional[int]): Current iteration index for random seed
                adjustment. If None, the iteration seed will not be adjusted.
            random_state (Optional[int]): Base random state for seeding.

        Returns:
            dict: Sampled hyperparameters.
        """
        iteration_seed = (
            random_state + iteration
            if random_state is not None and iteration is not None
            else None
        )

        params = {}
        for k, v in param_grid.items():
            if hasattr(v, "rvs"):
                params[k] = v.rvs(random_state=iteration_seed)
            elif isinstance(v, list):
                if iteration_seed is not None:
                    random.seed(iteration_seed)
                params[k] = random.choice(v)
            elif isinstance(v, np.ndarray):
                if iteration_seed is not None:
                    random.seed(iteration_seed)
                params[k] = random.choice(v.tolist())
            else:
                raise TypeError(f"Unsupported type for parameter '{k}': {type(v)}")

        return params
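
A note on reproducibility: `_sample_params` derives a per-iteration seed (`random_state + iteration`), so each configuration is reproducible yet distinct across iterations. A minimal standalone sketch of the same scheme, using a hypothetical grid with a scipy frozen distribution and a plain list (the names below are illustrative, not periomod API):

import random

from scipy.stats import uniform

# Hypothetical grid: a scipy frozen distribution (has .rvs) plus a plain list.
param_grid = {"learning_rate": uniform(0.001, 0.1), "depth": [3, 5, 7]}

def sample(grid, random_state, iteration):
    """Same (random_state, iteration) always yields the same configuration."""
    seed = random_state + iteration
    params = {}
    for name, space in grid.items():
        if hasattr(space, "rvs"):  # continuous: delegate to the distribution
            params[name] = space.rvs(random_state=seed)
        else:  # discrete: seed the stdlib RNG, then choose
            random.seed(seed)
            params[name] = random.choice(space)
    return params

assert sample(param_grid, 42, 3) == sample(param_grid, 42, 3)  # reproducible
assert sample(param_grid, 42, 3) != sample(param_grid, 42, 4)  # distinct draws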

__init__(classification, criterion, tuning, hpo='rs', n_configs=10, n_jobs=1, verbose=True, trainer=None, mlp_training=True, threshold_tuning=True)

Initialize RandomSearchTuner.

Source code in periomod/tuning/_randomsearch.py
def __init__(
    self,
    classification: str,
    criterion: str,
    tuning: str,
    hpo: str = "rs",
    n_configs: int = 10,
    n_jobs: int = 1,
    verbose: bool = True,
    trainer: Optional[Trainer] = None,
    mlp_training: bool = True,
    threshold_tuning: bool = True,
) -> None:
    """Initialize RandomSearchTuner."""
    super().__init__(
        classification=classification,
        criterion=criterion,
        tuning=tuning,
        hpo=hpo,
        n_configs=n_configs,
        n_jobs=n_jobs,
        verbose=verbose,
        trainer=trainer,
        mlp_training=mlp_training,
        threshold_tuning=threshold_tuning,
    )

cv(learner, outer_splits, racing_folds)

Perform cross-validation with optional racing and hyperparameter tuning.

Parameters:

  • learner (str): The machine learning model to evaluate. Required.
  • outer_splits (List[Tuple[DataFrame, DataFrame]]): List of training and validation splits. Required.
  • racing_folds (int or None): Number of folds for racing; None uses all folds. Required.

Returns:

  • tuple (Tuple[dict, Union[float, None]]): Best hyperparameters and optimal threshold (if applicable).

Source code in periomod/tuning/_randomsearch.py
def cv(
    self,
    learner: str,
    outer_splits: List[Tuple[pd.DataFrame, pd.DataFrame]],
    racing_folds: Union[int, None],
) -> Tuple[dict, Union[float, None]]:
    """Perform cross-validation with optional racing and hyperparameter tuning.

    Args:
        learner (str): The machine learning model to evaluate.
        outer_splits (List[Tuple[pd.DataFrame, pd.DataFrame]]): List of
            training and validation splits.
        racing_folds (int or None): Number of folds for racing; None uses all folds.

    Returns:
        tuple: Best hyperparameters and optimal threshold (if applicable).
    """
    best_score, _, best_params, param_grid, model = self._initialize_search(
        learner=learner, random_state=self.rs_state
    )

    for i in range(self.n_configs):
        params = self._sample_params(param_grid, random_state=self.rs_state)
        model_clone = clone(model).set_params(**params)
        if "n_jobs" in model_clone.get_params():
            model_clone.set_params(n_jobs=self.n_jobs)
        scores = self._evaluate_folds(
            model=model_clone,
            best_score=best_score,
            outer_splits=outer_splits,
            racing_folds=racing_folds,
        )
        avg_score = np.mean(scores)
        best_score, best_params, _ = self._update_best(
            current_score=avg_score,
            params=params,
            threshold=None,
            best_score=best_score,
            best_params=best_params,
            best_threshold=None,
        )

        if self.verbose:
            self._print_iteration_info(
                iteration=i, model=model_clone, params_dict=params, score=avg_score
            )

    if (
        self.classification == "binary"
        and self.criterion == "f1"
        and self.threshold_tuning
    ):
        optimal_threshold = self.trainer.optimize_threshold(
            model=model_clone, outer_splits=outer_splits, n_jobs=self.n_jobs
        )
    else:
        optimal_threshold = None

    return best_params, optimal_threshold
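
For context, the racing shortcut used by `cv` (via `_evaluate_folds`) evaluates a random subset of folds first and only pays for the remaining folds when the partial average already beats the incumbent score. A minimal sketch of that control flow, with a stand-in `evaluate_fold` callable in place of `trainer.evaluate_cv` (the helper below is hypothetical, not part of periomod):

import random
from statistics import mean

def racing_scores(evaluate_fold, folds, racing_folds, best_score, maximize=True):
    """Evaluate a random subset of folds first; finish only if promising."""
    if racing_folds is None or racing_folds >= len(folds):
        return [evaluate_fold(f) for f in folds]  # plain cross-validation

    chosen = random.sample(range(len(folds)), racing_folds)
    scores = [evaluate_fold(folds[i]) for i in chosen]
    promising = mean(scores) > best_score if maximize else mean(scores) < best_score
    if promising:  # spend the remaining budget only on promising configs
        scores += [
            evaluate_fold(folds[i]) for i in range(len(folds)) if i not in chosen
        ]
    return scores

With `maximize=False` this covers 'brier_score'-style criteria, mirroring the comparison direction used in `_update_best`.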

holdout(learner, X_train, y_train, X_val, y_val)

Perform random search on the holdout set (binary or multiclass).

Parameters:

  • learner (str): The machine learning model used for evaluation. Required.
  • X_train (DataFrame): Training features for the holdout set. Required.
  • y_train (Series): Training labels for the holdout set. Required.
  • X_val (DataFrame): Validation features for the holdout set. Required.
  • y_val (Series): Validation labels for the holdout set. Required.

Returns:

  • tuple (Tuple[Dict[str, Union[float, int]], Union[float, None]]): Best hyperparameters (dict) and best threshold (float or None; applicable for binary classification).
Source code in periomod/tuning/_randomsearch.py
def holdout(
    self,
    learner: str,
    X_train: pd.DataFrame,
    y_train: pd.Series,
    X_val: pd.DataFrame,
    y_val: pd.Series,
) -> Tuple[Dict[str, Union[float, int]], Union[float, None]]:
    """Perform random search on the holdout set for binary and multiclass .

    Args:
        learner (str): The machine learning model used for evaluation.
        X_train (pd.DataFrame): Training features for the holdout set.
        y_train (pd.Series): Training labels for the holdout set.
        X_val (pd.DataFrame): Validation features for the holdout set.
        y_val (pd.Series): Validation labels for the holdout set.

    Returns:
        tuple:
            - Best hyperparameters (dict).
            - Best threshold (float or None; applicable for binary
              classification).
    """
    (
        best_score,
        best_threshold,
        best_params,
        param_grid,
        model,
    ) = self._initialize_search(learner=learner, random_state=self.rs_state)

    for i in range(self.n_configs):
        params = self._sample_params(
            param_grid=param_grid, iteration=i, random_state=self.rs_state
        )
        model_clone = clone(model).set_params(**params)
        if "n_jobs" in model_clone.get_params():
            model_clone.set_params(n_jobs=self.n_jobs)

        score, model_clone, threshold = self.trainer.train(
            model_clone, X_train, y_train, X_val, y_val
        )

        best_score, best_params, best_threshold = self._update_best(
            current_score=score,
            params=params,
            threshold=threshold,
            best_score=best_score,
            best_params=best_params,
            best_threshold=best_threshold,
        )

        if self.verbose:
            self._print_iteration_info(
                iteration=i,
                model=model_clone,
                params_dict=params,
                score=score,
                threshold=best_threshold,
            )

    return best_params, best_threshold
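
For holdout tuning, any stratified split that yields the four frames above will do. A minimal sketch using scikit-learn, assuming `X` (features DataFrame) and `y` (labels Series) are already loaded (both names are assumptions):

from sklearn.model_selection import train_test_split

# Assumed: X (features DataFrame) and y (labels Series) are already loaded.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

best_params, best_threshold = tuner.holdout(
    learner="rf", X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val
)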