
Experiment

Bases: BaseExperiment

Concrete implementation for performing ML experiments and evaluation.

This class extends BaseExperiment, providing methods for evaluating machine learning models using holdout or cross-validation strategies. It performs hyperparameter tuning, final model training, and evaluation based on specified tuning and optimization methods.

Inherits

BaseExperiment: Provides core functionality for validation, resampling, training, and tuning configurations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The preloaded data for the experiment. | *required* |
| `task` | `str` | The task name used to determine the classification type. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. | *required* |
| `learner` | `str` | Specifies the model or algorithm to evaluate: 'xgb', 'rf', 'lr', or 'mlp'. | *required* |
| `criterion` | `str` | Criterion for optimization ('f1', 'macro_f1', or 'brier_score'). | *required* |
| `encoding` | `str` | Encoding type for categorical features ('one_hot' or 'binary'). | *required* |
| `tuning` | `Optional[str]` | Tuning method to apply ('holdout' or 'cv'). Can be None. | *required* |
| `hpo` | `Optional[str]` | Hyperparameter optimization method ('rs' or 'hebo'). Can be None. | *required* |
| `sampling` | `Optional[str]` | Resampling strategy to apply: None, 'upsampling', 'downsampling', or 'smote'. | `None` |
| `factor` | `Optional[float]` | Resampling factor. | `None` |
| `n_configs` | `int` | Number of configurations for hyperparameter tuning. | `10` |
| `racing_folds` | `Optional[int]` | Number of racing folds for random search ('rs'). | `None` |
| `n_jobs` | `int` | Number of parallel jobs to run for evaluation. | `1` |
| `cv_folds` | `Optional[int]` | Number of folds for cross-validation. | `10` |
| `test_seed` | `int` | Random seed for test splitting. | `0` |
| `test_size` | `float` | Proportion of data used for testing. | `0.2` |
| `val_size` | `Optional[float]` | Size of the validation set in holdout tuning. | `0.2` |
| `cv_seed` | `Optional[int]` | Random seed for cross-validation. | `0` |
| `mlp_flag` | `Optional[bool]` | Flag to enable MLP training with early stopping. | `None` |
| `threshold_tuning` | `Optional[bool]` | If True, performs threshold tuning for binary classification when the criterion is 'f1'. | `None` |
| `verbose` | `bool` | Enables verbose output if set to True. | `True` |
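
For contrast with the cross-validation example shown further below, here is a minimal sketch of a holdout-tuned configuration. The loader calls mirror the Example section; the task, learner, sampling, and numeric values are purely illustrative assumptions, not recommended settings.

```
from periomod.benchmarking import Experiment
from periomod.data import ProcessedDataLoader

# Prepare the data as in the Example section (binary encoding in this sketch).
dataloader = ProcessedDataLoader(task="improvement", encoding="binary")
df = dataloader.load_data(path="data/processed/processed_data.csv")
df = dataloader.transform_data(df=df)

# Holdout tuning with HEBO; sampling, factor, and n_configs are illustrative.
experiment = Experiment(
    df=df,
    task="improvement",
    learner="xgb",
    criterion="brier_score",
    encoding="binary",
    tuning="holdout",   # tune on an internal validation split instead of CV folds
    hpo="hebo",         # Bayesian optimization instead of random search
    sampling="smote",
    factor=2.0,
    n_configs=15,
    val_size=0.2,
)

final_metrics = experiment.perform_evaluation()
```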

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `df` | `DataFrame` | Dataset used for training and evaluation. |
| `task` | `str` | Name of the task used to determine the classification type. |
| `learner` | `str` | Model or algorithm name for the experiment. |
| `criterion` | `str` | Criterion for performance evaluation. |
| `encoding` | `str` | Encoding type for categorical features. |
| `sampling` | `str` | Resampling method used in training. |
| `factor` | `float` | Factor applied during resampling. |
| `n_configs` | `int` | Number of configurations evaluated in hyperparameter tuning. |
| `racing_folds` | `int` | Number of racing folds for random search. |
| `n_jobs` | `int` | Number of parallel jobs used during processing. |
| `cv_folds` | `int` | Number of cross-validation folds. |
| `test_seed` | `int` | Seed for reproducible test splitting. |
| `test_size` | `float` | Proportion of data reserved for testing. |
| `val_size` | `float` | Size of the validation set in holdout tuning. |
| `cv_seed` | `int` | Seed for reproducible cross-validation splits. |
| `mlp_flag` | `bool` | Indicates whether MLP training with early stopping is enabled. |
| `threshold_tuning` | `bool` | Enables threshold tuning for binary classification. |
| `verbose` | `bool` | Controls detailed output during the experiment. |
| `resampler` | `Resampler` | Resampler instance for data handling. |
| `trainer` | `Trainer` | Trainer instance for model training and evaluation. |
| `tuner` | `Tuner` | Initialized tuner for hyperparameter optimization. |

Methods:

| Name | Description |
| --- | --- |
| `perform_evaluation` | Conducts evaluation based on the tuning method. |

Example
from periomod.benchmarking import Experiment
from periomod.data import ProcessedDataLoader

# Load a dataframe with the correct target and encoding selected
dataloader = ProcessedDataLoader(task="pocketclosure", encoding="one_hot")
df = dataloader.load_data(path="data/processed/processed_data.csv")
df = dataloader.transform_data(df=df)

experiment = Experiment(
    df=df,
    task="pocketclosure",
    learner="rf",
    criterion="f1",
    encoding="one_hot",
    tuning="cv",
    hpo="rs",
    sampling="upsample",
    factor=1.5,
    n_configs=20,
    racing_folds=5,
)

# Perform the evaluation based on cross-validation
final_metrics = experiment.perform_evaluation()
print(final_metrics)
Source code in periomod/benchmarking/_benchmark.py
class Experiment(BaseExperiment):
    """Concrete implementation for performing ML experiments and evaluation.

    This class extends `BaseExperiment`, providing methods for evaluating machine
    learning models using holdout or cross-validation strategies. It performs
    hyperparameter tuning, final model training, and evaluation based on
    specified tuning and optimization methods.

    Inherits:
        `BaseExperiment`: Provides core functionality for validation, resampling,
            training, and tuning configurations.

    Args:
        df (pd.DataFrame): The preloaded data for the experiment.
        task (str): The task name used to determine classification type.
            Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
            'pdgrouprevaluation'.
        learner (str): Specifies the model or algorithm to evaluate.
            Includes 'xgb', 'rf', 'lr' or 'mlp'.
        criterion (str): Criterion for optimization ('f1', 'macro_f1' or 'brier_score').
        encoding (str): Encoding type for categorical features ('one_hot' or 'binary').
        tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv'). Can be None.
        hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
            Can be None.
        sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
            Includes None, 'upsampling', 'downsampling', and 'smote'.
        factor (Optional[float]): Resampling factor. Defaults to None.
        n_configs (int): Number of configurations for hyperparameter tuning.
            Defaults to 10.
        racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
            Defaults to None.
        n_jobs (int): Number of parallel jobs to run for evaluation.
            Defaults to 1.
        cv_folds (Optional[int]): Number of folds for cross-validation;
            Defaults to 10.
        test_seed (int): Random seed for test splitting. Defaults to 0.
        test_size (float): Proportion of data used for testing. Defaults to
            0.2.
        val_size (Optional[float]): Size of validation set in holdout tuning.
            Defaults to 0.2.
        cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
        mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
            Defaults to None.
        threshold_tuning (Optional[bool]): If True, performs threshold tuning for binary
            classification if the criterion is "f1". Defaults to None.
        verbose (bool): Enables verbose output if set to True.

    Attributes:
        df (pd.DataFrame): Dataset used for training and evaluation.
        task (str): Name of the task used to determine the classification type.
        learner (str): Model or algorithm name for the experiment.
        criterion (str): Criterion for performance evaluation.
        encoding (str): Encoding type for categorical features.
        sampling (str): Resampling method used in training.
        factor (float): Factor applied during resampling.
        n_configs (int): Number of configurations evaluated in hyperparameter tuning.
        racing_folds (int): Number of racing folds for random search.
        n_jobs (int): Number of parallel jobs used during processing.
        cv_folds (int): Number of cross-validation folds.
        test_seed (int): Seed for reproducible test splitting.
        test_size (float): Proportion of data reserved for testing.
        val_size (float): Size of the validation set in holdout tuning.
        cv_seed (int): Seed for reproducible cross-validation splits.
        mlp_flag (bool): Indicates if MLP training with early stopping is enabled.
        threshold_tuning (bool): Enables threshold tuning for binary classification.
        verbose (bool): Controls detailed output during the experiment.
        resampler (Resampler): Resampler instance for data handling.
        trainer (Trainer): Trainer instance for model training and evaluation.
        tuner (Tuner): Initialized tuner for hyperparameter optimization.

    Methods:
        perform_evaluation: Conducts evaluation based on the tuning method.

    Example:
        ```
        from periomod.benchmarking import Experiment
        from periomod.data import ProcessedDataLoader

        # Load a dataframe with the correct target and encoding selected
        dataloader = ProcessedDataLoader(task="pocketclosure", encoding="one_hot")
        df = dataloader.load_data(path="data/processed/processed_data.csv")
        df = dataloader.transform_data(df=df)

        experiment = Experiment(
            df=df,
            task="pocketclosure",
            learner="rf",
            criterion="f1",
            encoding="one_hot",
            tuning="cv",
            hpo="rs",
            sampling="upsample",
            factor=1.5,
            n_configs=20,
            racing_folds=5,
        )

        # Perform the evaluation based on cross-validation
        final_metrics = experiment.perform_evaluation()
        print(final_metrics)
        ```
    """

    def __init__(
        self,
        df: pd.DataFrame,
        task: str,
        learner: str,
        criterion: str,
        encoding: str,
        tuning: Optional[str],
        hpo: Optional[str],
        sampling: Optional[str] = None,
        factor: Optional[float] = None,
        n_configs: int = 10,
        racing_folds: Optional[int] = None,
        n_jobs: int = 1,
        cv_folds: Optional[int] = 10,
        test_seed: int = 0,
        test_size: float = 0.2,
        val_size: Optional[float] = 0.2,
        cv_seed: Optional[int] = 0,
        mlp_flag: Optional[bool] = None,
        threshold_tuning: Optional[bool] = None,
        verbose: bool = True,
    ) -> None:
        """Initialize the Experiment class with tuning parameters.

        Args:
            df (pd.DataFrame): The preloaded data for the experiment.
            task (str): The task name used to determine classification type.
                Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
                'pdgrouprevaluation'.
            learner (str): Specifies the model or algorithm to evaluate.
                Includes 'xgb', 'rf', 'lr' or 'mlp'.
            criterion (str): Criterion for optimization ('f1', 'macro_f1' or
                'brier_score').
            encoding (str): Encoding type for categorical features ('one_hot' or
                'binary').
            tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv').
                Can be None.
            hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
                Can be None.
            sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
                Includes None, 'upsampling', 'downsampling', and 'smote'.
            factor (Optional[float]): Resampling factor. Defaults to None.
            n_configs (int): Number of configurations for hyperparameter tuning.
                Defaults to 10.
            racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
                Defaults to None.
            n_jobs (int): Number of parallel jobs to run for evaluation.
                Defaults to 1.
            cv_folds (Optional[int]): Number of folds for cross-validation;
                Defaults to 10.
            test_seed (int): Random seed for test splitting. Defaults to 0.
            test_size (float): Proportion of data used for testing. Defaults to
                0.2.
            val_size (Optional[float]): Size of validation set in holdout tuning.
                Defaults to 0.2.
            cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
            mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
                Defaults to None.
            threshold_tuning (Optional[bool]): If True, performs threshold tuning for
                binary classification if the criterion is "f1". Defaults to None.
            verbose (bool): Enables verbose output if set to True.
        """
        super().__init__(
            df=df,
            task=task,
            learner=learner,
            criterion=criterion,
            encoding=encoding,
            tuning=tuning,
            hpo=hpo,
            sampling=sampling,
            factor=factor,
            n_configs=n_configs,
            racing_folds=racing_folds,
            n_jobs=n_jobs,
            cv_folds=cv_folds,
            test_seed=test_seed,
            test_size=test_size,
            val_size=val_size,
            cv_seed=cv_seed,
            mlp_flag=mlp_flag,
            threshold_tuning=threshold_tuning,
            verbose=verbose,
        )

    def perform_evaluation(self) -> dict:
        """Perform model evaluation and return final metrics.

        Returns:
            dict: A dictionary containing the trained model and its evaluation metrics.
        """
        train_df, _ = self.resampler.split_train_test_df(
            df=self.df, seed=self.test_seed, test_size=self.test_size
        )

        if self.tuning == "holdout":
            return self._evaluate_holdout(train_df=train_df)
        elif self.tuning == "cv":
            return self._evaluate_cv()
        else:
            raise ValueError(f"Unsupported tuning method: {self.tuning}")

    def _evaluate_holdout(self, train_df: pd.DataFrame) -> dict:
        """Perform holdout validation and return the final model metrics.

        Args:
            train_df (pd.DataFrame): train df for holdout tuning.

        Returns:
            dict: A dictionary of evaluation metrics for the final model.
        """
        train_df_h, test_df_h = self.resampler.split_train_test_df(
            df=train_df, seed=self.test_seed, test_size=self.val_size
        )
        X_train_h, y_train_h, X_val, y_val = self.resampler.split_x_y(
            train_df=train_df_h,
            test_df=test_df_h,
            sampling=self.sampling,
            factor=self.factor,
        )
        best_params, best_threshold = self.tuner.holdout(
            learner=self.learner,
            X_train=X_train_h,
            y_train=y_train_h,
            X_val=X_val,
            y_val=y_val,
        )
        final_model = (self.learner, best_params, best_threshold)

        return self._train_final_model(final_model)

    def _evaluate_cv(self) -> dict:
        """Perform cross-validation and return the final model metrics.

        Returns:
            dict: A dictionary of evaluation metrics for the final model.
        """
        outer_splits, _ = self.resampler.cv_folds(
            df=self.df,
            sampling=self.sampling,
            factor=self.factor,
            seed=self.cv_seed,
            n_folds=self.cv_folds,
        )
        best_params, best_threshold = self.tuner.cv(
            learner=self.learner,
            outer_splits=outer_splits,
            racing_folds=self.racing_folds,
        )
        final_model = (self.learner, best_params, best_threshold)

        return self._train_final_model(final_model_tuple=final_model)

__init__(df, task, learner, criterion, encoding, tuning, hpo, sampling=None, factor=None, n_configs=10, racing_folds=None, n_jobs=1, cv_folds=10, test_seed=0, test_size=0.2, val_size=0.2, cv_seed=0, mlp_flag=None, threshold_tuning=None, verbose=True)

Initialize the Experiment class with tuning parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | The preloaded data for the experiment. | *required* |
| `task` | `str` | The task name used to determine the classification type. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. | *required* |
| `learner` | `str` | Specifies the model or algorithm to evaluate: 'xgb', 'rf', 'lr', or 'mlp'. | *required* |
| `criterion` | `str` | Criterion for optimization ('f1', 'macro_f1', or 'brier_score'). | *required* |
| `encoding` | `str` | Encoding type for categorical features ('one_hot' or 'binary'). | *required* |
| `tuning` | `Optional[str]` | Tuning method to apply ('holdout' or 'cv'). Can be None. | *required* |
| `hpo` | `Optional[str]` | Hyperparameter optimization method ('rs' or 'hebo'). Can be None. | *required* |
| `sampling` | `Optional[str]` | Resampling strategy to apply: None, 'upsampling', 'downsampling', or 'smote'. | `None` |
| `factor` | `Optional[float]` | Resampling factor. | `None` |
| `n_configs` | `int` | Number of configurations for hyperparameter tuning. | `10` |
| `racing_folds` | `Optional[int]` | Number of racing folds for random search ('rs'). | `None` |
| `n_jobs` | `int` | Number of parallel jobs to run for evaluation. | `1` |
| `cv_folds` | `Optional[int]` | Number of folds for cross-validation. | `10` |
| `test_seed` | `int` | Random seed for test splitting. | `0` |
| `test_size` | `float` | Proportion of data used for testing. | `0.2` |
| `val_size` | `Optional[float]` | Size of the validation set in holdout tuning. | `0.2` |
| `cv_seed` | `Optional[int]` | Random seed for cross-validation. | `0` |
| `mlp_flag` | `Optional[bool]` | Flag to enable MLP training with early stopping. | `None` |
| `threshold_tuning` | `Optional[bool]` | If True, performs threshold tuning for binary classification when the criterion is 'f1'. | `None` |
| `verbose` | `bool` | Enables verbose output if set to True. | `True` |
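
As a minimal sketch, only the required arguments need to be supplied; every remaining parameter keeps the default listed above. Here `df` is assumed to be a dataframe prepared with `ProcessedDataLoader` as in the Example section, and the argument values are illustrative.

```
experiment = Experiment(
    df=df,                 # dataframe prepared via ProcessedDataLoader
    task="pocketclosure",
    learner="lr",
    criterion="f1",
    encoding="one_hot",
    tuning="cv",
    hpo="rs",
)
# sampling, factor, racing_folds, cv_folds, etc. keep their documented defaults.
```
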
Source code in periomod/benchmarking/_benchmark.py
def __init__(
    self,
    df: pd.DataFrame,
    task: str,
    learner: str,
    criterion: str,
    encoding: str,
    tuning: Optional[str],
    hpo: Optional[str],
    sampling: Optional[str] = None,
    factor: Optional[float] = None,
    n_configs: int = 10,
    racing_folds: Optional[int] = None,
    n_jobs: int = 1,
    cv_folds: Optional[int] = 10,
    test_seed: int = 0,
    test_size: float = 0.2,
    val_size: Optional[float] = 0.2,
    cv_seed: Optional[int] = 0,
    mlp_flag: Optional[bool] = None,
    threshold_tuning: Optional[bool] = None,
    verbose: bool = True,
) -> None:
    """Initialize the Experiment class with tuning parameters.

    Args:
        df (pd.DataFrame): The preloaded data for the experiment.
        task (str): The task name used to determine classification type.
            Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or
            'pdgrouprevaluation'.
        learner (str): Specifies the model or algorithm to evaluate.
            Includes 'xgb', 'rf', 'lr' or 'mlp'.
        criterion (str): Criterion for optimization ('f1', 'macro_f1' or
            'brier_score').
        encoding (str): Encoding type for categorical features ('one_hot' or
            'binary').
        tuning (Optional[str]): Tuning method to apply ('holdout' or 'cv').
            Can be None.
        hpo (Optional[str]): Hyperparameter optimization method ('rs' or 'hebo').
            Can be None.
        sampling (Optional[str]): Resampling strategy to apply. Defaults to None.
            Includes None, 'upsampling', 'downsampling', and 'smote'.
        factor (Optional[float]): Resampling factor. Defaults to None.
        n_configs (int): Number of configurations for hyperparameter tuning.
            Defaults to 10.
        racing_folds (Optional[int]): Number of racing folds for Random Search (RS).
            Defaults to None.
        n_jobs (int): Number of parallel jobs to run for evaluation.
            Defaults to 1.
        cv_folds (Optional[int]): Number of folds for cross-validation;
            Defaults to 10.
        test_seed (int): Random seed for test splitting. Defaults to 0.
        test_size (float): Proportion of data used for testing. Defaults to
            0.2.
        val_size (Optional[float]): Size of validation set in holdout tuning.
            Defaults to 0.2.
        cv_seed (Optional[int]): Random seed for cross-validation. Defaults to 0.
        mlp_flag (Optional[bool]): Flag to enable MLP training with early stopping.
            Defaults to None.
        threshold_tuning (Optional[bool]): If True, performs threshold tuning for
            binary classification if the criterion is "f1". Defaults to None.
        verbose (bool): Enables verbose output if set to True.
    """
    super().__init__(
        df=df,
        task=task,
        learner=learner,
        criterion=criterion,
        encoding=encoding,
        tuning=tuning,
        hpo=hpo,
        sampling=sampling,
        factor=factor,
        n_configs=n_configs,
        racing_folds=racing_folds,
        n_jobs=n_jobs,
        cv_folds=cv_folds,
        test_seed=test_seed,
        test_size=test_size,
        val_size=val_size,
        cv_seed=cv_seed,
        mlp_flag=mlp_flag,
        threshold_tuning=threshold_tuning,
        verbose=verbose,
    )

perform_evaluation()

Perform model evaluation and return final metrics.

Returns:

| Type | Description |
| --- | --- |
| `dict` | A dictionary containing the trained model and its evaluation metrics. |
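
The concrete keys of the returned dictionary are not enumerated here; a hedged way to inspect the result without assuming specific key names:

```
final_metrics = experiment.perform_evaluation()

# Print whatever the evaluation produced; the exact keys depend on the task,
# criterion, and trainer and are not assumed here.
for key, value in final_metrics.items():
    print(f"{key}: {value}")
```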

Source code in periomod/benchmarking/_benchmark.py
def perform_evaluation(self) -> dict:
    """Perform model evaluation and return final metrics.

    Returns:
        dict: A dictionary containing the trained model and its evaluation metrics.
    """
    train_df, _ = self.resampler.split_train_test_df(
        df=self.df, seed=self.test_seed, test_size=self.test_size
    )

    if self.tuning == "holdout":
        return self._evaluate_holdout(train_df=train_df)
    elif self.tuning == "cv":
        return self._evaluate_cv()
    else:
        raise ValueError(f"Unsupported tuning method: {self.tuning}")