BaseBenchmark
Bases: BaseConfig
Base class for benchmarking models on specified tasks with various settings.
This class initializes common parameters for benchmarking, including task specifications, encoding and sampling methods, tuning strategies, and model evaluation criteria.
Inherits:

- `BaseConfig`: Base configuration class providing configuration loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `str` | Task for evaluation ('pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'). | required |
| `learners` | `List[str]` | List of models or algorithms to benchmark, including 'xgb', 'rf', 'lr', or 'mlp'. | required |
| `tuning_methods` | `List[str]` | List of tuning methods for model training, such as 'holdout' or 'cv'. | required |
| `hpo_methods` | `List[str]` | Hyperparameter optimization strategies to apply, including 'rs' and 'hebo'. | required |
| `criteria` | `List[str]` | List of evaluation criteria ('f1', 'macro_f1', 'brier_score'). | required |
| `encodings` | `List[str]` | Encoding types to transform categorical features; either 'one_hot' or 'target' encoding. | required |
| `sampling` | `Optional[List[Union[str, None]]]` | Sampling strategies to handle class imbalance; options include None, 'upsampling', 'downsampling', or 'smote'. | required |
| `factor` | `Optional[float]` | Factor specifying the amount of sampling to apply during resampling, if applicable. | required |
| `n_configs` | `int` | Number of configurations to evaluate during hyperparameter tuning. | required |
| `n_jobs` | `int` | Number of parallel jobs to use for processing; set to -1 to use all available cores. | required |
| `cv_folds` | `int` | Number of cross-validation folds for model training. | required |
| `racing_folds` | `Optional[int]` | Number of racing folds to use in random search ('rs') for optimized tuning. | required |
| `test_seed` | `int` | Random seed for reproducible train-test splits. | required |
| `test_size` | `float` | Fraction of the dataset to allocate to the test set. | required |
| `val_size` | `float` | Fraction of the dataset to allocate to validation in a holdout setup. | required |
| `cv_seed` | `int` | Seed for cross-validation splitting. | required |
| `mlp_flag` | `bool` | If True, enables Multi-Layer Perceptron (MLP) training with early stopping. | required |
| `threshold_tuning` | `bool` | Enables decision threshold tuning for binary classification when optimizing for 'f1'. | required |
| `verbose` | `bool` | Enables detailed logging of processes if set to True. | required |
| `path` | `Path` | Directory path where processed data will be stored. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `task` | `str` | Task used for model classification or regression evaluation. |
| `learners` | `List[str]` | Selected models or algorithms for benchmarking. |
| `tuning_methods` | `List[str]` | List of model tuning approaches. |
| `hpo_methods` | `List[str]` | Hyperparameter optimization techniques to apply. |
| `criteria` | `List[str]` | Criteria used to evaluate model performance. |
| `encodings` | `List[str]` | Encoding methods applied to categorical features. |
| `sampling` | `Optional[List[Union[str, None]]]` | Sampling strategies employed to address class imbalance. |
| `factor` | `Optional[float]` | Specifies the degree of sampling applied within the chosen strategy. |
| `n_configs` | `int` | Number of configurations assessed during hyperparameter optimization. |
| `n_jobs` | `int` | Number of parallel processes for model training and evaluation. |
| `cv_folds` | `int` | Number of cross-validation folds for model training. |
| `racing_folds` | `Optional[int]` | Racing folds used in tuning with cross-validation and random search. |
| `test_seed` | `int` | Seed for consistent train-test splitting. |
| `test_size` | `float` | Proportion of the dataset set aside for testing. |
| `val_size` | `float` | Proportion of data allocated to validation in holdout tuning. |
| `cv_seed` | `int` | Seed for cross-validation splitting. |
| `mlp_flag` | `bool` | Flag for MLP training with early stopping. |
| `threshold_tuning` | `bool` | Enables threshold adjustment for optimizing F1 in binary classification tasks. |
| `verbose` | `bool` | Flag to enable detailed logging during training and evaluation. |
| `path` | `Path` | Path where processed data is saved. |
Source code in periomod/benchmarking/_basebenchmark.py
`__init__(task, learners, tuning_methods, hpo_methods, criteria, encodings, sampling, factor, n_configs, n_jobs, cv_folds, racing_folds, test_seed, test_size, val_size, cv_seed, mlp_flag, threshold_tuning, verbose, path)`
Initialize the base benchmark class with common parameters.
Source code in periomod/benchmarking/_basebenchmark.py
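Example:

The following sketch is illustrative only: it assumes `BaseBenchmark` can be imported from `periomod.benchmarking._basebenchmark` (the source file referenced above), that a concrete subclass simply forwards its keyword arguments to `super().__init__`, and that the option values shown ('pocketclosure', 'xgb', 'rs', a resampling factor of 1.5, etc.) are arbitrary combinations of the documented choices rather than recommendations.

```python
from pathlib import Path
from typing import Any

# Import path taken from the source reference above; the public
# `periomod.benchmarking` package may also re-export the class.
from periomod.benchmarking._basebenchmark import BaseBenchmark


class DemoBenchmark(BaseBenchmark):
    """Hypothetical subclass that only forwards settings to the base class."""

    def __init__(self, **settings: Any) -> None:
        super().__init__(**settings)


settings = dict(
    task="pocketclosure",             # one of the documented task names
    learners=["xgb", "rf"],           # models to benchmark
    tuning_methods=["holdout", "cv"],
    hpo_methods=["rs", "hebo"],
    criteria=["f1", "brier_score"],
    encodings=["one_hot", "target"],
    sampling=[None, "upsampling"],    # None disables resampling for that run
    factor=1.5,                       # illustrative resampling factor
    n_configs=10,                     # configurations per HPO run
    n_jobs=-1,                        # use all available cores
    cv_folds=5,
    racing_folds=3,
    test_seed=42,
    test_size=0.2,
    val_size=0.2,
    cv_seed=0,
    mlp_flag=True,
    threshold_tuning=True,
    verbose=True,
    path=Path("data/processed"),
)

# If BaseBenchmark declares abstract methods, a real subclass must implement
# them before this instantiation succeeds.
benchmark = DemoBenchmark(**settings)
```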