Baseline

Bases: BaseConfig

Evaluates baseline models on a given dataset.

This class loads, preprocesses, and evaluates a set of baseline models on a specified dataset. The baseline models include a Random Forest, Logistic Regression, and a Dummy Classifier, which are trained and evaluated on split data, returning a summary of performance metrics for each model.

Inherits
  • BaseConfig: Provides configuration settings for data processing.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| task | str | Task name used to determine the classification type. "pocketclosure", "pocketclosureinf", and "improvement" are binary; "pdgrouprevaluation" is multiclass. Any other value raises a ValueError. | required |
| encoding | str | Encoding type for categorical columns. | required |
| random_state | int | Random seed for reproducibility. | 0 |
| lr_solver | str | Solver used by Logistic Regression. | 'saga' |
| dummy_strategy | str | Strategy for DummyClassifier. | 'prior' |
| models | List[Tuple[str, object]] | List of (name, model) tuples to benchmark. If not provided, the three default models are initialized. | None |
| n_jobs | int | Number of parallel jobs used by the default Random Forest and Logistic Regression models. | -1 |
| path | Path | Path to the processed data file. | Path('data/processed/processed_data.csv') |
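
The way `task` selects the classification type can be sketched on its own. The helper name `classification_for_task` and the two set constants are hypothetical and exist only for this illustration; the task names and the error message are taken from the `__init__` source further down.

```python
# Task names copied from the check in Baseline.__init__.
BINARY_TASKS = {"pocketclosure", "pocketclosureinf", "improvement"}
MULTICLASS_TASKS = {"pdgrouprevaluation"}


def classification_for_task(task: str) -> str:
    """Hypothetical helper mirroring the task check in Baseline.__init__."""
    if task in BINARY_TASKS:
        return "binary"
    if task in MULTICLASS_TASKS:
        return "multiclass"
    # Same message as the ValueError raised in the source.
    raise ValueError(f"Unknown task: {task}. Unable to determine classification.")


print(classification_for_task("pocketclosure"))
print(classification_for_task("pdgrouprevaluation"))
```

An unrecognized task fails at construction time, before any data is loaded.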

Attributes:

| Name | Type | Description |
|---|---|---|
| classification | str | Specifies classification type ('binary' or 'multiclass') based on the task. |
| resampler | Resampler | Strategy for resampling data during training/testing split. |
| dataloader | ProcessedDataLoader | Loader for processing and transforming the dataset. |
| dummy_strategy | str | Strategy used by the DummyClassifier, default is 'prior'. |
| lr_solver | str | Solver for Logistic Regression, default is 'saga'. |
| random_state | int | Random seed for reproducibility, default is 0. |
| models | List[Tuple[str, object]] | List of models to benchmark, each represented as a tuple containing the model's name and the initialized model object. |
| path | Path | Path to the processed data file. |

Methods:

| Name | Description |
|---|---|
| train_baselines | Trains and returns baseline models with test data. |
| baseline | Trains and evaluates each model in the models list, returning a DataFrame with evaluation metrics. |

Example
# Assumes Baseline is exported from the periomod.benchmarking package
from periomod.benchmarking import Baseline

# Initialize baseline evaluation for pocket closure task
baseline = Baseline(
    task="pocketclosure",
    encoding="one_hot",
    random_state=42,
    lr_solver="saga",
    dummy_strategy="most_frequent"
)

# Evaluate baseline models and display results
results_df = baseline.baseline()
print(results_df)
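
When a custom `models` list is passed, the names matter. According to the `_bss_helper` source below, the Brier Skill Score step looks up rows named "Dummy Classifier" and "Logistic Regression" and takes the first match, so a list without those names would likely fail at that step if the Brier score column is present. A minimal pre-flight check could look like the sketch below; `missing_reference_models` is a hypothetical helper and not part of the library, and `object()` stands in for real estimators.

```python
from typing import List, Tuple

# Names that _bss_helper looks up in the "Model" column (taken from the source).
REFERENCE_NAMES = ("Dummy Classifier", "Logistic Regression")


def missing_reference_models(models: List[Tuple[str, object]]) -> List[str]:
    """Return the reference names that are absent from a models list."""
    names = {name for name, _ in models}
    return [ref for ref in REFERENCE_NAMES if ref not in names]


# Placeholder objects stand in for fitted or unfitted estimators.
custom_models = [
    ("Dummy Classifier", object()),
    ("Gradient Boosting", object()),
]

print(missing_reference_models(custom_models))  # ['Logistic Regression']
```
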
Source code in periomod/benchmarking/_baseline.py
class Baseline(BaseConfig):
    """Evaluates baseline models on a given dataset.

    This class loads, preprocesses, and evaluates a set of baseline models on a
    specified dataset. The baseline models include a Random Forest, Logistic
    Regression, and a Dummy Classifier, which are trained and evaluated on
    split data, returning a summary of performance metrics for each model.

    Inherits:
        - `BaseConfig`: Provides configuration settings for data processing.

    Args:
        task (str): Task name used to determine the classification type.
        encoding (str): Encoding type for categorical columns.
        random_state (int, optional): Random seed for reproducibility. Defaults to 0.
        lr_solver (str, optional): Solver used by Logistic Regression. Defaults to
            'saga'.
        dummy_strategy (str, optional): Strategy for DummyClassifier, defaults to
            'prior'.
        models (List[Tuple[str, object]], optional): List of models to benchmark.
            If not provided, default models are initialized.
        n_jobs (int): Number of parallel jobs. Defaults to -1.
        path (Path): Path to the directory containing processed data files.
            Defaults to Path("data/processed/processed_data.csv").

    Attributes:
        classification (str): Specifies classification type ('binary' or
            'multiclass') based on the task.
        resampler (Resampler): Strategy for resampling data during training/testing
            split.
        dataloader (ProcessedDataLoader): Loader for processing and transforming the
            dataset.
        dummy_strategy (str): Strategy used by the DummyClassifier, default is 'prior'.
        lr_solver (str): Solver for Logistic Regression, default is 'saga'.
        random_state (int): Random seed for reproducibility, default is 0.
        models (List[Tuple[str, object]]): List of models to benchmark, each
            represented as a tuple containing the model's name and the initialized
            model object.
        path (Path): Path to the directory containing processed data files.

    Methods:
        train_baselines: Trains and returns baseline models with test data.
        baseline: Trains and evaluates each model in the models list, returning
            a DataFrame with evaluation metrics.

    Example:
        ```
        # Initialize baseline evaluation for pocket closure task
        baseline = Baseline(
            task="pocketclosure",
            encoding="one_hot",
            random_state=42,
            lr_solver="saga",
            dummy_strategy="most_frequent"
        )

        # Evaluate baseline models and display results
        results_df = baseline.baseline()
        print(results_df)
        ```
    """

    def __init__(
        self,
        task: str,
        encoding: str,
        random_state: int = 0,
        lr_solver: str = "saga",
        dummy_strategy: str = "prior",
        models: Union[List[Tuple[str, object]], None] = None,
        n_jobs: int = -1,
        path: Path = Path("data/processed/processed_data.csv"),
    ) -> None:
        """Initializes the Baseline class with default or user-specified models."""
        if task in ["pocketclosure", "pocketclosureinf", "improvement"]:
            self.classification = "binary"
        elif task == "pdgrouprevaluation":
            self.classification = "multiclass"
        else:
            raise ValueError(
                f"Unknown task: {task}. Unable to determine classification."
            )

        self.resampler = Resampler(
            classification=self.classification, encoding=encoding
        )
        self.dataloader = ProcessedDataLoader(task=task, encoding=encoding)
        self.dummy_strategy = dummy_strategy
        self.lr_solver = lr_solver
        self.random_state = random_state
        self.path = path
        self.default_models: Union[list[str], None]

        if models is None:
            self.models = [
                (
                    "Random Forest",
                    RandomForestClassifier(
                        n_jobs=n_jobs, random_state=self.random_state
                    ),
                ),
                (
                    "Logistic Regression",
                    LogisticRegression(
                        solver=self.lr_solver,
                        random_state=self.random_state,
                        n_jobs=n_jobs,
                    ),
                ),
                (
                    "Dummy Classifier",
                    DummyClassifier(strategy=self.dummy_strategy),
                ),
            ]
            self.default_models = [name for name, _ in self.models]
        else:
            self.models = models
            self.default_models = None

    @staticmethod
    def _bss_helper(
        results_df: pd.DataFrame, classification: str
    ) -> Tuple[pd.DataFrame, List[str]]:
        """Calculates Brier Skill Score (BSS) and determines column order.

        Args:
            results_df (pd.DataFrame): DataFrame containing evaluation metrics.
            classification (str): Classification type ('binary' or 'multiclass').

        Returns:
            Tuple[pd.DataFrame, List[str]]: Updated DataFrame with BSS and column order.
        """
        if classification == "binary":
            metric_column = "Brier Score"
            column_order = column_order_binary
        elif classification == "multiclass":
            metric_column = "Multiclass Brier Score"
            column_order = column_order_multiclass
            if "Class F1 Scores" in results_df.columns:
                results_df["Class F1 Scores"] = results_df["Class F1 Scores"].apply(
                    lambda scores: [round(score, 4) for score in scores]
                )
        else:
            raise ValueError(f"Unsupported classification type: {classification}")

        if metric_column in results_df.columns:
            dummy_brier = results_df.loc[
                results_df["Model"] == "Dummy Classifier", metric_column
            ].iloc[0]
            logreg_brier = results_df.loc[
                results_df["Model"] == "Logistic Regression", metric_column
            ].iloc[0]
            results_df["Brier Skill Score"] = results_df.apply(
                lambda row: _brier_skill_score(
                    row, dummy_brier, logreg_brier, metric_column
                ),
                axis=1,
            ).round(4)

        return results_df, column_order

    def train_baselines(
        self,
    ) -> Tuple[Dict[Tuple[str, str], Any], pd.DataFrame, pd.Series]:
        """Trains each model in the models list and returns related data splits.

        Returns:
            Tuple:
                - Dictionary containing trained models.
                - Testing feature set (X_test).
                - Testing labels (y_test).
        """
        df = self.dataloader.load_data(path=self.path)
        df = self.dataloader.transform_data(df=df)
        train_df, test_df = self.resampler.split_train_test_df(
            df=df, seed=self.random_state
        )
        X_train, y_train, X_test, y_test = self.resampler.split_x_y(
            train_df=train_df, test_df=test_df
        )

        trained_models = {}
        for model_name, model in self.models:
            model.fit(X_train, y_train)
            trained_models[(model_name, "Baseline")] = model

        return trained_models, X_test, y_test

    def baseline(self) -> pd.DataFrame:
        """Trains and evaluates each model in the models list on the given dataset.

        This method loads and transforms the dataset, splits it into training and
        testing sets, and evaluates each model in the `self.models` list.
        Predictions and, where the model supports `predict_proba`, probabilities
        are computed on the test set and used to calculate the metrics.

        Returns:
            DataFrame: A DataFrame containing the evaluation metrics for each
                baseline model, one row per model, with the model name in the
                `Model` column and a reset integer index.
        """
        trained_models, X_test, y_test = self.train_baselines()
        results = []

        for model_name, model in self.models:
            preds = trained_models[(model_name, "Baseline")].predict(X_test)
            probs = (
                get_probs(
                    model=trained_models[(model_name, "Baseline")],
                    classification=self.classification,
                    X=X_test,
                )
                if hasattr(model, "predict_proba")
                else None
            )
            metrics = final_metrics(
                classification=self.classification,
                y=y_test,
                preds=preds,
                probs=probs,
            )
            metrics["Model"] = model_name
            results.append(metrics)

        results_df = pd.DataFrame(results).drop(
            columns=["Best Threshold"], errors="ignore"
        )

        results_df, column_order = self._bss_helper(
            results_df, classification=self.classification
        )

        existing_columns = [col for col in column_order if col in results_df.columns]
        results_df = results_df[
            existing_columns
            + [col for col in results_df.columns if col not in existing_columns]
        ].round(4)

        if self.default_models is not None:
            baseline_order = [
                "Dummy Classifier",
                "Logistic Regression",
                "Random Forest",
            ]
            results_df["Model"] = pd.Categorical(
                results_df["Model"], categories=baseline_order, ordered=True
            )
            results_df = results_df.sort_values("Model").reset_index(drop=True)

        else:
            results_df = results_df.reset_index(drop=True)
        pd.set_option("display.max_columns", None, "display.width", 1000)

        return results_df

__init__(task, encoding, random_state=0, lr_solver='saga', dummy_strategy='prior', models=None, n_jobs=-1, path=Path('data/processed/processed_data.csv'))

Initializes the Baseline class with default or user-specified models.

Source code in periomod/benchmarking/_baseline.py
def __init__(
    self,
    task: str,
    encoding: str,
    random_state: int = 0,
    lr_solver: str = "saga",
    dummy_strategy: str = "prior",
    models: Union[List[Tuple[str, object]], None] = None,
    n_jobs: int = -1,
    path: Path = Path("data/processed/processed_data.csv"),
) -> None:
    """Initializes the Baseline class with default or user-specified models."""
    if task in ["pocketclosure", "pocketclosureinf", "improvement"]:
        self.classification = "binary"
    elif task == "pdgrouprevaluation":
        self.classification = "multiclass"
    else:
        raise ValueError(
            f"Unknown task: {task}. Unable to determine classification."
        )

    self.resampler = Resampler(
        classification=self.classification, encoding=encoding
    )
    self.dataloader = ProcessedDataLoader(task=task, encoding=encoding)
    self.dummy_strategy = dummy_strategy
    self.lr_solver = lr_solver
    self.random_state = random_state
    self.path = path
    self.default_models: Union[list[str], None]

    if models is None:
        self.models = [
            (
                "Random Forest",
                RandomForestClassifier(
                    n_jobs=n_jobs, random_state=self.random_state
                ),
            ),
            (
                "Logistic Regression",
                LogisticRegression(
                    solver=self.lr_solver,
                    random_state=self.random_state,
                    n_jobs=n_jobs,
                ),
            ),
            (
                "Dummy Classifier",
                DummyClassifier(strategy=self.dummy_strategy),
            ),
        ]
        self.default_models = [name for name, _ in self.models]
    else:
        self.models = models
        self.default_models = None

baseline()

Trains and evaluates each model in the models list on the given dataset.

This method loads and transforms the dataset, splits it into training and testing sets, and evaluates each model in the self.models list. Predictions and, where the model supports predict_proba, probabilities are computed on the test set and used to calculate the metrics.
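
The per-model evaluation step can be sketched without the library. The two stub classes and the `evaluate` function are hypothetical and only imitate the `predict` / `predict_proba` interface; the real method delegates to `get_probs` and `final_metrics`, whose internals are not shown on this page.

```python
class ProbabilisticStub:
    """Toy model exposing both predict and predict_proba."""

    def predict(self, X):
        return [1 if x >= 0.5 else 0 for x in X]

    def predict_proba(self, X):
        # One [P(class 0), P(class 1)] pair per sample.
        return [[1.0 - x, x] for x in X]


class LabelOnlyStub:
    """Toy model without predict_proba."""

    def predict(self, X):
        return [0 for _ in X]


def evaluate(models, X_test):
    """Collect predictions and, when available, positive-class probabilities."""
    rows = []
    for name, model in models:
        preds = model.predict(X_test)
        # Same guard as in baseline(): probabilities only if supported.
        probs = (
            [pair[1] for pair in model.predict_proba(X_test)]
            if hasattr(model, "predict_proba")
            else None
        )
        rows.append({"Model": name, "preds": preds, "probs": probs})
    return rows


rows = evaluate(
    [("With probs", ProbabilisticStub()), ("Labels only", LabelOnlyStub())],
    X_test=[0.2, 0.7, 0.5],
)
for row in rows:
    print(row)
```

Models without `predict_proba` still get label-based results; their probabilities are simply passed on as `None`.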

Returns:

| Name | Type | Description |
|---|---|---|
| DataFrame | DataFrame | A DataFrame containing the evaluation metrics for each baseline model, one row per model, with the model name in the Model column and a reset integer index. |
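
The returned DataFrame also gains a "Brier Skill Score" column when a Brier score column is present. The helper `_brier_skill_score` that computes it is not shown on this page, so the sketch below uses the textbook definition, BSS = 1 - BS_model / BS_reference, as an assumption about the general idea rather than a description of the library's exact formula. The function names and the toy data are made up for illustration.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def brier_skill_score(bs_model, bs_reference):
    """Textbook BSS: 1 is perfect, 0 matches the reference, below 0 is worse."""
    return 1.0 - bs_model / bs_reference


y_true = [1, 0, 1, 1]

# Reference forecast: always predict the class prior (here taken from
# y_true itself for brevity; a real baseline would use the training set).
prior = sum(y_true) / len(y_true)
reference_probs = [prior] * len(y_true)

model_probs = [0.9, 0.2, 0.8, 0.6]

bs_reference = brier_score(y_true, reference_probs)
bs_model = brier_score(y_true, model_probs)

print(round(bs_reference, 4))
print(round(bs_model, 4))
print(round(brier_skill_score(bs_model, bs_reference), 4))
```

Under this definition the reference model itself scores 0, which is why a dummy classifier is a natural reference row.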

Source code in periomod/benchmarking/_baseline.py
def baseline(self) -> pd.DataFrame:
    """Trains and evaluates each model in the models list on the given dataset.

    This method loads and transforms the dataset, splits it into training and
    testing sets, and evaluates each model in the `self.models` list.
    Predictions and, where the model supports `predict_proba`, probabilities
    are computed on the test set and used to calculate the metrics.

    Returns:
        DataFrame: A DataFrame containing the evaluation metrics for each
            baseline model, one row per model, with the model name in the
            `Model` column and a reset integer index.
    """
    trained_models, X_test, y_test = self.train_baselines()
    results = []

    for model_name, model in self.models:
        preds = trained_models[(model_name, "Baseline")].predict(X_test)
        probs = (
            get_probs(
                model=trained_models[(model_name, "Baseline")],
                classification=self.classification,
                X=X_test,
            )
            if hasattr(model, "predict_proba")
            else None
        )
        metrics = final_metrics(
            classification=self.classification,
            y=y_test,
            preds=preds,
            probs=probs,
        )
        metrics["Model"] = model_name
        results.append(metrics)

    results_df = pd.DataFrame(results).drop(
        columns=["Best Threshold"], errors="ignore"
    )

    results_df, column_order = self._bss_helper(
        results_df, classification=self.classification
    )

    existing_columns = [col for col in column_order if col in results_df.columns]
    results_df = results_df[
        existing_columns
        + [col for col in results_df.columns if col not in existing_columns]
    ].round(4)

    if self.default_models is not None:
        baseline_order = [
            "Dummy Classifier",
            "Logistic Regression",
            "Random Forest",
        ]
        results_df["Model"] = pd.Categorical(
            results_df["Model"], categories=baseline_order, ordered=True
        )
        results_df = results_df.sort_values("Model").reset_index(drop=True)

    else:
        results_df = results_df.reset_index(drop=True)
    pd.set_option("display.max_columns", None, "display.width", 1000)

    return results_df
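
The ordering logic at the end of `baseline()` can be illustrated with plain lists and dictionaries. This is a sketch of the idea only: the source uses pandas (`pd.Categorical` and `sort_values`), and the column names in `preferred` below are hypothetical because `column_order_binary` is not shown on this page.

```python
def order_columns(columns, preferred):
    """Preferred columns first (if present), then the rest in original order."""
    existing = [col for col in preferred if col in columns]
    return existing + [col for col in columns if col not in existing]


def order_rows(rows, baseline_order):
    """Sort result rows by a fixed model order, as done for default models."""
    rank = {name: position for position, name in enumerate(baseline_order)}
    return sorted(rows, key=lambda row: rank[row["Model"]])


columns = ["F1 Score", "Model", "Brier Score", "Extra Metric"]
preferred = ["Model", "F1 Score", "Accuracy", "Brier Score"]  # hypothetical order
print(order_columns(columns, preferred))

rows = [
    {"Model": "Random Forest"},
    {"Model": "Logistic Regression"},
    {"Model": "Dummy Classifier"},
]
baseline_order = ["Dummy Classifier", "Logistic Regression", "Random Forest"]
print([row["Model"] for row in order_rows(rows, baseline_order)])
```

Columns missing from the results are skipped, and columns not listed in the preferred order are kept at the end rather than dropped.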

train_baselines()

Trains each model in the models list and returns related data splits.

Returns:

| Name | Type | Description |
|---|---|---|
| Tuple | Tuple[Dict[Tuple[str, str], Any], DataFrame, Series] | Dictionary containing trained models; testing feature set (X_test); testing labels (y_test). |
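
The dictionary of trained models is keyed by a `(model_name, "Baseline")` tuple, and each model is fitted in place, so the dictionary values are the same objects held in the models list. The sketch below reproduces that pattern with a hypothetical `ConstantModel` stub and toy data instead of the library's data loading and resampling.

```python
class ConstantModel:
    """Toy estimator that always predicts one label (illustration only)."""

    def __init__(self, label):
        self.label = label
        self.fitted = False

    def fit(self, X, y):
        self.fitted = True
        return self

    def predict(self, X):
        return [self.label] * len(X)


models = [("Always Zero", ConstantModel(0)), ("Always One", ConstantModel(1))]
X_train, y_train = [[0.1], [0.9]], [0, 1]

# Same loop shape as train_baselines(): fit, then store under a tuple key.
trained_models = {}
for model_name, model in models:
    model.fit(X_train, y_train)
    trained_models[(model_name, "Baseline")] = model

# Look up a trained model by its (name, "Baseline") key.
always_one = trained_models[("Always One", "Baseline")]
print(always_one.predict([[0.3], [0.4]]))  # [1, 1]
```
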
Source code in periomod/benchmarking/_baseline.py
def train_baselines(
    self,
) -> Tuple[Dict[Tuple[str, str], Any], pd.DataFrame, pd.Series]:
    """Trains each model in the models list and returns related data splits.

    Returns:
        Tuple:
            - Dictionary containing trained models.
            - Testing feature set (X_test).
            - Testing labels (y_test).
    """
    df = self.dataloader.load_data(path=self.path)
    df = self.dataloader.transform_data(df=df)
    train_df, test_df = self.resampler.split_train_test_df(
        df=df, seed=self.random_state
    )
    X_train, y_train, X_test, y_test = self.resampler.split_x_y(
        train_df=train_df, test_df=test_df
    )

    trained_models = {}
    for model_name, model in self.models:
        model.fit(X_train, y_train)
        trained_models[(model_name, "Baseline")] = model

    return trained_models, X_test, y_test