ProcessedDataLoader
Bases: BaseDataLoader
Concrete data loader for loading, transforming, and saving processed data.
This class implements methods for encoding categorical columns, scaling numeric columns, and transforming data based on the specified task. It supports encoding types such as 'one_hot' and 'target', with optional scaling of numeric columns.
Inherits
BaseDataLoader: Provides core data loading, encoding, and scaling methods.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
task | str | The task column name, used to guide specific transformations. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. | required |
encoding | Optional[str] | Specifies the encoding method for categorical columns. Options include 'one_hot', 'target', or None. Defaults to None. | None |
encode | bool | If True, applies encoding to categorical columns. Defaults to True. | True |
scale | bool | If True, applies scaling to numeric columns. Defaults to True. | True |
Attributes:
Name | Type | Description |
---|---|---|
task | str | Task column name used during data transformations. Can be 'pocketclosure', 'pocketclosureinf', 'improvement', or 'pdgrouprevaluation'. |
encoding | str | Encoding method specified for categorical columns. Options include 'one_hot' or 'target'. |
encode | bool | Flag to enable encoding of categorical columns. |
scale | bool | Flag to enable scaling of numeric columns. |
Methods:
Name | Description |
---|---|
encode_categorical_columns | Encodes categorical columns based on the specified encoding method. |
scale_numeric_columns | Scales numeric columns to normalize data. |
transform_data | Executes the complete data processing pipeline, including encoding and scaling. |
Inherited Methods
load_data: Load processed data from the specified path and file.
save_data: Save processed data to the specified path and file.
Example
from periomod.data import ProcessedDataLoader
# instantiate with one-hot encoding and scale numerical variables
dataloader = ProcessedDataLoader(
    task="pocketclosure", encoding="one_hot", encode=True, scale=True
)
df = dataloader.load_data(path="data/processed/processed_data.csv")
df = dataloader.transform_data(df=df)
dataloader.save_data(df=df, path="data/training/training_data.csv")
Source code in periomod/data/_dataloader.py
__init__(task, encoding=None, encode=True, scale=True)
Initializes the ProcessedDataLoader with the specified task column.
Source code in periomod/data/_dataloader.py
encode_categorical_columns(df)
Encodes categorical columns in the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame containing categorical columns. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | The DataFrame with encoded categorical columns. |
Raises:
Type | Description |
---|---|
ValueError | If an invalid encoding type is specified. |
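The behaviour described above (dispatch on the encoding type and raise ValueError for an unknown value) can be sketched as follows. This is a minimal illustration, not the periomod implementation: the helper name `encode_categorical`, the `cat_cols` argument, the use of `pandas.get_dummies`, and the pass-through handling of 'target' are all assumptions made for the example.

```python
from typing import List, Optional

import pandas as pd


def encode_categorical(
    df: pd.DataFrame, encoding: Optional[str], cat_cols: List[str]
) -> pd.DataFrame:
    """Hypothetical helper: encode cat_cols according to `encoding`."""
    if encoding is None:
        # No encoding requested: return the data unchanged.
        return df
    if encoding == "one_hot":
        # One indicator column per category, named "<column>_<category>".
        return pd.get_dummies(df, columns=cat_cols, dtype=int)
    if encoding == "target":
        # Assumption: target encoding needs the outcome variable, so this
        # sketch leaves the columns untouched for a later step.
        return df
    raise ValueError(
        f"Invalid encoding {encoding!r}; expected 'one_hot', 'target', or None."
    )


df = pd.DataFrame({"age": [30, 40], "smoking": ["yes", "no"]})
encoded = encode_categorical(df, encoding="one_hot", cat_cols=["smoking"])
print(encoded.columns.tolist())
```

Validating the encoding name at the point of use keeps a typo such as 'onehot' from silently producing unencoded data.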
Source code in periomod/data/_dataloader.py
scale_numeric_columns(df)
Scales numeric columns in the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame containing numeric columns. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | The DataFrame with scaled numeric columns. |
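As a rough illustration of what scaling numeric columns can look like, the sketch below standardizes every numeric column to zero mean and unit variance while leaving other columns alone. The scaler that periomod actually uses is not stated here, so z-score standardization, the helper name `scale_numeric`, and the `exclude` argument are assumptions.

```python
from typing import Sequence

import pandas as pd


def scale_numeric(df: pd.DataFrame, exclude: Sequence[str] = ()) -> pd.DataFrame:
    """Hypothetical helper: z-score standardize the numeric columns of df."""
    out = df.copy()
    # Pick numeric columns only; categorical and excluded columns are skipped.
    num_cols = [
        col
        for col in out.select_dtypes(include="number").columns
        if col not in exclude
    ]
    # Population standard deviation (ddof=0). A constant column would
    # produce NaN here because its standard deviation is zero.
    out[num_cols] = (out[num_cols] - out[num_cols].mean()) / out[num_cols].std(ddof=0)
    return out


df = pd.DataFrame({"age": [30.0, 40.0, 50.0], "sex": ["m", "f", "m"]})
scaled = scale_numeric(df)
print(scaled)
```

The `exclude` argument is one way to keep a target column such as 'y' out of the scaling step.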
Source code in periomod/data/_dataloader.py
transform_data(df)
Selects the task column and optionally scales and encodes the data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame to transform. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | The transformed DataFrame with the selected task as 'y'. |
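The task-selection step can be pictured with the sketch below, which renames the chosen task column to 'y'. It is only an illustration: the helper name `select_task` is hypothetical, and dropping the remaining task columns is an assumption about how the other outcomes are handled, not documented behaviour.

```python
import pandas as pd

# Task names taken from the parameter description of the class.
TASKS = ("pocketclosure", "pocketclosureinf", "improvement", "pdgrouprevaluation")


def select_task(df: pd.DataFrame, task: str) -> pd.DataFrame:
    """Hypothetical helper: expose the chosen task column as 'y'."""
    if task not in TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {TASKS}.")
    if task not in df.columns:
        raise KeyError(f"Task column {task!r} not found in the DataFrame.")
    # Assumption: the other task columns are alternative outcomes, so they
    # are dropped to avoid leaking them into the features.
    others = [col for col in TASKS if col != task and col in df.columns]
    return df.drop(columns=others).rename(columns={task: "y"})


df = pd.DataFrame(
    {"age": [30, 40], "pocketclosure": [1, 0], "improvement": [0, 1]}
)
selected = select_task(df, task="pocketclosure")
print(selected.columns.tolist())
```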
Source code in periomod/data/_dataloader.py