BaseProcessor
Bases: BaseLoader, ABC
Abstract base class defining essential data processing methods.
This class provides core processing capabilities such as loading and saving data, along with abstract methods that must be implemented by any subclass. These methods include data imputation, feature creation, and outcome variable generation for specialized data processing.
Inherits
- BaseLoader: Provides loading and saving capabilities for processed data.
- ABC: Specifies abstract methods for subclasses to implement.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
behavior | bool | If True, includes behavior columns in the data processing. | required |
Attributes:
Name | Type | Description |
---|---|---|
behavior | bool | Flag indicating whether to include behavior columns during data processing. |
Methods:
Name | Description |
---|---|
load_data | Load processed data from the specified path and file. |
save_data | Save processed data to the specified path and file. |
Abstract Methods
- impute_missing_values: Impute missing values in the DataFrame.
- create_tooth_features: Generate features related to tooth data.
- create_outcome_variables: Create outcome variables for analysis.
- process_data: Clean, impute, and scale the data.
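The contract described above can be illustrated with a minimal, self-contained sketch. It does not import periomod: `SketchProcessor` is a hypothetical stand-in that only mirrors the documented method names, `MinimalProcessor` is an invented subclass, plain lists of dicts stand in for a DataFrame, and the order of calls inside `process_data` is an assumption rather than the library's behavior.

```python
from abc import ABC, abstractmethod


class SketchProcessor(ABC):
    """Hypothetical stand-in mirroring the documented BaseProcessor interface."""

    def __init__(self, behavior: bool) -> None:
        # Flag indicating whether behavior columns are included.
        self.behavior = behavior

    @abstractmethod
    def impute_missing_values(self, df): ...

    @abstractmethod
    def create_tooth_features(self, df): ...

    @abstractmethod
    def create_outcome_variables(self, df): ...

    @abstractmethod
    def process_data(self, df): ...


class MinimalProcessor(SketchProcessor):
    """Invented subclass; rows are dicts standing in for DataFrame rows."""

    def impute_missing_values(self, df):
        # Replace missing entries (None) with 0 as a placeholder strategy.
        return [{k: (0 if v is None else v) for k, v in row.items()} for row in df]

    def create_tooth_features(self, df):
        # Placeholder: no additional features are derived in this sketch.
        return df

    def create_outcome_variables(self, df):
        # Placeholder: outcome columns are passed through unchanged.
        return df

    def process_data(self, df):
        # Assumed ordering, for illustration only.
        df = self.impute_missing_values(df)
        df = self.create_tooth_features(df)
        return self.create_outcome_variables(df)
```

Because every abstract method is overridden, `MinimalProcessor(behavior=True)` can be instantiated, whereas calling `SketchProcessor(behavior=True)` directly raises `TypeError`.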
Source code in periomod/data/_basedata.py
__init__(behavior)
Initializes the BaseProcessor with behavior flag.
Source code in periomod/data/_basedata.py
create_outcome_variables(df)
abstractmethod
Generates outcome variables for analysis.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame with original outcome variables. | required |
Source code in periomod/data/_basedata.py
create_tooth_features(df)
abstractmethod
Creates additional features related to tooth data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame containing tooth data. | required |
Source code in periomod/data/_basedata.py
impute_missing_values(df)
abstractmethod
Imputes missing values in the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame with potential missing values. | required |
Source code in periomod/data/_basedata.py
load_data(path=Path('data/raw/raw_data.xlsx'))
Loads the dataset and validates required columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the dataset file. Defaults to Path("data/raw/raw_data.xlsx"). | Path('data/raw/raw_data.xlsx') |
Returns:
Type | Description |
---|---|
DataFrame | The loaded DataFrame. |
Raises:
Type | Description |
---|---|
ValueError | If any required columns are missing. |
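The kind of check that leads to this `ValueError` might look like the sketch below. It is not the periomod implementation: `validate_required_columns` is a hypothetical helper, and the names in `REQUIRED_COLUMNS` are invented for illustration, since the actual required columns are not listed here.

```python
from typing import Iterable

# Hypothetical column names, for illustration only.
REQUIRED_COLUMNS = ("id_patient", "tooth")


def validate_required_columns(
    columns: Iterable[str],
    required: Iterable[str] = REQUIRED_COLUMNS,
) -> None:
    """Raise ValueError if any required column name is absent."""
    present = set(columns)
    # Keep the order of `required` so the error message is stable.
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
```

With a pandas DataFrame, such a helper could be called as `validate_required_columns(df.columns)` right after the file is read, so that a malformed input fails early rather than partway through processing.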
Source code in periomod/data/_basedata.py
process_data(df)
abstractmethod
Processes dataset with data cleaning, imputations and scaling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input DataFrame. | required |
Source code in periomod/data/_basedata.py
save_data(df, path=Path('data/processed/processed_data.csv'))
Saves the processed DataFrame to a CSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The processed DataFrame. | required |
path | str | Path of the CSV file to write. Defaults to Path("data/processed/processed_data.csv"). | Path('data/processed/processed_data.csv') |
Source code in periomod/data/_basedata.py