feat(ml): implement training stage with MLflow tracking and model wrappers

- Create RandomForestModel and XGBoostModel wrappers with class weight support
- Implement temporal and random train/val/test splitting
- Add MLflow experiment tracking with full parameter and metric logging
- Create evaluation module for confusion matrix, feature importance, and classification reports
- Implement model training with sklearn/xgboost flavor logging and optional registry registration
- Store training run metadata in PostgreSQL
- Wire training stage into pipeline.py orchestrator
- Support both RandomForest and XGBoost models with configurable hyperparameters
This commit is contained in:
Marko Djordjevic 2026-02-15 14:22:19 +01:00
parent 16763b967e
commit f4c0f9a836
8 changed files with 900 additions and 14 deletions

View file

@ -36,18 +36,18 @@
## 5. Training Stage
- [ ] 5.1 Create `services/ml/training/train.py` — main training entry point: load labeled CSV, split, train, evaluate, log to MLflow
- [ ] 5.2 Implement temporal train/validation/test splitting with configurable ratios, warn on random split
- [ ] 5.3 Create `services/ml/training/models/random_forest.py` — RandomForestClassifier wrapper with class_weights support
- [ ] 5.4 Create `services/ml/training/models/xgboost_model.py` — XGBClassifier wrapper with class_weights support
- [ ] 5.5 Implement model dispatch — select model class based on `model_type` config, fail with supported types list for unknown types
- [ ] 5.6 Implement MLflow experiment tracking — create run, log config artifact, dataset params, per-class sample counts, all hyperparameters
- [ ] 5.7 Implement metrics logging — accuracy, f1_macro, f1_weighted, per-class precision/recall/F1
- [ ] 5.8 Create `services/ml/training/evaluation.py` — generate confusion matrix plot, feature importance plot, classification report text
- [ ] 5.9 Implement MLflow artifact logging — log confusion_matrix.png, feature_importance.png, classification_report.txt, pipeline_config.yaml
- [ ] 5.10 Implement MLflow model registration — log model with sklearn/xgboost flavor, register in registry if configured
- [ ] 5.11 Store training run metadata in PostgreSQL `training_runs` table
- [ ] 5.12 Wire training into `pipeline.py`
- [x] 5.1 Create `services/ml/training/train.py` — main training entry point: load labeled CSV, split, train, evaluate, log to MLflow
- [x] 5.2 Implement temporal train/validation/test splitting with configurable ratios, warn on random split
- [x] 5.3 Create `services/ml/training/models/random_forest.py` — RandomForestClassifier wrapper with class_weights support
- [x] 5.4 Create `services/ml/training/models/xgboost_model.py` — XGBClassifier wrapper with class_weights support
- [x] 5.5 Implement model dispatch — select model class based on `model_type` config, fail with supported types list for unknown types
- [x] 5.6 Implement MLflow experiment tracking — create run, log config artifact, dataset params, per-class sample counts, all hyperparameters
- [x] 5.7 Implement metrics logging — accuracy, f1_macro, f1_weighted, per-class precision/recall/F1
- [x] 5.8 Create `services/ml/training/evaluation.py` — generate confusion matrix plot, feature importance plot, classification report text
- [x] 5.9 Implement MLflow artifact logging — log confusion_matrix.png, feature_importance.png, classification_report.txt, pipeline_config.yaml
- [x] 5.10 Implement MLflow model registration — log model with sklearn/xgboost flavor, register in registry if configured
- [x] 5.11 Store training run metadata in PostgreSQL `training_runs` table
- [x] 5.12 Wire training into `pipeline.py`
## 6. Inference Service (FastAPI)