5 commits

Author SHA1 Message Date
Marko Djordjevic
26f2c44509 Fix XGBoost label encoding and single-class guard 2026-02-18 23:58:24 +01:00
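Note on 26f2c44509: XGBoost's classifier expects integer class labels numbered 0 to n_classes - 1, and a split that contains only one class gives the model nothing to separate. A minimal sketch of such an encoding step with a single-class guard, in plain Python, might look like the following. The name `encode_labels` and the error text are hypothetical and are not taken from the commit.

```python
def encode_labels(labels):
    """Map arbitrary class labels to contiguous integers 0..n_classes-1.

    Returns (codes, classes), where classes[i] is the original label
    that was assigned the integer code i.
    Hypothetical helper for illustration only.
    """
    # Sorting makes the mapping deterministic across runs.
    # Assumes all labels are mutually comparable (e.g. all ints or all strings).
    classes = sorted(set(labels))

    # Single-class guard: refuse to continue when there is nothing to classify.
    if len(classes) < 2:
        raise ValueError(
            f"need at least 2 classes to train a classifier, got {classes}"
        )

    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels], classes
```

Keeping `classes` alongside the model allows predicted integer codes to be mapped back to the original labels at inference time.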
Marko Djordjevic
73c10a4156 Fix inference feature mismatch with training metadata 2026-02-18 23:53:38 +01:00
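Note on 73c10a4156: one common way to avoid a feature mismatch is to store the ordered list of feature names with the training metadata and to rebuild every inference row from that list. The sketch below assumes this approach; `align_features` and the example column names are hypothetical and may differ from what the commit actually does.

```python
def align_features(row, feature_names):
    """Return the values of `row` in the exact order used during training.

    row           -- dict mapping column name to value at inference time
    feature_names -- ordered feature list saved with the training metadata
    Hypothetical helper for illustration only.
    """
    # Fail loudly if the inference data lacks a column the model was trained on.
    missing = [name for name in feature_names if name not in row]
    if missing:
        raise KeyError(f"missing features at inference time: {missing}")

    # Extra columns (for example label columns) are dropped,
    # and the remaining values follow the training order.
    return [row[name] for name in feature_names]
```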
Marko Djordjevic
aa81d4f3d0 fix(ml): complete ML pipeline fixes and setup
- Fix CCI indicator to use HLC prices instead of close only
- Parse datetime column when loading enriched CSV
- Strip timezone from annotation timestamps
- Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS)
- Exclude programmatic label columns from training features
- Fix classification report to handle missing classes
- Update MLflow tracking to use localhost:5000
- Grant PostgreSQL permissions to ml_user

Pipeline now runs successfully end-to-end:
- Feature engineering: 2543 rows, 31 columns
- Annotation ingestion: 286 samples
- Training: 89.47% test accuracy with Random Forest
2026-02-15 21:29:54 +01:00
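Note on the CCI fix in aa81d4f3d0: the Commodity Channel Index is defined on the typical price (high + low + close) / 3 rather than on the close alone. The pipeline presumably delegates this to TA-Lib, so the pure-Python sketch below only illustrates the formula. The function name `cci`, the default period, and the choice to return 0.0 for a flat window are assumptions made here.

```python
def cci(high, low, close, period=14):
    """Commodity Channel Index computed from high/low/close prices.

    Returns a list the same length as the inputs; positions without a
    full lookback window are None.
    Illustrative sketch, not the pipeline's implementation.
    """
    # Typical price uses all three series (HLC), not only the close.
    typical = [(h + l + c) / 3 for h, l, c in zip(high, low, close)]

    out = [None] * len(typical)
    for i in range(period - 1, len(typical)):
        window = typical[i - period + 1 : i + 1]
        sma = sum(window) / period
        # Mean absolute deviation of the typical price around its average.
        mean_dev = sum(abs(tp - sma) for tp in window) / period
        if mean_dev == 0:
            out[i] = 0.0  # assumption: treat a perfectly flat window as 0
        else:
            out[i] = (typical[i] - sma) / (0.015 * mean_dev)
    return out
```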
Marko Djordjevic
ceb4103ec4 fix(ml): parse datetime column and fix TA-Lib pattern names
- Add parse_dates parameter when loading enriched CSV
- Strip timezone from annotation timestamps to match data
- Fix pattern names: CDLTHREEWHITESOLDIERS -> CDL3WHITESOLDIERS
- Fix pattern names: CDLTHREEBLACKCROWS -> CDL3BLACKCROWS
2026-02-15 21:13:20 +01:00
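Note on ceb4103ec4: timezone-aware annotation timestamps do not compare cleanly with naive timestamps parsed from a CSV, so one side has to be normalized before joining. The sketch below uses only the standard library to show the idea. It assumes the offset is simply dropped (wall-clock time kept, no conversion to UTC), and the helper name `strip_timezone` is hypothetical.

```python
from datetime import datetime


def strip_timezone(timestamp):
    """Parse an ISO 8601 timestamp and drop its UTC offset.

    The wall-clock time is kept unchanged; no conversion to UTC is done.
    That choice is an assumption and may not match the pipeline.
    """
    parsed = datetime.fromisoformat(timestamp)
    return parsed.replace(tzinfo=None)


# An annotation timestamp with an offset and a naive timestamp from the
# data now compare equal.
annotation_ts = strip_timezone("2026-02-15T21:13:20+01:00")
data_ts = datetime.fromisoformat("2026-02-15 21:13:20")
print(annotation_ts == data_ts)
```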
Marko Djordjevic
f4c0f9a836 feat(ml): implement training stage with MLflow tracking and model wrappers
- Create RandomForestModel and XGBoostModel wrappers with class weight support
- Implement temporal and random train/val/test splitting
- Add MLflow experiment tracking with full parameter and metric logging
- Create evaluation module for confusion matrix, feature importance, and classification reports
- Implement model training with sklearn/xgboost flavor logging and optional registry registration
- Store training run metadata in PostgreSQL
- Wire training stage into pipeline.py orchestrator
- Support both RandomForest and XGBoost models with configurable hyperparameters
2026-02-15 14:22:19 +01:00
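Note on the temporal splitting in f4c0f9a836: a temporal split orders samples by time and cuts the sequence into contiguous train, validation, and test blocks, so that the model is never evaluated on data older than what it was trained on. A minimal sketch under that assumption follows. The function name, the default fractions, and the `key` parameter are hypothetical.

```python
def temporal_split(rows, train_frac=0.7, val_frac=0.15, key=lambda row: row[0]):
    """Split rows into (train, val, test) in chronological order.

    rows -- sequence of samples; `key` extracts each sample's timestamp
    The test set receives whatever remains after train and validation.
    Illustrative sketch, not the pipeline's implementation.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")

    # Sort by time so every validation/test sample is later than the training data.
    ordered = sorted(rows, key=key)

    train_end = int(len(ordered) * train_frac)
    val_end = train_end + int(len(ordered) * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

A random split would shuffle `rows` before slicing instead of sorting them, which is simpler but can leak future information into training when the samples are time series.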