feat(ml): implement feature engineering pipeline

- Create pipeline.py with CLI argument parsing for running stages
- Implement TA-Lib indicator computation with multi-output support
- Add candle feature extraction (body_size, wicks, ratios, etc.)
- Create custom feature loader with dynamic module import
- Wire all feature engineering stages with NaN handling
- Tasks completed: 2.2, 2.3, 3.1, 3.2, 3.3, 3.4, 3.5
This commit is contained in:
Marko Djordjevic 2026-02-15 12:22:59 +01:00
parent ea339a54a7
commit fd29ab91e0
6 changed files with 889 additions and 7 deletions

View file

@ -12,16 +12,16 @@
## 2. Pipeline Config & Entry Point
- [x] 2.1 Create `services/ml/app/config.py` — Pydantic model for pipeline YAML config with validation (stages, data paths, hyperparameters)
- [ ] 2.2 Create `services/ml/pipeline.py` — main orchestrator that reads config and runs enabled stages in sequence
- [ ] 2.3 Add CLI argument parsing: `--config`, `--stage` (run individual stage), support for `python pipeline.py --config config/pipeline.yaml`
- [x] 2.2 Create `services/ml/pipeline.py` — main orchestrator that reads config and runs enabled stages in sequence
- [x] 2.3 Add CLI argument parsing: `--config`, `--stage` (run individual stage), support for `python pipeline.py --config config/pipeline.yaml`
## 3. Feature Engineering Stage
- [ ] 3.1 Create `services/ml/features/talib_features.py` — compute TA-Lib indicators from config list, append columns with `{indicator}_{param}` naming, fail with clear error if TA-Lib not installed
- [ ] 3.2 Create `services/ml/features/candle_features.py` — compute body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range with division-by-zero handling
- [ ] 3.3 Create `services/ml/features/custom_loader.py` — dynamic import of custom feature functions from config paths, call with DataFrame, append result as column
- [ ] 3.4 Implement NaN warmup row handling — drop rows with NaN in indicator columns, log count of dropped rows
- [ ] 3.5 Wire feature engineering into `pipeline.py` — read raw OHLCV CSV, run enabled feature steps, write enriched CSV to `data.enriched_path`
- [x] 3.1 Create `services/ml/features/talib_features.py` — compute TA-Lib indicators from config list, append columns with `{indicator}_{param}` naming, fail with clear error if TA-Lib not installed
- [x] 3.2 Create `services/ml/features/candle_features.py` — compute body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range with division-by-zero handling
- [x] 3.3 Create `services/ml/features/custom_loader.py` — dynamic import of custom feature functions from config paths, call with DataFrame, append result as column
- [x] 3.4 Implement NaN warmup row handling — drop rows with NaN in indicator columns, log count of dropped rows
- [x] 3.5 Wire feature engineering into `pipeline.py` — read raw OHLCV CSV, run enabled feature steps, write enriched CSV to `data.enriched_path`
## 4. Annotation Ingestion Stage