feat(ml): implement annotation ingestion with windowed/BIO encoding and TA-Lib patterns
parent fd29ab91e0
commit 16763b967e
3 changed files with 541 additions and 10 deletions
@@ -25,14 +25,14 @@
## 4. Annotation Ingestion Stage
-- [ ] 4.1 Create `services/ml/app/annotation_ingestion.py` — load annotations JSON from `data.annotations_path`, filter by min_confidence
-- [ ] 4.2 Implement windowed classification encoding — extract fixed-size windows centered on each annotation span, flatten into single rows, handle boundary padding
-- [ ] 4.3 Implement BIO sequence labeling encoding — assign B-{label}/I-{label}/O tags per candle, handle overlapping annotations with multiple tag columns
-- [ ] 4.4 Implement TA-Lib CDL* programmatic labeling — run configured CDL functions, convert +100/-100 to label names (bullish_/bearish_ prefix)
-- [ ] 4.5 Implement human/programmatic label merge strategies — human_priority, programmatic_priority, both (separate columns)
-- [ ] 4.6 Implement context padding — include N candles before/after each annotation span
-- [ ] 4.7 Add dataset statistics logging — counts per label, class distribution %, avg span length, human/programmatic agreement rate
-- [ ] 4.8 Wire annotation ingestion into `pipeline.py` — read enriched CSV + annotations JSON, run encoding, write labeled CSV to `data.labeled_path`
+- [x] 4.1 Create `services/ml/app/annotation_ingestion.py` — load annotations JSON from `data.annotations_path`, filter by min_confidence
+- [x] 4.2 Implement windowed classification encoding — extract fixed-size windows centered on each annotation span, flatten into single rows, handle boundary padding
+- [x] 4.3 Implement BIO sequence labeling encoding — assign B-{label}/I-{label}/O tags per candle, handle overlapping annotations with multiple tag columns
+- [x] 4.4 Implement TA-Lib CDL* programmatic labeling — run configured CDL functions, convert +100/-100 to label names (bullish_/bearish_ prefix)
+- [x] 4.5 Implement human/programmatic label merge strategies — human_priority, programmatic_priority, both (separate columns)
+- [x] 4.6 Implement context padding — include N candles before/after each annotation span
+- [x] 4.7 Add dataset statistics logging — counts per label, class distribution %, avg span length, human/programmatic agreement rate
+- [x] 4.8 Wire annotation ingestion into `pipeline.py` — read enriched CSV + annotations JSON, run encoding, write labeled CSV to `data.labeled_path`
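
For readers who want a concrete picture of tasks 4.3 and 4.4, here is a minimal sketch in plain Python. It is not the code from `annotation_ingestion.py`, which this page does not show. The `Span` shape, the function names `bio_encode` and `cdl_to_labels`, the "first free column" rule for overlaps, and the way the label suffix is derived from the CDL function name are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical span shape: (start index, end index inclusive, label).
Span = Tuple[int, int, str]


def bio_encode(n_candles: int, spans: Sequence[Span]) -> List[List[str]]:
    """Build BIO tag columns, one tag per candle in each column.

    Overlapping spans are pushed into additional columns (assumed policy).
    """
    columns: List[List[str]] = []
    for start, end, label in sorted(spans):
        # Clip the span to the available candle range.
        start, end = max(start, 0), min(end, n_candles - 1)
        if start > end:
            continue
        # Reuse the first column whose candles in this span are all "O".
        for col in columns:
            if all(tag == "O" for tag in col[start:end + 1]):
                break
        else:
            col = ["O"] * n_candles
            columns.append(col)
        col[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            col[i] = f"I-{label}"
    # Always return at least one column so downstream code sees a tag per candle.
    return columns or [["O"] * n_candles]


def cdl_to_labels(pattern: str, scores: Sequence[int]) -> List[Optional[str]]:
    """Map CDL-style integer outputs to label names.

    Positive scores (e.g. +100) become bullish_<name>, negative scores
    (e.g. -100) become bearish_<name>, and 0 becomes None. Deriving <name>
    by stripping a leading "CDL" and lowercasing is an assumption.
    """
    name = (pattern[3:] if pattern.startswith("CDL") else pattern).lower()
    labels: List[Optional[str]] = []
    for score in scores:
        if score > 0:
            labels.append(f"bullish_{name}")
        elif score < 0:
            labels.append(f"bearish_{name}")
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    for column in bio_encode(6, [(1, 3, "hammer"), (2, 4, "doji")]):
        print(column)
    print(cdl_to_labels("CDLENGULFING", [0, 100, -100]))
```

In this sketch the two overlapping spans end up in separate tag columns, which is one way to read "multiple tag columns" in task 4.3. The real implementation may order or name columns differently.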
## 5. Training Stage