feat(ml): add TA-Lib annotation generation and import workflow

Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
  and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
  annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.
2026-02-15 19:18:28 +01:00 · 2026-02-15 19:18:28 +01:00 · 847ff67986
commit 847ff67986
parent 228f70daf3
18 changed files with 5416 additions and 7 deletions
--- a/TALIB_WORKFLOW.md
+++ b/TALIB_WORKFLOW.md
@@ -0,0 +1,372 @@
+# TA-Lib Annotation Workflow
+
+This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.
+
+## Overview
+
+1. **Generate** - Run TA-Lib CDL* functions to detect patterns automatically
+2. **Import** - Import detected patterns into the UI database
+3. **Review & Edit** - View, correct, and refine annotations in the web UI
+4. **Train** - Export annotations and train your model
+5. **Iterate** - Get predictions, compare with TA-Lib, retrain
+
+## Step 1: Generate TA-Lib Annotations
+
+### 1.1 Prepare Your Data
+
+Export candles from your database:
+
+```bash
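+# From the host: dump candles to the path the generator reads in Step 1.2
+# (-T disables the pseudo-TTY so the redirected CSV stays clean)
+docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
+
+# Or copy an existing CSV with the same columns
+cp your_data.csv services/ml/data/raw/OHLCV.csv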
+```
+
+### 1.2 Run Pattern Detection
+
+Enter the ML service container and run the generator:
+
+```bash
+# Enter container
+docker-compose exec ml-service bash
+
+# Generate annotations (all patterns, perfect matches only)
+python generate_talib_annotations.py \
+  --input data/raw/OHLCV.csv \
+  --output talib_annotations.json
+
+# Or restrict the patterns and lower the confidence threshold
+python generate_talib_annotations.py \
+  --input data/raw/OHLCV.csv \
+  --output talib_annotations.json \
+  --min-confidence 50 \
+  --patterns CDLENGULFING CDLHAMMER CDLDOJI
+
+# Exit container
+exit
+```
+
+This creates `talib_annotations.json` with detected patterns.
+
+### 1.3 Review Detection Results
+
+```bash
+# Check what was detected
+cat services/ml/talib_annotations.json | jq '.annotations | length'
+cat services/ml/talib_annotations.json | jq '.annotations[0]'
+
+# See pattern distribution
+cat services/ml/talib_annotations.json | jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse'
+```
+
+**Example output** (a single annotation):
+```json
+{
+  "start_time": 1700000000,
+  "end_time": 1700003600,
+  "label": "Bullish Engulfing",
+  "confidence": 1.0,
+  "source": "programmatic",
+  "notes": "TA-Lib CDLENGULFING detection"
+}
+```
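+
+The same distribution can be computed in Python when `jq` is not available. This is a minimal sketch that assumes only the structure shown above (a top-level `annotations` list whose items carry a `label`); the helper name `label_distribution` and the inline sample data are illustrative and not part of the repository.
+
+```python
+import json
+from collections import Counter
+
+
+def label_distribution(payload: dict) -> list[tuple[str, int]]:
+    """Count annotations per label, most frequent first."""
+    labels = [ann["label"] for ann in payload.get("annotations", [])]
+    return Counter(labels).most_common()
+
+
+# Inline sample standing in for talib_annotations.json (values are made up).
+sample = json.loads("""
+{"annotations": [
+  {"start_time": 1700000000, "end_time": 1700003600, "label": "Bullish Engulfing"},
+  {"start_time": 1700007200, "end_time": 1700010800, "label": "Doji"},
+  {"start_time": 1700014400, "end_time": 1700018000, "label": "Bullish Engulfing"}
+]}
+""")
+
+for label, count in label_distribution(sample):
+    print(f"{label}: {count}")
+```
+
+To run it against the real file, replace the inline sample with `json.load(open("services/ml/talib_annotations.json"))`.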
+
+## Step 2: Import into UI
+
+### 2.1 Copy Annotations File
+
+```bash
+# Copy from ML service to project root
+docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json
+```
+
+### 2.2 Get Your Chart ID
+
+Open http://localhost:3000 and note your chart ID, shown in the chart selector. Alternatively, query the database:
+
+```bash
+docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
+```
+
+### 2.3 Import Annotations
+
+```bash
+# Import into chart 1
+npm run import-annotations -- --file talib_annotations.json --chart-id 1
+
+# Or clear existing annotations first
+npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear
+```
+
+**Output:**
+```
+=== TA-Lib Annotation Import ===
+
+Input file: talib_annotations.json
+Chart ID: 1
+Clear existing: no
+
+Reading annotations file...
+Found 147 annotations
+Source: talib
+
+Ensuring 12 label types exist...
+  ✓ Bullish Engulfing (existing, id: 1)
+  + Bearish Engulfing (created, id: 8, color: #ef4444)
+  ✓ Bullish Hammer (existing, id: 2)
+  ...
+
+Importing 147 annotations for chart 1...
+
+✓ Imported 147 annotations
+
+=== Import Complete ===
+```
+
+## Step 3: Review & Edit in UI
+
+### 3.1 Open the Annotator
+
+1. Open http://localhost:3000
+2. Select your chart from the dropdown
+3. Scroll down to see the span annotations in the sidebar
+
+### 3.2 Review TA-Lib Detections
+
+All TA-Lib-detected patterns appear as span annotations:
+- Green spans = Bullish patterns
+- Red spans = Bearish patterns
+- Source labeled as "programmatic"
+
+### 3.3 Edit Annotations
+
+**Correct false positives:**
+- Click on a span annotation in the sidebar or chart
+- Press Delete/Backspace to remove it
+
+**Add missing patterns:**
+- Select a span label type
+- Click and drag on the chart to create new annotations
+- Your annotations are marked as source "human"
+
+**Adjust boundaries:**
+- Delete the annotation
+- Recreate it with correct start/end times
+
+**Add new pattern types:**
+- Go to "Manage Span Label Types"
+- Add custom patterns TA-Lib doesn't detect
+- Return to main page and annotate
+
+### 3.4 Best Practices
+
+- **Review all TA-Lib detections** - TA-Lib matches candle shapes without regard to market context, so expect false positives
+- **Focus on quality over quantity** - Better to have 50 accurate annotations than 200 noisy ones
+- **Add context** - TA-Lib only detects classic patterns; add your own insights
+- **Diverse examples** - Make sure you have patterns in different market conditions
+
+## Step 4: Export & Train
+
+### 4.1 Export Annotations
+
+```bash
+# Export all span annotations (includes both human and TA-Lib)
+curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json
+
+# Verify export
+cat services/ml/data/annotations/export.json | jq '.annotations | length'
+```
+
+### 4.2 Prepare OHLCV Data
+
+```bash
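+# Export candles from the UI database into the ML service's data directory
+# (-T disables the pseudo-TTY so the redirected CSV stays clean)
+docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv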
+```
+
+### 4.3 Train Model
+
+```bash
+# Enter ML service
+docker-compose exec ml-service bash
+
+# Run full pipeline
+python pipeline.py --config config/pipeline.yaml
+
+# Exit
+exit
+```
+
+### 4.4 Restart Inference
+
+```bash
+# Restart to load new model
+docker-compose restart ml-service
+
+# Verify model loaded
+curl http://localhost:8001/model/info | jq '.model_info'
+```
+
+## Step 5: Compare & Iterate
+
+### 5.1 Get Predictions
+
+1. Open http://localhost:3000
+2. Scroll to Predictions panel
+3. Click "Run on Visible" or "Predict All"
+
+### 5.2 Compare with TA-Lib
+
+Now you can compare:
+- **Your edited annotations** (human judgment)
+- **TA-Lib raw detections** (programmatic)
+- **Model predictions** (trained on your corrections)
+
+The disagreement detection highlights where these three sources differ.
+
+### 5.3 Iterate
+
+Use the prediction summary to find:
+- **Missed by model** - Patterns you annotated but the model missed
+- **Missed by human** - Patterns the model found that you didn't annotate
+- **Label mismatch** - Same location, different pattern type
+
+Add more annotations where the model struggles, then retrain.
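+
+The three categories above can be illustrated with a small overlap-based comparison. How the UI actually matches spans is not described here, so treat this as a sketch under one assumption: two spans refer to the same location when their time ranges overlap. The `Span` class and `classify_disagreements` are hypothetical names used only for this example.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Span:
+    start_time: int
+    end_time: int
+    label: str
+
+
+def overlaps(a: Span, b: Span) -> bool:
+    """True when the two closed time ranges share at least one instant."""
+    return a.start_time <= b.end_time and b.start_time <= a.end_time
+
+
+def classify_disagreements(human: list[Span], model: list[Span]) -> dict:
+    result = {"missed_by_model": [], "missed_by_human": [], "label_mismatch": []}
+    for h in human:
+        matches = [m for m in model if overlaps(h, m)]
+        if not matches:
+            result["missed_by_model"].append(h)
+        elif all(m.label != h.label for m in matches):
+            # Same location, but no overlapping prediction agrees on the label.
+            result["label_mismatch"].append((h, matches[0]))
+    for m in model:
+        if not any(overlaps(h, m) for h in human):
+            result["missed_by_human"].append(m)
+    return result
+
+
+human = [Span(0, 10, "Hammer"), Span(20, 30, "Doji"), Span(40, 50, "Bullish Engulfing")]
+model = [Span(0, 10, "Hammer"), Span(22, 28, "Hammer"), Span(60, 70, "Doji")]
+
+for category, items in classify_disagreements(human, model).items():
+    print(category, len(items))
+```
+
+A stricter matcher could require a minimum overlap ratio instead of any overlap, which would shift borderline cases from "label mismatch" to the two "missed" categories.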
+
+## Configuration Options
+
+### Pattern Selection
+
+Use `--patterns` to choose which patterns to detect:
+
+```bash
+python generate_talib_annotations.py \
+  --input data/raw/OHLCV.csv \
+  --output talib_annotations.json \
+  --patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR
+```
+
+Common patterns:
+- `CDLENGULFING` - Bullish/Bearish Engulfing
+- `CDLHAMMER` - Hammer
+- `CDLDOJI` - Doji
+- `CDLMORNINGSTAR` / `CDLEVENINGSTAR` - Morning/Evening Star
+- `CDLHARAMI` - Harami
+- `CDL3WHITESOLDIERS` / `CDL3BLACKCROWS` - Three White Soldiers / Three Black Crows
+
+See full list: https://ta-lib.org/function.html (search for CDL)
+
+### Confidence Threshold
+
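+TA-Lib CDL functions return one integer per candle: 0 for no match, a positive value (typically +100) for a bullish match, and a negative value (typically -100) for a bearish match. By default only full matches are kept; lower the threshold to get more detections: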
+
+```bash
+# Get 50-100% matches (more patterns, potentially noisier)
+python generate_talib_annotations.py \
+  --input data/raw/OHLCV.csv \
+  --output talib_annotations.json \
+  --min-confidence 50
+```
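+
+To make the threshold concrete, the sketch below filters a list of per-candle integers the way a `--min-confidence` option plausibly would. It does not call TA-Lib and is not the code from `generate_talib_annotations.py`; the function name `filter_detections`, the default of 100, and the scaling of confidence to the 0-1 range are assumptions made for illustration.
+
+```python
+def filter_detections(values, min_confidence=100):
+    """Keep candles whose absolute pattern score reaches min_confidence.
+
+    `values` mimics the per-candle output of a CDL* function:
+    0 = no pattern, positive = bullish, negative = bearish.
+    """
+    detections = []
+    for index, value in enumerate(values):
+        strength = abs(value)
+        if strength == 0 or strength < min_confidence:
+            continue
+        detections.append({
+            "index": index,
+            "direction": "bullish" if value > 0 else "bearish",
+            # Assumed mapping from the 0-100 score to a 0-1 confidence.
+            "confidence": min(strength, 100) / 100,
+        })
+    return detections
+
+
+raw = [0, 100, 0, -80, 0, -100]  # made-up scores for six candles
+
+print(len(filter_detections(raw)))                     # full matches only
+print(len(filter_detections(raw, min_confidence=50)))  # also keeps the -80 candle
+```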
+
+## Troubleshooting
+
+### "No module named 'talib'"
+
+TA-Lib is not installed in the `ml-service` image. Rebuild it:
+
+```bash
+# Rebuild ml-service with TA-Lib
+docker-compose build --no-cache ml-service
+docker-compose up -d ml-service
+```
+
+### "No patterns detected"
+
+Try:
+1. **Lower confidence threshold** - Use `--min-confidence 50`
+2. **Check data quality** - Make sure OHLCV has valid data
+3. **Try more patterns** - Omit `--patterns` to run every CDL function
+
+### Import script fails
+
+Make sure:
+1. **File exists** - Check path to JSON file
+2. **Chart ID valid** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"`
+3. **tsx installed** - Run: `npm install`
+
+### Annotations not showing in UI
+
+1. **Refresh page** - Hard refresh (Ctrl+F5)
+2. **Check chart ID** - Make sure you selected the correct chart
+3. **Check database** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"`
+
+## Tips & Best Practices
+
+### Balancing Human & Programmatic Labels
+
+When training, you can choose the merge strategy in `config/pipeline.yaml`:
+
+```yaml
+annotation_ingestion:
+  merge_strategy: "human_priority"  # Use human labels where they overlap
+  # or "programmatic_priority"      # Use TA-Lib where they overlap
+  # or "both"                       # Keep both as separate features
+```
+
+**Recommended**: Start with `human_priority` - trust your corrections over TA-Lib.
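+
+As an illustration of what `human_priority` means, the sketch below keeps every human span and drops any programmatic span that overlaps one. The pipeline's real merge logic is not shown in this guide, so the function `merge_human_priority` and the overlap rule are assumptions; only the field names come from the annotation format used earlier.
+
+```python
+def merge_human_priority(human, programmatic):
+    """Keep all human spans; keep programmatic spans only where no human span overlaps."""
+
+    def overlaps(a, b):
+        return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]
+
+    kept = [p for p in programmatic if not any(overlaps(p, h) for h in human)]
+    return sorted(human + kept, key=lambda span: span["start_time"])
+
+
+# Made-up spans: the first TA-Lib detection collides with a human correction.
+human = [
+    {"start_time": 100, "end_time": 200, "label": "Hammer", "source": "human"},
+]
+programmatic = [
+    {"start_time": 150, "end_time": 250, "label": "Doji", "source": "programmatic"},
+    {"start_time": 300, "end_time": 400, "label": "Bullish Engulfing", "source": "programmatic"},
+]
+
+for span in merge_human_priority(human, programmatic):
+    print(span["start_time"], span["label"], span["source"])
+```
+
+Swapping the roles of the two arguments gives the `programmatic_priority` behaviour under the same assumption.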
+
+### Iterative Improvement
+
+1. **Round 1**: Generate TA-Lib → Review → Train baseline model
+2. **Round 2**: Get predictions → Find disagreements → Add corrections → Retrain
+3. **Round 3**: Focus on low-confidence predictions → Add more examples → Retrain
+4. **Repeat** until model performance meets your needs
+
+### Pattern Coverage
+
+Make sure you have examples of:
+- **Bullish patterns** in uptrends
+- **Bearish patterns** in downtrends
+- **Neutral patterns** in sideways markets
+- **False signals** (detected by TA-Lib but not actually a tradeable pattern)
+
+This teaches the model context, not just shape recognition.
+
+### Quality Metrics
+
+Track these in MLflow UI (http://localhost:5000):
+- **Accuracy** - Overall correctness
+- **F1 (macro)** - Average across all pattern types
+- **Per-class F1** - Performance for each pattern individually
+- **Confusion matrix** - Where the model makes mistakes
+
+Focus on improving F1 for patterns you actually trade.
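+
+For reference, per-class and macro F1 follow the standard definitions: F1 for a class is `2*TP / (2*TP + FP + FN)`, and macro F1 is the unweighted mean over classes. The sketch below computes both in plain Python on made-up labels; it is not taken from the pipeline, which may rely on a library implementation instead.
+
+```python
+def per_class_f1(y_true, y_pred):
+    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
+    scores = {}
+    for label in sorted(set(y_true) | set(y_pred)):
+        pairs = list(zip(y_true, y_pred))
+        tp = sum(t == label and p == label for t, p in pairs)
+        fp = sum(t != label and p == label for t, p in pairs)
+        fn = sum(t == label and p != label for t, p in pairs)
+        denom = 2 * tp + fp + fn
+        scores[label] = 2 * tp / denom if denom else 0.0
+    return scores
+
+
+def macro_f1(y_true, y_pred):
+    """Unweighted mean of the per-class F1 scores."""
+    scores = per_class_f1(y_true, y_pred)
+    return sum(scores.values()) / len(scores) if scores else 0.0
+
+
+# Made-up labels: one Hammer is misclassified as Doji.
+y_true = ["Hammer", "Hammer", "Doji", "Doji", "Doji", "None"]
+y_pred = ["Hammer", "Doji", "Doji", "Doji", "Doji", "None"]
+
+for label, score in per_class_f1(y_true, y_pred).items():
+    print(f"{label}: {score:.3f}")
+print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
+```
+
+Because macro F1 weights every class equally, a rare pattern with poor F1 pulls the average down even when overall accuracy looks high, which is why the per-class view matters for the patterns you trade.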
+
+## Quick Reference
+
+```bash
+# Generate TA-Lib annotations
+docker-compose exec ml-service python generate_talib_annotations.py \
+  --input data/raw/OHLCV.csv --output talib_annotations.json
+
+# Copy to host
+docker-compose cp ml-service:/app/talib_annotations.json ./
+
+# Import to UI
+npm run import-annotations -- --file talib_annotations.json --chart-id 1
+
+# Export after editing
+curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json
+
+# Train model
+docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml
+
+# Restart inference
+docker-compose restart ml-service
+
+# View results (`open` is macOS-specific; use `xdg-open` on Linux)
+open http://localhost:3000
+open http://localhost:5000  # MLflow UI
+```