# TA-Lib Annotation Workflow

This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.

## Overview

1. **Generate** - Run TA-Lib CDL* functions to detect patterns automatically
2. **Import** - Import detected patterns into the UI database
3. **Review & Edit** - View, correct, and refine annotations in the web UI
4. **Train** - Export annotations and train your model
5. **Iterate** - Get predictions, compare with TA-Lib, retrain

## Step 1: Generate TA-Lib Annotations

### 1.1 Prepare Your Data

Export candles from your database into the ML service's data directory, where step 1.2 reads them as `data/raw/OHLCV.csv`:

```bash
# From host
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv

# Or copy your existing CSV
cp your_data.csv services/ml/data/raw/OHLCV.csv
```

### 1.2 Run Pattern Detection

Enter the ML service container and run the generator:

```bash
# Enter container
docker-compose exec ml-service bash

# Generate annotations (all patterns, perfect matches only)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json

# Or specify patterns and lower confidence threshold
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50 \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI

# Exit container
exit
```

This creates `talib_annotations.json` with the detected patterns.

### 1.3 Review Detection Results

```bash
# Check what was detected
cat services/ml/talib_annotations.json | jq '.annotations | length'
cat services/ml/talib_annotations.json | jq '.annotations[0]'

# See pattern distribution
cat services/ml/talib_annotations.json | jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse'
```

**Output example** (a single annotation):

```json
{
  "start_time": 1700000000,
  "end_time": 1700003600,
  "label": "Bullish Engulfing",
  "confidence": 1.0,
  "source": "programmatic",
  "notes": "TA-Lib CDLENGULFING detection"
}
```

## Step 2: Import into UI

### 2.1 Copy Annotations File

```bash
# Copy from ML service to project root
docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json
```

### 2.2 Get Your Chart ID

Open http://localhost:3000 and note your chart ID (shown in the chart selector), or query the database:

```bash
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
```

### 2.3 Import Annotations

```bash
# Import into chart 1
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Or clear existing annotations first
npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear
```

**Output:**

```
=== TA-Lib Annotation Import ===
Input file: talib_annotations.json
Chart ID: 1
Clear existing: no
Reading annotations file...
Found 147 annotations
Source: talib
Ensuring 12 label types exist...
✓ Bullish Engulfing (existing, id: 1)
+ Bearish Engulfing (created, id: 8, color: #ef4444)
✓ Bullish Hammer (existing, id: 2)
...
Importing 147 annotations for chart 1...
✓ Imported 147 annotations
=== Import Complete ===
```

## Step 3: Review & Edit in UI

### 3.1 Open the Annotator

1. Open http://localhost:3000
2. Select your chart from the dropdown
3. Scroll down to see the span annotations in the sidebar

### 3.2 Review TA-Lib Detections

All TA-Lib detections appear as span annotations:

- Green spans = Bullish patterns
- Red spans = Bearish patterns
- Source labeled as "programmatic"

### 3.3 Edit Annotations

**Correct false positives:**

- Click on a span annotation in the sidebar or chart
- Press Delete/Backspace to remove it

**Add missing patterns:**

- Select a span label type
- Click and drag on the chart to create new annotations
- Your annotations are marked as source "human"

**Adjust boundaries:**

- Delete the annotation
- Recreate it with the correct start/end times

**Add new pattern types:**

- Go to "Manage Span Label Types"
- Add custom patterns TA-Lib doesn't detect
- Return to the main page and annotate

### 3.4 Best Practices

- **Review all TA-Lib detections** - They're not always correct
- **Focus on quality over quantity** - Better to have 50 accurate annotations than 200 noisy ones
- **Add context** - TA-Lib only detects classic patterns; add your own insights
- **Diverse examples** - Make sure you have patterns in different market conditions

## Step 4: Export & Train

### 4.1 Export Annotations

```bash
# Export all span annotations (includes both human and TA-Lib)
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Verify export
cat services/ml/data/annotations/export.json | jq '.annotations | length'
```

### 4.2 Prepare OHLCV Data

```bash
# Copy candles to ML service
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
```

### 4.3 Train Model

```bash
# Enter ML service
docker-compose exec ml-service bash

# Run full pipeline
python pipeline.py --config config/pipeline.yaml

# Exit
exit
```

### 4.4 Restart Inference

```bash
# Restart to load new model
docker-compose restart ml-service

# Verify model loaded
curl http://localhost:8001/model/info | jq '.model_info'
```

## Step 5: Compare & Iterate

### 5.1 Get Predictions

1. Open http://localhost:3000
2. Scroll to the Predictions panel
3. Click "Run on Visible" or "Predict All"

### 5.2 Compare with TA-Lib

Now you can compare:

- **Your edited annotations** (human judgment)
- **TA-Lib raw detections** (programmatic)
- **Model predictions** (trained on your corrections)

The disagreement detection highlights where these three sources differ.

### 5.3 Iterate

Use the prediction summary to find:

- **Missed by model** - Patterns you annotated but the model missed
- **Missed by human** - Patterns the model found that you didn't annotate
- **Label mismatch** - Same location, different pattern type

Add more annotations where the model struggles, then retrain.

## Configuration Options

### Pattern Selection

Choose which patterns to detect:

```bash
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR
```

Common patterns:

- `CDLENGULFING` - Bullish/Bearish Engulfing
- `CDLHAMMER` - Hammer
- `CDLDOJI` - Doji
- `CDLMORNINGSTAR` / `CDLEVENINGSTAR` - Morning/Evening Star
- `CDLHARAMI` - Harami
- `CDLTHREEWHITESOLDIERS` / `CDLTHREEBLACKCROWS` - Three Soldiers/Crows

See full list: https://ta-lib.org/function.html (search for CDL)

### Confidence Threshold

TA-Lib's CDL* functions return an integer per candle: 0 for no match, a positive value (typically +100) for a bullish match, and a negative value (typically -100) for a bearish match. Lower the threshold to get more detections:

```bash
# Get 50-100% matches (more patterns, potentially noisier)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50
```

## Troubleshooting

### "No module named 'talib'"

TA-Lib is not installed in the container:

```bash
# Rebuild ml-service with TA-Lib
docker-compose build --no-cache ml-service
docker-compose up -d ml-service
```

### "No patterns detected"

Try:

1. **Lower confidence threshold** - Use `--min-confidence 50`
2. **Check data quality** - Make sure the OHLCV file has valid data
3. **Try more patterns** - Omit `--patterns` to detect all patterns

### Import script fails

Make sure:

1. **File exists** - Check the path to the JSON file
2. **Chart ID valid** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"`
3. **tsx installed** - Run: `npm install`

### Annotations not showing in UI

1. **Refresh page** - Hard refresh (Ctrl+F5)
2. **Check chart ID** - Make sure you selected the correct chart
3. **Check database** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"`

## Tips & Best Practices

### Balancing Human & Programmatic Labels

When training, you can choose the merge strategy in `config/pipeline.yaml`:

```yaml
annotation_ingestion:
  merge_strategy: "human_priority"  # Use human labels where they overlap
  # or "programmatic_priority"      # Use TA-Lib where they overlap
  # or "both"                       # Keep both as separate features
```

**Recommended**: Start with `human_priority` - trust your corrections over TA-Lib.

### Iterative Improvement

1. **Round 1**: Generate TA-Lib → Review → Train baseline model
2. **Round 2**: Get predictions → Find disagreements → Add corrections → Retrain
3. **Round 3**: Focus on low-confidence predictions → Add more examples → Retrain
4. **Repeat** until model performance meets your needs

### Pattern Coverage

Make sure you have examples of:

- **Bullish patterns** in uptrends
- **Bearish patterns** in downtrends
- **Neutral patterns** in sideways markets
- **False signals** (detected by TA-Lib but not actually a tradeable pattern)

This teaches the model context, not just shape recognition.

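
One rough way to check pattern coverage before training is to tally the exported annotations by label and source. The sketch below is illustrative only: it assumes the export contains an `annotations` list whose entries carry `label` and `source` fields like the TA-Lib output example in step 1.3, and the helper names and the `min_count` cutoff are hypothetical, not part of the project.

```python
import json
from collections import Counter

# Inline sample standing in for the export (field names are assumptions).
# To use the real file, read services/ml/data/annotations/export.json instead.
SAMPLE_EXPORT = """
{"annotations": [
  {"label": "Bullish Engulfing", "source": "programmatic"},
  {"label": "Bullish Engulfing", "source": "human"},
  {"label": "Bearish Engulfing", "source": "programmatic"},
  {"label": "Doji", "source": "human"}
]}
"""


def tally_annotations(export):
    """Count annotations per label and per (label, source) pair."""
    by_label = Counter()
    by_label_source = Counter()
    for ann in export.get("annotations", []):
        label = ann.get("label", "<unlabeled>")
        source = ann.get("source", "<unknown>")
        by_label[label] += 1
        by_label_source[(label, source)] += 1
    return by_label, by_label_source


def underrepresented(by_label, min_count):
    """Return labels with fewer than min_count examples, sorted by name."""
    return sorted(label for label, n in by_label.items() if n < min_count)


if __name__ == "__main__":
    export = json.loads(SAMPLE_EXPORT)
    by_label, by_label_source = tally_annotations(export)
    for (label, source), n in sorted(by_label_source.items()):
        print(f"{label:20s} {source:14s} {n}")
    # Labels below the (arbitrary) cutoff are candidates for more annotation.
    print("Needs more examples:", underrepresented(by_label, min_count=2))
```

Labels that appear only with source "programmatic" have not been reviewed or supplemented by hand yet, which makes them natural candidates for the next annotation round.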
### Quality Metrics

Track these in MLflow UI (http://localhost:5000):

- **Accuracy** - Overall correctness
- **F1 (macro)** - Average across all pattern types
- **Per-class F1** - Performance for each pattern individually
- **Confusion matrix** - Where the model makes mistakes

Focus on improving F1 for patterns you actually trade.

## Quick Reference

```bash
# Generate TA-Lib annotations
docker-compose exec ml-service python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv --output talib_annotations.json

# Copy to host
docker-compose cp ml-service:/app/talib_annotations.json ./

# Import to UI
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Export after editing
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Train model
docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml

# Restart inference
docker-compose restart ml-service

# View results
open http://localhost:3000
open http://localhost:5000  # MLflow UI
```
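
For readers who want to sanity-check the per-class and macro F1 values listed under Quality Metrics, the following is a minimal sketch of how those numbers are defined. It assumes predictions and ground truth are already aligned as two equal-length label lists; how the pipeline aligns spans and which metric implementation it logs to MLflow are not specified here, so treat the function names as hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # this label was present, but missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare patterns count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    y_true = ["Hammer", "Hammer", "Doji", "Engulfing", "Engulfing", "Engulfing"]
    y_pred = ["Hammer", "Doji", "Doji", "Engulfing", "Engulfing", "Hammer"]
    for label, score in per_class_f1(y_true, y_pred).items():
        print(f"{label:10s} F1 = {score:.3f}")
    print(f"macro F1   = {macro_f1(y_true, y_pred):.3f}")
```

Because macro F1 weights every pattern equally, a rarely seen pattern with poor F1 pulls the score down as much as a common one, which is why the per-class values are worth reading alongside it.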