TA-Lib Annotation Workflow
This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.
Overview
- Generate - Run TA-Lib CDL* functions to detect patterns automatically
- Import - Import detected patterns into the UI database
- Review & Edit - View, correct, and refine annotations in the web UI
- Train - Export annotations and train your model
- Iterate - Get predictions, compare with TA-Lib, retrain
Step 1: Generate TA-Lib Annotations
1.1 Prepare Your Data
Export candles from your database:
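# From host (-T disables TTY allocation so the redirected CSV is not polluted with carriage returns)
docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
# Or copy your existing CSV
cp your_data.csv services/ml/data/raw/OHLCV.csv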
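Before running the generator, it can help to sanity-check the CSV. The following is a minimal sketch. It assumes the column names produced by the query above (time, open, high, low, close, volume) and numeric timestamps. The function name check_ohlcv is hypothetical and not part of the repository.

```python
import csv
from typing import Iterable

REQUIRED = ["time", "open", "high", "low", "close", "volume"]


def check_ohlcv(lines: Iterable[str]) -> list[str]:
    """Return human-readable problems found in OHLCV CSV text (empty list if none)."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    prev_time = None
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            t = float(row["time"])
            op, hi, lo, cl = (float(row[k]) for k in ("open", "high", "low", "close"))
        except (TypeError, ValueError):
            problems.append(f"row {row_no}: missing or non-numeric value")
            continue
        # A valid candle's high/low must enclose both its open and its close.
        if hi < max(op, cl) or lo > min(op, cl):
            problems.append(f"row {row_no}: high/low do not bracket open/close")
        # The export query orders by time, so timestamps should strictly increase.
        if prev_time is not None and t <= prev_time:
            problems.append(f"row {row_no}: time is not strictly increasing")
        prev_time = t
    return problems
```

For example, `check_ohlcv(open("services/ml/data/raw/OHLCV.csv", newline=""))` returns an empty list when no problems are found.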
1.2 Run Pattern Detection
Enter the ML service container and run the generator:
# Enter container
docker-compose exec ml-service bash
# Generate annotations (all patterns, perfect matches only)
python generate_talib_annotations.py \
--input data/raw/OHLCV.csv \
--output talib_annotations.json
# Or specify patterns and lower confidence threshold
python generate_talib_annotations.py \
--input data/raw/OHLCV.csv \
--output talib_annotations.json \
--min-confidence 50 \
--patterns CDLENGULFING CDLHAMMER CDLDOJI
# Exit container
exit
This creates talib_annotations.json with the detected patterns in the ML service's working directory (/app in the container, services/ml/ on the host).
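TA-Lib's CDL* functions return one integer per candle, and the value is non-zero on the candle where a pattern completes. The sketch below shows one plausible way to turn such an array into span annotations shaped like the example output in step 1.3. It is not the code of generate_talib_annotations.py. The following are assumptions:

- The span covers the pattern's candle count, ending at the signal candle.
- Positive values select the bullish label.
- Confidence is abs(value)/100.
- The hypothetical PATTERN_INFO table covers only three patterns.

```python
from typing import Sequence

# Hypothetical metadata for three CDL* functions:
# (label when value > 0, label when value < 0, number of candles in the pattern)
PATTERN_INFO = {
    "CDLENGULFING": ("Bullish Engulfing", "Bearish Engulfing", 2),
    "CDLHAMMER": ("Bullish Hammer", "Bullish Hammer", 1),
    "CDLMORNINGSTAR": ("Morning Star", "Morning Star", 3),
}


def signals_to_spans(func_name: str, times: Sequence[int], signals: Sequence[int]) -> list[dict]:
    """Turn one CDL* output array (one integer per candle) into span annotations."""
    bull_label, bear_label, n_candles = PATTERN_INFO[func_name]
    spans = []
    for i, value in enumerate(signals):
        if value == 0:
            continue  # no pattern completes on this candle
        first = max(0, i - n_candles + 1)  # span starts at the pattern's first candle
        spans.append({
            "start_time": int(times[first]),
            "end_time": int(times[i]),  # TA-Lib flags the pattern's last candle
            "label": bull_label if value > 0 else bear_label,
            "confidence": float(min(abs(value) / 100.0, 1.0)),
            "source": "programmatic",
            "notes": f"TA-Lib {func_name} detection",
        })
    return spans
```

With TA-Lib installed, `signals` would come from a call such as `talib.CDLENGULFING(open_, high, low, close)` on float64 NumPy arrays. For 1-hour candles, a bullish engulfing flagged on the second candle yields a span from the first candle's time to the second's.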
1.3 Review Detection Results
# Check what was detected
cat services/ml/talib_annotations.json | jq '.annotations | length'
cat services/ml/talib_annotations.json | jq '.annotations[0]'
# See pattern distribution
cat services/ml/talib_annotations.json | jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse'
Example annotation (output of .annotations[0]):
{
"start_time": 1700000000,
"end_time": 1700003600,
"label": "Bullish Engulfing",
"confidence": 1.0,
"source": "programmatic",
"notes": "TA-Lib CDLENGULFING detection"
}
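If jq is not available, the pattern distribution can be computed with the Python standard library. This is a minimal sketch. Like the jq commands above, it assumes only that the file has a top-level `annotations` list whose items carry a `label` field. The function names are hypothetical.

```python
import json
from collections import Counter


def label_distribution(doc: dict) -> list[tuple[str, int]]:
    """Return (label, count) pairs, most frequent first."""
    return Counter(a["label"] for a in doc.get("annotations", [])).most_common()


def load_distribution(path: str) -> list[tuple[str, int]]:
    """Read an annotations JSON file and return its label distribution."""
    with open(path) as f:
        return label_distribution(json.load(f))
```

Usage: `for label, n in load_distribution("services/ml/talib_annotations.json"): print(n, label)`.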
Step 2: Import into UI
2.1 Copy Annotations File
# Copy from ML service to project root
docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json
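Before importing, a quick structural check can catch malformed entries early. This sketch is based on the fields in the example annotation in step 1.3. The import script's real validation rules are not documented in this guide, so both checks here are assumptions: that end_time may not precede start_time, and that confidence lies between 0 and 1.

```python
# Fields every annotation must carry, with the types the example in step 1.3 uses.
REQUIRED_FIELDS = {"start_time": int, "end_time": int, "label": str, "confidence": (int, float)}


def annotation_errors(ann: dict) -> list[str]:
    """Return a list of problems with one annotation dict (empty if it looks valid)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in ann:
            errors.append(f"missing {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(ann[field], expected) or isinstance(ann[field], bool):
            errors.append(f"{field} has type {type(ann[field]).__name__}")
    if not errors:  # range checks only make sense once the types are right
        if ann["end_time"] < ann["start_time"]:
            errors.append("end_time before start_time")
        if not 0.0 <= ann["confidence"] <= 1.0:
            errors.append("confidence outside [0, 1]")
    return errors
```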
2.2 Get Your Chart ID
Open http://localhost:3000 and note your chart ID (shown in the chart selector), or query the database:
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
2.3 Import Annotations
# Import into chart 1
npm run import-annotations -- --file talib_annotations.json --chart-id 1
# Or clear existing annotations first
npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear
Output:
=== TA-Lib Annotation Import ===
Input file: talib_annotations.json
Chart ID: 1
Clear existing: no
Reading annotations file...
Found 147 annotations
Source: talib
Ensuring 12 label types exist...
✓ Bullish Engulfing (existing, id: 1)
+ Bearish Engulfing (created, id: 8, color: #ef4444)
✓ Bullish Hammer (existing, id: 2)
...
Importing 147 annotations for chart 1...
✓ Imported 147 annotations
=== Import Complete ===
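The "Ensuring label types exist" step in the log is a get-or-create pass: reuse a label type whose name already exists, otherwise create it. The real logic lives in import_talib_annotations.ts, and its color rules are not shown here. The Python sketch below only illustrates the idea. An in-memory dict stands in for the database. The color rule is hypothetical: red (#ef4444, as in the log) for names starting with "Bearish", and green otherwise.

```python
def ensure_label_types(labels: list[str], existing: dict[str, int]) -> list[dict]:
    """Get-or-create one label type per distinct label; `existing` maps name -> id and is updated."""
    results = []
    next_id = max(existing.values(), default=0) + 1
    for name in dict.fromkeys(labels):  # distinct names, first-seen order
        if name in existing:
            results.append({"name": name, "id": existing[name], "created": False, "color": None})
            continue
        # Hypothetical color rule: red for bearish names, green for everything else.
        color = "#ef4444" if name.startswith("Bearish") else "#22c55e"
        existing[name] = next_id
        results.append({"name": name, "id": next_id, "created": True, "color": color})
        next_id += 1
    return results
```

Because existing names are reused, running the import twice does not create duplicate label types.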
Step 3: Review & Edit in UI
3.1 Open the Annotator
- Open http://localhost:3000
- Select your chart from the dropdown
- Scroll down to see the span annotations in the sidebar
3.2 Review TA-Lib Detections
All TA-Lib-detected patterns appear as span annotations:
- Green spans = Bullish patterns
- Red spans = Bearish patterns
- Source labeled as "programmatic"
3.3 Edit Annotations
Correct false positives:
- Click on a span annotation in the sidebar or chart
- Press Delete/Backspace to remove it
Add missing patterns:
- Select a span label type
- Click and drag on the chart to create new annotations
- Your annotations are marked as source "human"
Adjust boundaries:
- Delete the annotation
- Recreate it with correct start/end times
Add new pattern types:
- Go to "Manage Span Label Types"
- Add custom patterns TA-Lib doesn't detect
- Return to main page and annotate
3.4 Best Practices
- Review all TA-Lib detections - They're not always perfect
- Focus on quality over quantity - Better to have 50 accurate annotations than 200 noisy ones
- Add context - TA-Lib only detects classic patterns; add your own insights
- Diverse examples - Make sure you have patterns in different market conditions
Step 4: Export & Train
4.1 Export Annotations
# Export all span annotations (includes both human and TA-Lib)
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json
# Verify export
cat services/ml/data/annotations/export.json | jq '.annotations | length'
4.2 Prepare OHLCV Data
# Copy candles to ML service
docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
4.3 Train Model
# Enter ML service
docker-compose exec ml-service bash
# Run full pipeline
python pipeline.py --config config/pipeline.yaml
# Exit
exit
4.4 Restart Inference
# Restart to load new model
docker-compose restart ml-service
# Verify model loaded
curl http://localhost:8001/model/info | jq '.model_info'
Step 5: Compare & Iterate
5.1 Get Predictions
- Open http://localhost:3000
- Scroll to Predictions panel
- Click "Run on Visible" or "Predict All"
5.2 Compare with TA-Lib
Now you can compare:
- Your edited annotations (human judgment)
- TA-Lib raw detections (programmatic)
- Model predictions (trained on your corrections)
The UI's disagreement detection highlights where these three sources differ.
5.3 Iterate
Use the prediction summary to find:
- Missed by model - Patterns you annotated but the model missed
- Missed by human - The model found patterns you didn't annotate
- Label mismatch - Same location, different pattern type
Add more annotations where the model struggles, then retrain.
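The three disagreement categories can be computed by matching spans on overlap. This is a minimal sketch with illustrative choices, not the UI's documented behavior:

- Spans are (start_time, end_time, label) tuples.
- Two spans match when their intersection-over-union (IoU) is at least 0.5.
- Matching is greedy: each human span takes its best-overlapping model span.

```python
Span = tuple[int, int, str]  # (start_time, end_time, label)


def iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two time intervals; labels are ignored."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    if union == 0:  # both spans are the same single instant
        return 1.0
    return inter / union


def disagreements(human: list[Span], model: list[Span], min_iou: float = 0.5) -> dict[str, list]:
    """Bucket spans into the three disagreement types (greedy; a model span may match several human spans)."""
    result = {"missed_by_model": [], "missed_by_human": [], "label_mismatch": []}
    matched = set()
    for h in human:
        best = max(range(len(model)), key=lambda j: iou(h, model[j]), default=None)
        if best is None or iou(h, model[best]) < min_iou:
            result["missed_by_model"].append(h)  # no model span overlaps enough
            continue
        matched.add(best)
        if model[best][2] != h[2]:
            result["label_mismatch"].append((h, model[best]))  # same place, different label
    # Model spans never matched to a human span were not annotated by a human.
    result["missed_by_human"] = [m for j, m in enumerate(model) if j not in matched]
    return result
```

A stricter variant would use one-to-one matching (for example, Hungarian assignment) so that a single prediction cannot cover two human spans.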
Configuration Options
Pattern Selection
Choose which patterns to detect with --patterns:
python generate_talib_annotations.py \
--input data/raw/OHLCV.csv \
--output talib_annotations.json \
--patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR
Common patterns:
- CDLENGULFING - Bullish/Bearish Engulfing
- CDLHAMMER - Hammer
- CDLDOJI - Doji
- CDLMORNINGSTAR / CDLEVENINGSTAR - Morning/Evening Star
- CDLHARAMI - Harami
- CDLTHREEWHITESOLDIERS / CDLTHREEBLACKCROWS - Three Soldiers/Crows
See full list: https://ta-lib.org/function.html (search for CDL)
Confidence Threshold
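TA-Lib CDL* functions return 0 where no pattern is found and a signed integer where one completes. The value is typically +100 or -100, and for directional patterns the sign indicates bullish (+) or bearish (-). Lower the threshold to get more detections: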
# Get 50-100% matches (more patterns, potentially noisier)
python generate_talib_annotations.py \
--input data/raw/OHLCV.csv \
--output talib_annotations.json \
--min-confidence 50
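The threshold acts on the magnitude of each TA-Lib value. This is a sketch of one plausible filter, not the script's confirmed rule. It assumes --min-confidence is compared against abs(value) on TA-Lib's native scale. It also assumes each kept value is normalized to the 0 to 1 `confidence` field shown in step 1.3. Check generate_talib_annotations.py for the exact behavior.

```python
from typing import Sequence


def filter_signals(signals: Sequence[int], min_confidence: float = 100) -> list[tuple[int, float]]:
    """Return (candle_index, normalized_confidence) for non-zero values at or above the threshold."""
    kept = []
    for i, value in enumerate(signals):
        strength = abs(int(value))
        if strength and strength >= min_confidence:  # 0 means "no pattern" and is always dropped
            kept.append((i, min(strength / 100.0, 1.0)))
    return kept
```

Under this rule, lowering the threshold only changes the result for functions that emit magnitudes below 100.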
Troubleshooting
"No module named 'talib'"
TA-Lib not installed:
# Rebuild ml-service with TA-Lib
docker-compose build --no-cache ml-service
docker-compose up -d ml-service
"No patterns detected"
Try:
- Lower confidence threshold - Use --min-confidence 50
- Check data quality - Make sure OHLCV has valid data
- Try more patterns - Don't specify --patterns, so all patterns are detected
Import script fails
Make sure:
- File exists - Check the path to the JSON file
- Chart ID valid - Run:
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
- tsx installed - Run:
npm install
Annotations not showing in UI
- Refresh page - Hard refresh (Ctrl+F5)
- Check chart ID - Make sure you selected the correct chart
- Check database - Run:
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"
Tips & Best Practices
Balancing Human & Programmatic Labels
When training, you can choose the merge strategy in config/pipeline.yaml:
annotation_ingestion:
  merge_strategy: "human_priority"  # Use human labels where they overlap
  # or "programmatic_priority"      # Use TA-Lib where they overlap
  # or "both"                       # Keep both as separate features
Recommended: Start with human_priority - trust your corrections over TA-Lib.
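Here is one way `human_priority` could behave, as a minimal sketch. The pipeline's actual merge logic is not shown in this guide. It assumes annotations are dicts with start_time, end_time and source fields, and that two spans overlap when their inclusive time intervals intersect.

```python
def merge_human_priority(annotations: list[dict]) -> list[dict]:
    """Keep every human span; keep a programmatic span only if no human span overlaps it."""
    human = [a for a in annotations if a["source"] == "human"]

    def overlaps_human(a: dict) -> bool:
        # Inclusive interval intersection, so single-candle spans (start == end) still count.
        return any(a["start_time"] <= h["end_time"] and h["start_time"] <= a["end_time"] for h in human)

    return human + [a for a in annotations if a["source"] != "human" and not overlaps_human(a)]
```

Swapping the roles of "human" and the other sources would give the `programmatic_priority` behavior under the same assumptions.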
Iterative Improvement
- Round 1: Generate TA-Lib → Review → Train baseline model
- Round 2: Get predictions → Find disagreements → Add corrections → Retrain
- Round 3: Focus on low-confidence predictions → Add more examples → Retrain
- Repeat until model performance meets your needs
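For round 3, you can pull out low-confidence predictions by sorting on the model's score. This sketch assumes predictions are dicts with a `confidence` field between 0 and 1. The field name, the 0.6 cutoff and the function name are illustrative, not taken from the inference service.

```python
def review_queue(predictions: list[dict], max_confidence: float = 0.6, limit: int = 20) -> list[dict]:
    """Return up to `limit` predictions below `max_confidence`, least confident first."""
    uncertain = [p for p in predictions if p["confidence"] < max_confidence]
    return sorted(uncertain, key=lambda p: p["confidence"])[:limit]
```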
Pattern Coverage
Make sure you have examples of:
- Bullish patterns in uptrends
- Bearish patterns in downtrends
- Neutral patterns in sideways markets
- False signals (TA-Lib detected but actually not a tradeable pattern)
This teaches the model context, not just shape recognition.
Quality Metrics
Track these in MLflow UI (http://localhost:5000):
- Accuracy - Overall correctness
- F1 (macro) - Average across all pattern types
- Per-class F1 - Performance for each pattern individually
- Confusion matrix - Where the model makes mistakes
Focus on improving F1 for patterns you actually trade.
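To check the headline numbers outside MLflow, you can compute accuracy, per-class F1 and macro F1 directly from parallel lists of true and predicted labels. This is a standard-library sketch. Its macro average should agree with scikit-learn's f1_score(average="macro") over the labels present. The guide does not state whether the pipeline scores per candle or per span, so treat the inputs as an assumption.

```python
from collections import Counter


def classification_metrics(y_true: list[str], y_pred: list[str]) -> dict:
    """Accuracy, per-class F1 and macro F1 for parallel lists of labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # the true class t was missed
    classes = sorted(set(y_true) | set(y_pred))
    # F1 = 2TP / (2TP + FP + FN), equal to the harmonic mean of precision and recall.
    per_class = {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes}
    return {
        "accuracy": sum(tp.values()) / len(y_true) if y_true else 0.0,
        "per_class_f1": per_class,
        "macro_f1": sum(per_class.values()) / len(classes) if classes else 0.0,
    }
```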
Quick Reference
# Generate TA-Lib annotations
docker-compose exec ml-service python generate_talib_annotations.py \
--input data/raw/OHLCV.csv --output talib_annotations.json
# Copy to host
docker-compose cp ml-service:/app/talib_annotations.json ./
# Import to UI
npm run import-annotations -- --file talib_annotations.json --chart-id 1
# Export after editing
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json
# Train model
docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml
# Restart inference
docker-compose restart ml-service
# View results (open is macOS-only; on Linux use xdg-open)
open http://localhost:3000
open http://localhost:5000 # MLflow UI