Marko Djordjevic 847ff67986 feat(ml): add TA-Lib annotation generation and import workflow

Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
  and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
  annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.

2026-02-15 19:18:28 +01:00


TA-Lib Annotation Workflow

This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.

Overview

Generate - Run TA-Lib CDL* functions to detect patterns automatically
Import - Import detected patterns into the UI database
Review & Edit - View, correct, and refine annotations in the web UI
Train - Export annotations and train your model
Iterate - Get predictions, compare with TA-Lib, retrain

Step 1: Generate TA-Lib Annotations

1.1 Prepare Your Data

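Export candles from your database into services/ml/data/raw/, the same location Step 4.2 uses. Step 1.2 reads the file as data/raw/OHLCV.csv inside the ml-service container, which assumes services/ml is mounted there:

# From host (-T disables TTY allocation so the redirected CSV stays clean)
docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv

# Or copy your existing CSV
cp your_data.csv services/ml/data/raw/OHLCV.csv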

1.2 Run Pattern Detection

Enter the ML service container and run the generator:

# Enter container
docker-compose exec ml-service bash

# Generate annotations (all patterns, perfect matches only)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json

# Or specify patterns and lower confidence threshold
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50 \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI

# Exit container
exit

This creates talib_annotations.json with detected patterns.
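The conversion from TA-Lib output to span annotations can be sketched as follows. This is a minimal illustration, not the code in generate_talib_annotations.py: the function name, the PATTERN_LABELS mapping, the span_candles parameter, and the confidence scaling are all assumptions. The only TA-Lib fact it relies on is that a call such as talib.CDLENGULFING(open, high, low, close) returns one integer per candle.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from TA-Lib function names to UI label stems.
PATTERN_LABELS = {"CDLENGULFING": "Engulfing", "CDLHAMMER": "Hammer", "CDLDOJI": "Doji"}


def signals_to_annotations(
    times: Sequence[int],
    signals: Sequence[int],
    pattern: str,
    span_candles: int = 2,
) -> List[Dict]:
    """Turn one CDL* output series into span annotations.

    `signals` holds one integer per candle: 0 for no match, positive for a
    bullish match, negative for a bearish match.
    `span_candles` (assumed) is how many candles the pattern covers, ending
    at the candle that carries the signal.
    """
    stem = PATTERN_LABELS.get(pattern, pattern)
    annotations = []
    for i, value in enumerate(signals):
        if value == 0:
            continue  # no pattern on this candle
        start = max(0, i - span_candles + 1)
        direction = "Bullish" if value > 0 else "Bearish"
        annotations.append({
            "start_time": times[start],
            "end_time": times[i],
            "label": f"{direction} {stem}",
            "confidence": min(abs(value), 100) / 100,  # assumed scaling to 0..1
            "source": "programmatic",
            "notes": f"TA-Lib {pattern} detection",
        })
    return annotations


# Hourly candles: a bullish match on the second candle, a bearish one on the fourth.
times = [1700000000, 1700003600, 1700007200, 1700010800]
annotations = signals_to_annotations(times, [0, 100, 0, -100], "CDLENGULFING")
```

With real data, `signals` would come from the TA-Lib call and `times` from the time column of OHLCV.csv.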

1.3 Review Detection Results

# Check what was detected
jq '.annotations | length' services/ml/talib_annotations.json
jq '.annotations[0]' services/ml/talib_annotations.json

# See pattern distribution
jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse' services/ml/talib_annotations.json

Output example:

{
  "start_time": 1700000000,
  "end_time": 1700003600,
  "label": "Bullish Engulfing",
  "confidence": 1.0,
  "source": "programmatic",
  "notes": "TA-Lib CDLENGULFING detection"
}
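If jq is not available, the same pattern distribution can be computed in Python. This is a small sketch that assumes only what the commands above show: a top-level "annotations" list whose entries carry a "label" field. The function name is illustrative.

```python
import json
from collections import Counter


def label_distribution(payload: dict) -> list:
    """Return (label, count) pairs, most frequent first."""
    counts = Counter(a["label"] for a in payload["annotations"])
    return counts.most_common()


# In practice, load the real file instead:
#   with open("services/ml/talib_annotations.json") as f:
#       payload = json.load(f)
payload = json.loads("""
{"annotations": [
  {"label": "Bullish Engulfing"}, {"label": "Doji"}, {"label": "Doji"}
]}
""")

for label, count in label_distribution(payload):
    print(f"{count:4d}  {label}")
```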

Step 2: Import into UI

2.1 Copy Annotations File

# Copy from ML service to project root
docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json

2.2 Get Your Chart ID

Open http://localhost:3000 and note your chart ID. It is shown in the chart selector; alternatively, query the database:

docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"

2.3 Import Annotations

# Import into chart 1
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Or clear existing annotations first
npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear

Output:

=== TA-Lib Annotation Import ===

Input file: talib_annotations.json
Chart ID: 1
Clear existing: no

Reading annotations file...
Found 147 annotations
Source: talib

Ensuring 12 label types exist...
  ✓ Bullish Engulfing (existing, id: 1)
  + Bearish Engulfing (created, id: 8, color: #ef4444)
  ✓ Bullish Hammer (existing, id: 2)
  ...

Importing 147 annotations for chart 1...

✓ Imported 147 annotations

=== Import Complete ===
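The "Ensuring label types exist" step in the output above is a get-or-create operation. The real import script is TypeScript; the Python sketch below only illustrates the idea. The span_label_types table, its columns, and the green and gray color values are assumptions (only the red #ef4444 appears in the sample output).

```python
import sqlite3

BULLISH_COLOR = "#22c55e"  # assumed green; the sample log only shows the red value
BEARISH_COLOR = "#ef4444"  # red, as in the sample output
NEUTRAL_COLOR = "#6b7280"  # assumed gray fallback for non-directional patterns


def pick_color(label: str) -> str:
    """Choose a span color from the label's direction prefix."""
    lowered = label.lower()
    if lowered.startswith("bullish"):
        return BULLISH_COLOR
    if lowered.startswith("bearish"):
        return BEARISH_COLOR
    return NEUTRAL_COLOR


def ensure_label_type(conn: sqlite3.Connection, name: str) -> tuple:
    """Return (id, created) for `name`, inserting a row if it is missing."""
    row = conn.execute(
        "SELECT id FROM span_label_types WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO span_label_types (name, color) VALUES (?, ?)",
        (name, pick_color(name)),
    )
    conn.commit()
    return cur.lastrowid, True


# Demo against an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE span_label_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE, color TEXT)"
)
first = ensure_label_type(conn, "Bearish Engulfing")   # created
second = ensure_label_type(conn, "Bearish Engulfing")  # already exists
```

Running the import twice should therefore reuse existing label types rather than duplicate them.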

Step 3: Review & Edit in UI

3.1 Open the Annotator

Open http://localhost:3000
Select your chart from the dropdown
Scroll down to see the span annotations in the sidebar

3.2 Review TA-Lib Detections

You'll see all the TA-Lib detected patterns as span annotations:

Green spans = Bullish patterns
Red spans = Bearish patterns
Source labeled as "programmatic"

3.3 Edit Annotations

Correct false positives:

Click on a span annotation in the sidebar or chart
Press Delete/Backspace to remove it

Add missing patterns:

Select a span label type
Click and drag on the chart to create new annotations
Your annotations are marked as source "human"

Adjust boundaries:

Delete the annotation
Recreate it with correct start/end times

Add new pattern types:

Go to "Manage Span Label Types"
Add custom patterns TA-Lib doesn't detect
Return to main page and annotate

3.4 Best Practices

Review all TA-Lib detections - They're not always perfect
Focus on quality over quantity - Better to have 50 accurate annotations than 200 noisy ones
Add context - TA-Lib only detects classic patterns; add your own insights
Diverse examples - Make sure you have patterns in different market conditions

Step 4: Export & Train

4.1 Export Annotations

# Export all span annotations (includes both human and TA-Lib)
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Verify export
jq '.annotations | length' services/ml/data/annotations/export.json
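Before training, it can help to see how many exported annotations are human and how many are programmatic. The sketch below assumes each exported entry carries the same "source" field as the TA-Lib records shown earlier; the function name is illustrative.

```python
from collections import Counter


def count_by_source(export: dict) -> dict:
    """Count annotations per source, e.g. 'human' vs 'programmatic'."""
    return dict(Counter(a.get("source", "unknown") for a in export["annotations"]))


# In practice, load services/ml/data/annotations/export.json with json.load.
export = {
    "annotations": [
        {"label": "Bullish Engulfing", "source": "programmatic"},
        {"label": "Doji", "source": "human"},
        {"label": "Doji", "source": "human"},
        {"label": "Hammer"},  # entry without a source field
    ]
}
print(count_by_source(export))
```

A very low human count suggests the TA-Lib detections have not been reviewed yet.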

4.2 Prepare OHLCV Data

# Copy candles to ML service
docker-compose exec -T candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv

4.3 Train Model

# Enter ML service
docker-compose exec ml-service bash

# Run full pipeline
python pipeline.py --config config/pipeline.yaml

# Exit
exit

4.4 Restart Inference

# Restart to load new model
docker-compose restart ml-service

# Verify model loaded
curl http://localhost:8001/model/info | jq '.model_info'

Step 5: Compare & Iterate

5.1 Get Predictions

Open http://localhost:3000
Scroll to Predictions panel
Click "Run on Visible" or "Predict All"

5.2 Compare with TA-Lib

Now you can compare:

Your edited annotations (human judgment)
TA-Lib raw detections (programmatic)
Model predictions (trained on your corrections)

The disagreement detection highlights where these three sources differ.

5.3 Iterate

Use the prediction summary to find:

Missed by model - Patterns you annotated that the model did not predict
Missed by human - Patterns the model predicted that you did not annotate
Label mismatch - Same location, different pattern type

Add more annotations where the model struggles, then retrain.
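The three categories above can be derived from span overlap. The sketch below is one possible reading, not the UI's actual disagreement logic: it treats two spans as the same location when their time ranges overlap, and all function names are hypothetical.

```python
from typing import Dict, List


def overlaps(a: Dict, b: Dict) -> bool:
    """True when two spans share at least one point in time."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def find_disagreements(human: List[Dict], model: List[Dict]) -> Dict[str, list]:
    result = {"missed_by_model": [], "missed_by_human": [], "label_mismatch": []}
    matched_model = set()
    for h in human:
        hits = [i for i, m in enumerate(model) if overlaps(h, m)]
        if not hits:
            result["missed_by_model"].append(h)
            continue
        matched_model.update(hits)
        # A mismatch only if no overlapping prediction has the same label.
        if all(model[i]["label"] != h["label"] for i in hits):
            result["label_mismatch"].append((h, model[hits[0]]))
    result["missed_by_human"] = [
        m for i, m in enumerate(model) if i not in matched_model
    ]
    return result


human = [
    {"start_time": 0, "end_time": 10, "label": "Hammer"},
    {"start_time": 20, "end_time": 30, "label": "Doji"},
    {"start_time": 40, "end_time": 50, "label": "Bullish Engulfing"},
]
model = [
    {"start_time": 0, "end_time": 10, "label": "Hammer"},
    {"start_time": 25, "end_time": 35, "label": "Hammer"},
    {"start_time": 60, "end_time": 70, "label": "Doji"},
]
report = find_disagreements(human, model)
```

Here the first pair agrees, the second overlaps with a different label, the third human span has no prediction, and the last prediction has no human span.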

Configuration Options

Pattern Selection

Edit which patterns to detect:

python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR

Common patterns:

CDLENGULFING - Bullish/Bearish Engulfing
CDLHAMMER - Hammer
CDLDOJI - Doji
CDLMORNINGSTAR / CDLEVENINGSTAR - Morning/Evening Star
CDLHARAMI - Harami
CDLTHREEWHITESOLDIERS / CDLTHREEBLACKCROWS - Three Soldiers/Crows

See full list: https://ta-lib.org/function.html (search for CDL)

Confidence Threshold

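TA-Lib CDL* functions return one integer per candle: 0 for no match, a positive value for a bullish match, and a negative value for a bearish match, with ±100 as the usual magnitude. By default the script keeps only full (±100) matches. Lower the threshold to get more detections: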

# Get 50-100% matches (more patterns, potentially noisier)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50
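The effect of the threshold can be sketched in a few lines. This assumes the script compares the absolute value of each TA-Lib output against --min-confidence, which is a guess about its internals; the function name and the sample values are made up for illustration.

```python
from typing import List, Sequence


def filter_by_confidence(signals: Sequence[int], min_confidence: int = 100) -> List[int]:
    """Return the candle indices whose signal magnitude meets the threshold."""
    return [
        i for i, value in enumerate(signals)
        if value != 0 and abs(value) >= min_confidence
    ]


# Made-up output series containing full (100) and partial (80) matches.
signals = [0, 100, -80, 0, -100, 80]
strict = filter_by_confidence(signals, 100)   # full matches only
relaxed = filter_by_confidence(signals, 50)   # also keeps the partial matches
print(len(strict), len(relaxed))
```

If a pattern function only ever emits 0 and ±100, lowering the threshold will not change its detection count.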

Troubleshooting

"No module named 'talib'"

TA-Lib is not installed in the ml-service image. Rebuild it:

# Rebuild ml-service with TA-Lib
docker-compose build --no-cache ml-service
docker-compose up -d ml-service

"No patterns detected"

Try:

Lower the confidence threshold - Use --min-confidence 50
Check data quality - Make sure the OHLCV file has the expected columns, numeric values, and candles in time order
Try more patterns - Omit --patterns to run every CDL* function
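A quick data-quality check on the OHLCV file might look like the sketch below. The column names come from the export query in Step 1.1; the specific checks (numeric values, high/low consistency, strictly increasing integer timestamps) and the function name are assumptions about what counts as valid here.

```python
import csv
import io

REQUIRED = ["time", "open", "high", "low", "close", "volume"]


def check_ohlcv(csv_text: str) -> list:
    """Return a list of human-readable problems found in an OHLCV CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    previous_time = None
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            t = int(float(row["time"]))
            o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if h < max(o, c) or l > min(o, c) or h < l:
            problems.append(f"line {line_no}: high/low inconsistent with open/close")
        if previous_time is not None and t <= previous_time:
            problems.append(f"line {line_no}: time not strictly increasing")
        previous_time = t
    return problems


# In practice: check_ohlcv(open("services/ml/data/raw/OHLCV.csv").read())
sample = """time,open,high,low,close,volume
1700000000,10,12,9,11,100
1700003600,11,10,9,12,120
1700003600,12,13,11,12.5,90
"""
problems = check_ohlcv(sample)
```

An empty list means none of these basic checks failed; it does not guarantee the data is otherwise correct.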

Import script fails

Make sure:

File exists - Check path to JSON file
Chart ID valid - Run: docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
tsx installed - Run: npm install
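When the file exists but the import still fails, checking the JSON structure first can narrow things down. The sketch below validates only the fields visible in the sample record from Step 1.3; which fields the import script actually requires is an assumption, as is the function name.

```python
REQUIRED_FIELDS = ("start_time", "end_time", "label")


def validate_annotations(payload) -> list:
    """Return a list of structural problems in a TA-Lib annotations payload."""
    annotations = payload.get("annotations") if isinstance(payload, dict) else None
    if not isinstance(annotations, list):
        return ["top-level 'annotations' list is missing"]
    errors = []
    for i, ann in enumerate(annotations):
        missing = [f for f in REQUIRED_FIELDS if f not in ann]
        if missing:
            errors.append(f"annotation {i}: missing {', '.join(missing)}")
        elif ann["start_time"] > ann["end_time"]:
            errors.append(f"annotation {i}: start_time is after end_time")
    return errors


# In practice: payload = json.load(open("talib_annotations.json"))
payload = {
    "annotations": [
        {"start_time": 1700000000, "end_time": 1700003600, "label": "Doji"},
        {"start_time": 1700007200, "label": "Hammer"},
        {"start_time": 20, "end_time": 10, "label": "Doji"},
    ]
}
errors = validate_annotations(payload)
```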

Annotations not showing in UI

Refresh page - Hard refresh (Ctrl+F5, or Cmd+Shift+R on macOS)
Check chart ID - Make sure you selected the correct chart
Check database - Run: docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"

Tips & Best Practices

Balancing Human & Programmatic Labels

When training, you can choose the merge strategy in config/pipeline.yaml:

annotation_ingestion:
  merge_strategy: "human_priority"  # Use human labels where they overlap
  # or "programmatic_priority"      # Use TA-Lib where they overlap
  # or "both"                       # Keep both as separate features

Recommended: Start with human_priority - trust your corrections over TA-Lib.
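To make the strategies concrete, here is a rough sketch of how overlapping labels could be resolved. It is not the pipeline's implementation: the function names are hypothetical, overlap is defined as intersecting time ranges, and "both" is simplified to keeping every annotation rather than building separate features.

```python
from typing import Dict, List


def overlaps(a: Dict, b: Dict) -> bool:
    """True when two spans share at least one point in time."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def merge_annotations(
    human: List[Dict],
    programmatic: List[Dict],
    strategy: str = "human_priority",
) -> List[Dict]:
    if strategy == "both":
        merged = human + programmatic
    elif strategy in ("human_priority", "programmatic_priority"):
        if strategy == "human_priority":
            winners, losers = human, programmatic
        else:
            winners, losers = programmatic, human
        # Drop lower-priority spans that overlap any higher-priority span.
        kept = [x for x in losers if not any(overlaps(x, w) for w in winners)]
        merged = winners + kept
    else:
        raise ValueError(f"unknown merge strategy: {strategy}")
    return sorted(merged, key=lambda a: a["start_time"])


human = [{"start_time": 0, "end_time": 10, "label": "Doji", "source": "human"}]
programmatic = [
    {"start_time": 5, "end_time": 15, "label": "Hammer", "source": "programmatic"},
    {"start_time": 20, "end_time": 30, "label": "Doji", "source": "programmatic"},
]
merged = merge_annotations(human, programmatic, "human_priority")
```

With human_priority, the overlapping TA-Lib "Hammer" is dropped in favor of the human "Doji", while the non-overlapping TA-Lib span is kept.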

Iterative Improvement

Round 1: Generate TA-Lib → Review → Train baseline model
Round 2: Get predictions → Find disagreements → Add corrections → Retrain
Round 3: Focus on low-confidence predictions → Add more examples → Retrain
Repeat until model performance meets your needs

Pattern Coverage

Make sure you have examples of:

Bullish patterns in uptrends
Bearish patterns in downtrends
Neutral patterns in sideways markets
False signals (detected by TA-Lib but not actually a tradeable pattern)

This teaches the model context, not just shape recognition.

Quality Metrics

Track these in MLflow UI (http://localhost:5000):

Accuracy - Overall correctness
F1 (macro) - Average across all pattern types
Per-class F1 - Performance for each pattern individually
Confusion matrix - Where the model makes mistakes

Focus on improving F1 for patterns you actually trade.
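For reference, per-class and macro F1 can be computed as in the sketch below. It is a plain-Python illustration of the standard definitions, not the pipeline's metric code, and the labels are example data.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


y_true = ["Doji", "Doji", "Hammer", "Hammer", "Engulfing", "Engulfing"]
y_pred = ["Doji", "Hammer", "Hammer", "Hammer", "Engulfing", "Doji"]
scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Because macro F1 weights every pattern equally, a rare pattern with poor F1 pulls the average down as much as a common one, which is why the per-class values are worth checking for the patterns you trade.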

Quick Reference

# Generate TA-Lib annotations
docker-compose exec ml-service python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv --output talib_annotations.json

# Copy to host
docker-compose cp ml-service:/app/talib_annotations.json ./

# Import to UI
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Export after editing
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Train model
docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml

# Restart inference
docker-compose restart ml-service

# View results (open is the macOS command; use xdg-open on Linux)
open http://localhost:3000
open http://localhost:5000  # MLflow UI
