# TA-Lib Annotation Workflow

This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.

## Overview

1. **Generate** - Run TA-Lib CDL* functions to detect patterns automatically
2. **Import** - Import detected patterns into the UI database
3. **Review & Edit** - View, correct, and refine annotations in the web UI
4. **Train** - Export annotations and train your model
5. **Iterate** - Get predictions, compare with TA-Lib, retrain

## Step 1: Generate TA-Lib Annotations

### 1.1 Prepare Your Data

Export candles from your database:

```bash
# From the host: write into the ML service's data directory, where Step 1.2 reads it
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv

# Or copy your existing CSV
cp your_data.csv services/ml/data/raw/OHLCV.csv
```

### 1.2 Run Pattern Detection

Enter the ML service container and run the generator:

```bash
# Enter container
docker-compose exec ml-service bash

# Generate annotations (all patterns, perfect matches only)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json

# Or specify patterns and lower confidence threshold
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50 \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI

# Exit container
exit
```

This creates `talib_annotations.json` with detected patterns.
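
For orientation, the conversion from TA-Lib output to annotation records might look roughly like the sketch below. It is not the actual `generate_talib_annotations.py`: the helper name `signals_to_annotations`, the `pattern_length` parameter, and the way labels are built are all assumptions. The `signals` list stands in for the integer array returned by a call such as `talib.CDLENGULFING(open, high, low, close)`, where 0 means no match, a positive value is bullish, and a negative value is bearish.

```python
from typing import Dict, List


def signals_to_annotations(
    times: List[int],
    signals: List[int],
    pattern_name: str,
    func_name: str,
    pattern_length: int = 1,
) -> List[Dict]:
    """Turn per-candle TA-Lib style signals into annotation records.

    Hypothetical helper: each non-zero signal becomes a span that starts
    `pattern_length - 1` candles before the matched candle and ends on it.
    """
    annotations = []
    for i, value in enumerate(signals):
        if value == 0:  # 0 means no pattern on this candle
            continue
        first = max(0, i - (pattern_length - 1))
        direction = "Bullish" if value > 0 else "Bearish"
        annotations.append({
            "start_time": times[first],
            "end_time": times[i],
            "label": f"{direction} {pattern_name}",
            "confidence": min(abs(value), 100) / 100.0,
            "source": "programmatic",
            "notes": f"TA-Lib {func_name} detection",
        })
    return annotations


# Made-up hourly candle times; in practice `signals` would come from TA-Lib.
example = signals_to_annotations(
    times=[1700000000, 1700003600, 1700007200],
    signals=[0, 100, -100],
    pattern_name="Engulfing",
    func_name="CDLENGULFING",
    pattern_length=2,
)
```

With these inputs, the first record spans 1700000000 to 1700003600 and is labeled "Bullish Engulfing".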

### 1.3 Review Detection Results

```bash
# Check what was detected
jq '.annotations | length' services/ml/talib_annotations.json
jq '.annotations[0]' services/ml/talib_annotations.json

# See pattern distribution
jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse' services/ml/talib_annotations.json
```

**Output example:**
```json
{
  "start_time": 1700000000,
  "end_time": 1700003600,
  "label": "Bullish Engulfing",
  "confidence": 1.0,
  "source": "programmatic",
  "notes": "TA-Lib CDLENGULFING detection"
}
```
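
If `jq` is not available, the same pattern distribution can be computed with a few lines of Python. This is a minimal sketch that assumes the file has a top-level `annotations` array whose items carry a `label` field, as in the example above; the function names are illustrative.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def load_annotations(path: str) -> List[Dict]:
    """Read the `annotations` array from a generator output file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["annotations"]


def label_distribution(annotations: List[Dict]) -> List[Tuple[str, int]]:
    """Return (label, count) pairs, most frequent first."""
    return Counter(a["label"] for a in annotations).most_common()


# In-memory sample; for a real file use load_annotations("services/ml/talib_annotations.json").
sample = [{"label": "Doji"}, {"label": "Bullish Engulfing"}, {"label": "Doji"}]
print(label_distribution(sample))  # [('Doji', 2), ('Bullish Engulfing', 1)]
```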

## Step 2: Import into UI

### 2.1 Copy Annotations File

```bash
# Copy from ML service to project root
docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json
```

### 2.2 Get Your Chart ID

Open http://localhost:3000 and note your chart ID. It is shown in the chart selector, or you can query the database:

```bash
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
```

### 2.3 Import Annotations

```bash
# Import into chart 1
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Or clear existing annotations first
npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear
```

**Output:**
```
=== TA-Lib Annotation Import ===

Input file: talib_annotations.json
Chart ID: 1
Clear existing: no

Reading annotations file...
Found 147 annotations
Source: talib

Ensuring 12 label types exist...
  ✓ Bullish Engulfing (existing, id: 1)
  + Bearish Engulfing (created, id: 8, color: #ef4444)
  ✓ Bullish Hammer (existing, id: 2)
  ...

Importing 147 annotations for chart 1...

✓ Imported 147 annotations

=== Import Complete ===
```

## Step 3: Review & Edit in UI

### 3.1 Open the Annotator

1. Open http://localhost:3000
2. Select your chart from the dropdown
3. Scroll down to see the span annotations in the sidebar

### 3.2 Review TA-Lib Detections

All TA-Lib detections appear as span annotations:
- Green spans = Bullish patterns
- Red spans = Bearish patterns
- Source labeled as "programmatic"

### 3.3 Edit Annotations

**Correct false positives:**
- Click on a span annotation in the sidebar or chart
- Press Delete/Backspace to remove it

**Add missing patterns:**
- Select a span label type
- Click and drag on the chart to create new annotations
- Your annotations are marked as source "human"

**Adjust boundaries:**
- Delete the annotation
- Recreate it with correct start/end times

**Add new pattern types:**
- Go to "Manage Span Label Types"
- Add custom patterns TA-Lib doesn't detect
- Return to main page and annotate

### 3.4 Best Practices

- **Review all TA-Lib detections** - Rule-based matches can flag candles that are not meaningful in context
- **Focus on quality over quantity** - 50 accurate annotations are better than 200 noisy ones
- **Add context** - TA-Lib only detects classic patterns; add your own insights
- **Diverse examples** - Include patterns from different market conditions

## Step 4: Export & Train

### 4.1 Export Annotations

```bash
# Export all span annotations (includes both human and TA-Lib)
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Verify export
cat services/ml/data/annotations/export.json | jq '.annotations | length'
```

### 4.2 Prepare OHLCV Data

```bash
# Copy candles to ML service
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
```

### 4.3 Train Model

```bash
# Enter ML service
docker-compose exec ml-service bash

# Run full pipeline
python pipeline.py --config config/pipeline.yaml

# Exit
exit
```

### 4.4 Restart Inference

```bash
# Restart to load new model
docker-compose restart ml-service

# Verify model loaded
curl http://localhost:8001/model/info | jq '.model_info'
```

## Step 5: Compare & Iterate

### 5.1 Get Predictions

1. Open http://localhost:3000
2. Scroll to Predictions panel
3. Click "Run on Visible" or "Predict All"

### 5.2 Compare with TA-Lib

Now you can compare:
- **Your edited annotations** (human judgment)
- **TA-Lib raw detections** (programmatic)
- **Model predictions** (trained on your corrections)

The disagreement detection shows where these differ!

### 5.3 Iterate

Use the prediction summary to find:
- **Missed by model** - Patterns you annotated but the model missed
- **Missed by human** - Patterns the model found that you didn't annotate
- **Label mismatch** - Same location, different pattern type

Add more annotations where the model struggles, then retrain.
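
The three categories above can be sketched as an interval-overlap comparison. This is an illustration only, not the UI's actual disagreement logic: it assumes both annotation sets are lists of records with `start_time`, `end_time`, and `label`, and that any overlap counts as "same location". The function names are hypothetical.

```python
from typing import Dict, List


def overlaps(a: Dict, b: Dict) -> bool:
    """True when two spans intersect (end points inclusive)."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def find_disagreements(human: List[Dict], predicted: List[Dict]) -> Dict[str, List]:
    """Classify differences between human annotations and model predictions."""
    result = {"missed_by_model": [], "missed_by_human": [], "label_mismatch": []}
    matched = set()  # indices of predictions that overlap some human span
    for h in human:
        hits = [i for i, p in enumerate(predicted) if overlaps(h, p)]
        if not hits:
            result["missed_by_model"].append(h)
            continue
        matched.update(hits)
        # A mismatch means no overlapping prediction carries the human label.
        if all(predicted[i]["label"] != h["label"] for i in hits):
            result["label_mismatch"].append((h, predicted[hits[0]]))
    result["missed_by_human"] = [
        p for i, p in enumerate(predicted) if i not in matched
    ]
    return result


human_spans = [
    {"start_time": 0, "end_time": 10, "label": "Doji"},
    {"start_time": 20, "end_time": 30, "label": "Hammer"},
    {"start_time": 40, "end_time": 50, "label": "Harami"},
]
model_spans = [
    {"start_time": 5, "end_time": 12, "label": "Doji"},
    {"start_time": 25, "end_time": 28, "label": "Doji"},
    {"start_time": 60, "end_time": 70, "label": "Hammer"},
]
report = find_disagreements(human_spans, model_spans)
```

Here the "Harami" span is missed by the model, the prediction at 60 to 70 is missed by the human, and the span at 20 to 30 is a label mismatch.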

## Configuration Options

### Pattern Selection

Choose which patterns to detect with `--patterns`:

```bash
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR
```

Common patterns:
- `CDLENGULFING` - Bullish/Bearish Engulfing
- `CDLHAMMER` - Hammer
- `CDLDOJI` - Doji
- `CDLMORNINGSTAR` / `CDLEVENINGSTAR` - Morning/Evening Star
- `CDLHARAMI` - Harami
- `CDLTHREEWHITESOLDIERS` / `CDLTHREEBLACKCROWS` - Three Soldiers/Crows

See full list: https://ta-lib.org/function.html (search for CDL)

### Confidence Threshold

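TA-Lib CDL* functions return an integer per candle: 0 for no match, and typically +100 (bullish) or -100 (bearish) for a match. By default only full-strength matches are kept; lower `--min-confidence` to get more detections: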

```bash
# Get 50-100% matches (more patterns, potentially noisier)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50
```
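
As a rough sketch of what the threshold does, assume the generator compares the absolute value of each raw TA-Lib output against `--min-confidence`. The helper `passes_threshold` is hypothetical and the raw values below are illustrative; the real script may apply the threshold differently.

```python
def passes_threshold(raw_value: int, min_confidence: int = 100) -> bool:
    """Keep a detection when |raw_value| reaches the threshold (0 is never a match)."""
    return raw_value != 0 and abs(raw_value) >= min_confidence


# Illustrative raw outputs for five candles.
raw = [0, 100, -80, 50, -100]

strict = [v for v in raw if passes_threshold(v)]       # default: full-strength matches only
relaxed = [v for v in raw if passes_threshold(v, 50)]  # equivalent of --min-confidence 50

print(strict)   # [100, -100]
print(relaxed)  # [100, -80, 50, -100]
```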

## Troubleshooting

### "No module named 'talib'"

TA-Lib is not installed in the `ml-service` image. Rebuild it:

```bash
# Rebuild ml-service with TA-Lib
docker-compose build --no-cache ml-service
docker-compose up -d ml-service
```

### "No patterns detected"

Try:
1. **Lower confidence threshold** - Use `--min-confidence 50`
2. **Check data quality** - Make sure OHLCV has valid data
3. **Try more patterns** - Omit `--patterns` to detect all patterns
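
For the data-quality check, a minimal sketch such as the one below can flag obviously broken rows. It assumes the CSV has the `time, open, high, low, close, volume` header produced by the export command, with numeric `time` values; the function name `ohlcv_problems` is illustrative.

```python
import csv
import io
from typing import List


def ohlcv_problems(csv_text: str) -> List[str]:
    """Return human-readable problems found in OHLCV CSV text."""
    problems = []
    previous_time = None
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            t = int(float(row["time"]))
            o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: missing or non-numeric value")
            continue
        # High must be the top of the candle and low the bottom.
        if h < max(o, c) or l > min(o, c):
            problems.append(f"line {line_no}: high/low inconsistent with open/close")
        if previous_time is not None and t <= previous_time:
            problems.append(f"line {line_no}: time not strictly increasing")
        previous_time = t
    return problems


good_csv = "time,open,high,low,close,volume\n1,10,12,9,11,100\n2,11,13,10,12,100\n"
bad_csv = "time,open,high,low,close,volume\n1,10,9,8,11,100\n1,11,13,10,x,100\n"
```

To check a real export, pass the file contents, for example `ohlcv_problems(open("services/ml/data/raw/OHLCV.csv").read())`.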

### Import script fails

Make sure:
1. **File exists** - Check path to JSON file
2. **Chart ID valid** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"`
3. **tsx installed** - Run: `npm install`
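
It can also help to sanity-check the records before importing. The sketch below assumes each record has the fields shown in the Step 1.3 output example and that `confidence` lies in the range 0 to 1; the import script's real validation rules are not documented here, so treat these checks as assumptions.

```python
from typing import Dict, List

# Fields taken from the example record in Step 1.3.
REQUIRED_KEYS = {"start_time", "end_time", "label", "confidence", "source"}


def annotation_problems(record: Dict) -> List[str]:
    """Return a list of problems for one annotation record (empty if it looks fine)."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    if record["start_time"] > record["end_time"]:
        problems.append("start_time is after end_time")
    if not 0.0 <= record["confidence"] <= 1.0:
        problems.append("confidence outside [0, 1]")
    if not record["label"]:
        problems.append("empty label")
    return problems


ok_record = {
    "start_time": 1700000000,
    "end_time": 1700003600,
    "label": "Bullish Engulfing",
    "confidence": 1.0,
    "source": "programmatic",
}
reversed_record = dict(ok_record, start_time=1700003600, end_time=1700000000)
```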

### Annotations not showing in UI

1. **Refresh page** - Hard refresh (Ctrl+F5)
2. **Check chart ID** - Make sure you selected the correct chart
3. **Check database** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"`

## Tips & Best Practices

### Balancing Human & Programmatic Labels

When training, you can choose a merge strategy in `config/pipeline.yaml`:

```yaml
annotation_ingestion:
  merge_strategy: "human_priority"  # Use human labels where they overlap
  # or "programmatic_priority"      # Use TA-Lib where they overlap
  # or "both"                       # Keep both as separate features
```

**Recommended**: Start with `human_priority` - trust your corrections over TA-Lib.
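
To make `human_priority` concrete, here is a minimal sketch of one way such a merge could work: every human span is kept, and a programmatic span is kept only if it overlaps no human span. The pipeline's actual implementation may differ, and the function names are hypothetical.

```python
from typing import Dict, List


def spans_overlap(a: Dict, b: Dict) -> bool:
    """True when two time spans intersect (end points inclusive)."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def merge_human_priority(human: List[Dict], programmatic: List[Dict]) -> List[Dict]:
    """Keep all human spans; drop programmatic spans that overlap any of them."""
    kept = [p for p in programmatic if not any(spans_overlap(p, h) for h in human)]
    return sorted(human + kept, key=lambda a: a["start_time"])


human_labels = [
    {"start_time": 0, "end_time": 10, "label": "Hammer", "source": "human"},
]
talib_labels = [
    {"start_time": 5, "end_time": 15, "label": "Doji", "source": "programmatic"},
    {"start_time": 20, "end_time": 30, "label": "Harami", "source": "programmatic"},
]
merged = merge_human_priority(human_labels, talib_labels)
```

In this example the overlapping "Doji" detection is dropped in favor of the human "Hammer" label, while the non-overlapping "Harami" detection is kept.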

### Iterative Improvement

1. **Round 1**: Generate TA-Lib → Review → Train baseline model
2. **Round 2**: Get predictions → Find disagreements → Add corrections → Retrain
3. **Round 3**: Focus on low-confidence predictions → Add more examples → Retrain
4. **Repeat** until model performance meets your needs

### Pattern Coverage

Make sure you have examples of:
- **Bullish patterns** in uptrends
- **Bearish patterns** in downtrends
- **Neutral patterns** in sideways markets
- **False signals** (detected by TA-Lib but not actually a tradeable pattern)

This teaches the model context, not just shape recognition.

### Quality Metrics

Track these in MLflow UI (http://localhost:5000):
- **Accuracy** - Overall correctness
- **F1 (macro)** - Average across all pattern types
- **Per-class F1** - Performance for each pattern individually
- **Confusion matrix** - Where the model makes mistakes

Focus on improving F1 for patterns you actually trade.
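
For reference, per-class and macro F1 can be computed by hand as in the sketch below, using F1 = 2·TP / (2·TP + FP + FN) for each class and an unweighted mean across classes. This only illustrates the metric definitions; it is not how the pipeline logs metrics to MLflow, and the labels are made up.

```python
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 score for each class that appears in the true or predicted labels."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


y_true = ["Doji", "Doji", "Hammer", "Hammer"]
y_pred = ["Doji", "Hammer", "Hammer", "Hammer"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Because macro F1 weights every class equally, a rarely occurring pattern with poor F1 pulls the average down as much as a common one, which is why the per-class view matters for the patterns you trade.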

## Quick Reference

```bash
# Generate TA-Lib annotations
docker-compose exec ml-service python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv --output talib_annotations.json

# Copy to host
docker-compose cp ml-service:/app/talib_annotations.json ./

# Import to UI
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Export after editing
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Train model
docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml

# Restart inference
docker-compose restart ml-service

# View results (open is macOS; use xdg-open on Linux)
open http://localhost:3000
open http://localhost:5000  # MLflow UI
```