feat(ml): add TA-Lib annotation generation and import workflow

Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations, manually refine them, then train a model that learns from corrections.

parent 228f70daf3
commit 847ff67986
18 changed files with 5416 additions and 7 deletions
TALIB_WORKFLOW.md (new file, 372 lines)
@@ -0,0 +1,372 @@
# TA-Lib Annotation Workflow

This guide shows how to use TA-Lib to generate initial pattern annotations, edit them in the UI, and train a model.

## Overview

1. **Generate** - Run TA-Lib CDL* functions to detect patterns automatically
2. **Import** - Import detected patterns into the UI database
3. **Review & Edit** - View, correct, and refine annotations in the web UI
4. **Train** - Export annotations and train your model
5. **Iterate** - Get predictions, compare with TA-Lib, retrain

## Step 1: Generate TA-Lib Annotations

### 1.1 Prepare Your Data

Export candles from your database:

```bash
# From host
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > OHLCV.csv

# Or copy your existing CSV
cp your_data.csv OHLCV.csv
```

The generator in Step 1.2 reads `data/raw/OHLCV.csv` relative to the ML service's working directory, so make sure the exported file ends up there (Step 4.2 writes it to `services/ml/data/raw/OHLCV.csv` on the host).

### 1.2 Run Pattern Detection

Enter the ML service container and run the generator:

```bash
# Enter container
docker-compose exec ml-service bash

# Generate annotations (all patterns, perfect matches only)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json

# Or specify patterns and lower confidence threshold
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50 \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI

# Exit container
exit
```

This creates `talib_annotations.json` with detected patterns.

### 1.3 Review Detection Results

```bash
# Check what was detected
jq '.annotations | length' services/ml/talib_annotations.json
jq '.annotations[0]' services/ml/talib_annotations.json

# See pattern distribution
jq '[.annotations[].label] | group_by(.) | map({label: .[0], count: length}) | sort_by(.count) | reverse' services/ml/talib_annotations.json
```

**Output example:**
```json
{
  "start_time": 1700000000,
  "end_time": 1700003600,
  "label": "Bullish Engulfing",
  "confidence": 1.0,
  "source": "programmatic",
  "notes": "TA-Lib CDLENGULFING detection"
}
```
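
The same distribution check can be done in Python when `jq` is not available. This is a minimal sketch that assumes the file has a top-level `annotations` list whose items carry a `label` field, as in the example above; `label_distribution` is a hypothetical helper name, not something shipped in the repository.

```python
import json
from collections import Counter


def label_distribution(payload: dict) -> list[tuple[str, int]]:
    """Return (label, count) pairs sorted by descending count."""
    labels = [a["label"] for a in payload.get("annotations", [])]
    return Counter(labels).most_common()


# Inline sample standing in for the contents of talib_annotations.json.
sample = json.loads("""
{"annotations": [
  {"start_time": 1700000000, "end_time": 1700003600, "label": "Bullish Engulfing"},
  {"start_time": 1700007200, "end_time": 1700010800, "label": "Doji"},
  {"start_time": 1700014400, "end_time": 1700018000, "label": "Bullish Engulfing"}
]}
""")

for label, count in label_distribution(sample):
    print(f"{label}: {count}")
```

To run it against the real file, replace the inline sample with `json.load(open("services/ml/talib_annotations.json"))`.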

## Step 2: Import into UI

### 2.1 Copy Annotations File

```bash
# Copy from ML service to project root
docker-compose cp ml-service:/app/talib_annotations.json ./talib_annotations.json
```
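
Once the file is on the host, a quick structural check can catch malformed entries before they reach the database. The sketch below assumes each annotation needs `start_time`, `end_time`, and `label`, based on the example output in Step 1.3; the importer's real validation rules are not documented here, and `find_invalid_annotations` is a hypothetical helper.

```python
# Assumed minimum set of fields, taken from the Step 1.3 example output.
REQUIRED_KEYS = ("start_time", "end_time", "label")


def find_invalid_annotations(annotations: list[dict]) -> list[tuple[int, str]]:
    """Return (index, reason) for every annotation that looks malformed."""
    problems = []
    for i, ann in enumerate(annotations):
        missing = [k for k in REQUIRED_KEYS if k not in ann]
        if missing:
            problems.append((i, f"missing keys: {', '.join(missing)}"))
        elif ann["end_time"] < ann["start_time"]:
            problems.append((i, "end_time is before start_time"))
        elif not str(ann["label"]).strip():
            problems.append((i, "empty label"))
    return problems


sample = [
    {"start_time": 1700000000, "end_time": 1700003600, "label": "Bullish Engulfing"},
    {"start_time": 1700010800, "end_time": 1700007200, "label": "Doji"},
    {"start_time": 1700014400, "label": "Hammer"},
]
for index, reason in find_invalid_annotations(sample):
    print(f"annotation {index}: {reason}")
```

An empty result means the file passes these basic checks; it does not guarantee the import itself will succeed.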

### 2.2 Get Your Chart ID

Open http://localhost:3000 and note your chart ID (shown in the chart selector), or query the database:

```bash
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"
```

### 2.3 Import Annotations

```bash
# Import into chart 1
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Or clear existing annotations first
npm run import-annotations -- --file talib_annotations.json --chart-id 1 --clear
```

**Output:**
```
=== TA-Lib Annotation Import ===

Input file: talib_annotations.json
Chart ID: 1
Clear existing: no

Reading annotations file...
Found 147 annotations
Source: talib

Ensuring 12 label types exist...
✓ Bullish Engulfing (existing, id: 1)
+ Bearish Engulfing (created, id: 8, color: #ef4444)
✓ Bullish Hammer (existing, id: 2)
...

Importing 147 annotations for chart 1...

✓ Imported 147 annotations

=== Import Complete ===
```

## Step 3: Review & Edit in UI

### 3.1 Open the Annotator

1. Open http://localhost:3000
2. Select your chart from the dropdown
3. Scroll down to see the span annotations in the sidebar

### 3.2 Review TA-Lib Detections

You'll see all the TA-Lib-detected patterns as span annotations:
- Green spans = Bullish patterns
- Red spans = Bearish patterns
- Source labeled as "programmatic"

### 3.3 Edit Annotations

**Correct false positives:**
- Click on a span annotation in the sidebar or chart
- Press Delete/Backspace to remove it

**Add missing patterns:**
- Select a span label type
- Click and drag on the chart to create new annotations
- Your annotations are marked as source "human"

**Adjust boundaries:**
- Delete the annotation
- Recreate it with correct start/end times

**Add new pattern types:**
- Go to "Manage Span Label Types"
- Add custom patterns TA-Lib doesn't detect
- Return to main page and annotate

### 3.4 Best Practices

- **Review all TA-Lib detections** - They're not always perfect
- **Focus on quality over quantity** - Better to have 50 accurate annotations than 200 noisy ones
- **Add context** - TA-Lib only detects classic patterns; add your own insights
- **Diverse examples** - Make sure you have patterns in different market conditions

## Step 4: Export & Train

### 4.1 Export Annotations

```bash
# Export all span annotations (includes both human and TA-Lib)
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Verify export
jq '.annotations | length' services/ml/data/annotations/export.json
```

### 4.2 Prepare OHLCV Data

```bash
# Copy candles to ML service
docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db -csv -header 'SELECT time, open, high, low, close, volume FROM candles ORDER BY time;'" > services/ml/data/raw/OHLCV.csv
```

### 4.3 Train Model

```bash
# Enter ML service
docker-compose exec ml-service bash

# Run full pipeline
python pipeline.py --config config/pipeline.yaml

# Exit
exit
```

### 4.4 Restart Inference

```bash
# Restart to load new model
docker-compose restart ml-service

# Verify model loaded
curl http://localhost:8001/model/info | jq '.model_info'
```

## Step 5: Compare & Iterate

### 5.1 Get Predictions

1. Open http://localhost:3000
2. Scroll to Predictions panel
3. Click "Run on Visible" or "Predict All"

### 5.2 Compare with TA-Lib

Now you can compare:
- **Your edited annotations** (human judgment)
- **TA-Lib raw detections** (programmatic)
- **Model predictions** (trained on your corrections)

The disagreement detection shows where these differ!

### 5.3 Iterate

Use the prediction summary to find:
- **Missed by model** - Patterns you annotated but the model missed
- **Missed by human** - Patterns the model found that you didn't annotate
- **Label mismatch** - Same location, different pattern type

Add more annotations where the model struggles, then retrain.
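
The three disagreement categories above can be reproduced offline with a simple interval-overlap comparison. This is only a sketch: the UI's actual matching rule is not described in this guide, so the "any overlap counts as the same location" rule, the `Span` type, and `classify_disagreements` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True when the two closed intervals share at least one timestamp."""
    return a.start <= b.end and b.start <= a.end


def classify_disagreements(human: list[Span], model: list[Span]) -> dict[str, list]:
    """Bucket spans into the three disagreement categories (assumed matching rule)."""
    result = {"missed_by_model": [], "missed_by_human": [], "label_mismatch": []}
    for h in human:
        hits = [m for m in model if overlaps(h, m)]
        if not hits:
            result["missed_by_model"].append(h)
        elif all(m.label != h.label for m in hits):
            # Same location, but no overlapping prediction agrees on the label.
            result["label_mismatch"].append((h, hits[0]))
    for m in model:
        if not any(overlaps(h, m) for h in human):
            result["missed_by_human"].append(m)
    return result


human = [Span(0, 10, "Hammer"), Span(20, 30, "Doji"), Span(40, 50, "Doji")]
model = [Span(5, 12, "Hammer"), Span(22, 28, "Harami"), Span(60, 70, "Doji")]
for category, items in classify_disagreements(human, model).items():
    print(category, items)
```

A stricter rule, such as requiring a minimum overlap ratio, would shift borderline cases from "label mismatch" into the two "missed" buckets.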

## Configuration Options

### Pattern Selection

Choose which patterns to detect with `--patterns`:

```bash
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --patterns CDLENGULFING CDLHAMMER CDLDOJI CDLMORNINGSTAR CDLEVENINGSTAR
```

Common patterns:
- `CDLENGULFING` - Bullish/Bearish Engulfing
- `CDLHAMMER` - Hammer
- `CDLDOJI` - Doji
- `CDLMORNINGSTAR` / `CDLEVENINGSTAR` - Morning/Evening Star
- `CDLHARAMI` - Harami
- `CDLTHREEWHITESOLDIERS` / `CDLTHREEBLACKCROWS` - Three Soldiers/Crows

See full list: https://ta-lib.org/function.html (search for CDL)

### Confidence Threshold

TA-Lib CDL* functions return an integer per candle: 0 when no pattern is found, and typically +100 (bullish) or -100 (bearish) for a match; some functions also emit other magnitudes. Lower the `--min-confidence` threshold to keep weaker matches as well:

```bash
# Get 50-100% matches (more patterns, potentially noisier)
python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv \
  --output talib_annotations.json \
  --min-confidence 50
```
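
To make the threshold concrete, here is a rough sketch of how a CDL-style signal series could be turned into span annotations. It does not import TA-Lib and is not the code in `generate_talib_annotations.py`; the function name, the `span_len` parameter, and the mapping from signal magnitude to `confidence` are assumptions chosen to match the example output in Step 1.3.

```python
def signals_to_annotations(times, signals, pattern, span_len=1, min_confidence=100):
    """Turn a CDL-style integer signal series into span annotation dicts.

    times: candle timestamps; signals: ints where 0 means "no pattern".
    span_len: number of candles the pattern covers (assumed, pattern-specific).
    """
    annotations = []
    for i, value in enumerate(signals):
        # Skip non-matches and matches weaker than the threshold.
        if value == 0 or abs(value) < min_confidence:
            continue
        start = times[max(0, i - span_len + 1)]
        annotations.append({
            "start_time": start,
            "end_time": times[i],
            "label": f"{'Bullish' if value > 0 else 'Bearish'} {pattern}",
            "confidence": min(abs(value), 100) / 100,
            "source": "programmatic",
        })
    return annotations


times = [0, 3600, 7200, 10800]
signals = [0, 100, -80, 0]  # hypothetical output of a two-candle pattern

strict = signals_to_annotations(times, signals, "Engulfing", span_len=2)
loose = signals_to_annotations(times, signals, "Engulfing", span_len=2, min_confidence=50)
print(len(strict), len(loose))
```

With the default threshold only the full-strength match survives; lowering it to 50 also keeps the weaker bearish signal.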

## Troubleshooting

### "No module named 'talib'"

TA-Lib is not installed in the container:

```bash
# Rebuild ml-service with TA-Lib
docker-compose build --no-cache ml-service
docker-compose up -d ml-service
```

### "No patterns detected"

Try:
1. **Lower confidence threshold** - Use `--min-confidence 50`
2. **Check data quality** - Make sure OHLCV has valid data
3. **Try more patterns** - Omit `--patterns` to detect all
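
For the data-quality check, a small script can flag candles that break basic OHLC invariants. This sketch assumes the CSV columns produced by the export query in Step 1.1 (`time`, `open`, `high`, `low`, `close`, `volume`) and numeric epoch timestamps; `find_bad_candles` is a hypothetical helper, not part of the project.

```python
import csv
import io
import math


def find_bad_candles(rows):
    """Yield (row_number, reason) for candles that violate basic OHLC invariants."""
    previous_time = None
    for n, row in enumerate(rows, start=1):
        try:
            t = int(float(row["time"]))
            o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        except (KeyError, TypeError, ValueError):
            yield n, "missing or non-numeric field"
            continue
        if any(math.isnan(v) for v in (o, h, l, c)):
            yield n, "NaN price"
            continue
        if h < max(o, c) or l > min(o, c):
            yield n, "high/low does not bracket open/close"
        if previous_time is not None and t <= previous_time:
            yield n, "timestamp not strictly increasing"
        previous_time = t


# Inline sample standing in for OHLCV.csv.
sample_csv = """time,open,high,low,close,volume
1700000000,10,12,9,11,100
1700003600,11,10.5,10,12,80
1700003600,12,13,11,12.5,90
1700007200,12,,11,12,70
"""
problems = list(find_bad_candles(csv.DictReader(io.StringIO(sample_csv))))
for row_number, reason in problems:
    print(f"row {row_number}: {reason}")
```

To check the real export, pass `csv.DictReader(open("OHLCV.csv", newline=""))` instead of the inline sample.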

### Import script fails

Make sure:
1. **File exists** - Check path to JSON file
2. **Chart ID valid** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT id, name FROM charts;'"`
3. **tsx installed** - Run: `npm install`

### Annotations not showing in UI

1. **Refresh page** - Hard refresh (Ctrl+F5)
2. **Check chart ID** - Make sure you selected the correct chart
3. **Check database** - Run: `docker-compose exec candle-annotator sh -c "sqlite3 /app/data/candles.db 'SELECT COUNT(*) FROM span_annotations;'"`

## Tips & Best Practices

### Balancing Human & Programmatic Labels

When training, you can choose the merge strategy in `config/pipeline.yaml`:

```yaml
annotation_ingestion:
  merge_strategy: "human_priority"  # Use human labels where they overlap
  # or "programmatic_priority"      # Use TA-Lib where they overlap
  # or "both"                       # Keep both as separate features
```

**Recommended**: Start with `human_priority` - trust your corrections over TA-Lib.
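
As an illustration of what `human_priority` could mean in practice, the sketch below keeps every human span and drops any programmatic span that overlaps one. The pipeline's real merge logic is not shown in this guide, so treat the overlap rule and the `merge_human_priority` name as assumptions.

```python
def merge_human_priority(human, programmatic):
    """Keep every human span; keep programmatic spans only where no human span overlaps."""
    def overlaps(a, b):
        return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]

    kept = [p for p in programmatic if not any(overlaps(p, h) for h in human)]
    return sorted(human + kept, key=lambda a: a["start_time"])


human = [
    {"start_time": 100, "end_time": 200, "label": "Hammer", "source": "human"},
]
programmatic = [
    {"start_time": 150, "end_time": 250, "label": "Doji", "source": "programmatic"},
    {"start_time": 300, "end_time": 400, "label": "Bullish Engulfing", "source": "programmatic"},
]

merged = merge_human_priority(human, programmatic)
for ann in merged:
    print(ann["start_time"], ann["label"], ann["source"])
```

Here the overlapping TA-Lib "Doji" is discarded in favor of the human "Hammer", while the non-overlapping detection is kept.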

### Iterative Improvement

1. **Round 1**: Generate TA-Lib → Review → Train baseline model
2. **Round 2**: Get predictions → Find disagreements → Add corrections → Retrain
3. **Round 3**: Focus on low-confidence predictions → Add more examples → Retrain
4. **Repeat** until model performance meets your needs

### Pattern Coverage

Make sure you have examples of:
- **Bullish patterns** in uptrends
- **Bearish patterns** in downtrends
- **Neutral patterns** in sideways markets
- **False signals** (TA-Lib detected but actually not a tradeable pattern)

This teaches the model context, not just shape recognition.

### Quality Metrics

Track these in MLflow UI (http://localhost:5000):
- **Accuracy** - Overall correctness
- **F1 (macro)** - Average across all pattern types
- **Per-class F1** - Performance for each pattern individually
- **Confusion matrix** - Where the model makes mistakes

Focus on improving F1 for patterns you actually trade.
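
For reference, per-class and macro F1 can be computed by hand from true and predicted labels. This is a plain-Python sketch of the standard definitions, not the pipeline's own metric code; the label names in the example are made up.

```python
def per_class_f1(y_true, y_pred):
    """F1 per label, using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


y_true = ["Hammer", "Hammer", "Doji", "Doji", "Other", "Other"]
y_pred = ["Hammer", "Doji", "Doji", "Doji", "Other", "Hammer"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Because macro F1 weights every class equally, a rarely occurring pattern with poor F1 pulls the average down as much as a common one, which is why the per-class view matters for the patterns you care about.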

## Quick Reference

```bash
# Generate TA-Lib annotations
docker-compose exec ml-service python generate_talib_annotations.py \
  --input data/raw/OHLCV.csv --output talib_annotations.json

# Copy to host
docker-compose cp ml-service:/app/talib_annotations.json ./

# Import to UI
npm run import-annotations -- --file talib_annotations.json --chart-id 1

# Export after editing
curl http://localhost:3000/api/span-annotations/export > services/ml/data/annotations/export.json

# Train model
docker-compose exec ml-service python pipeline.py --config config/pipeline.yaml

# Restart inference
docker-compose restart ml-service

# View results (`open` is macOS; use xdg-open on Linux)
open http://localhost:3000
open http://localhost:5000  # MLflow UI
```