candle-annotator/openspec/changes/candle-backend/tasks.md

## 1. Project Scaffolding & Infrastructure

- [x] 1.1 Create `services/ml/` directory structure: `config/`, `features/`, `features/custom/`, `training/`, `training/models/`, `inference/`, `data/raw/`, `data/enriched/`, `data/labeled/`, `data/annotations/`
- [x] 1.2 Create `services/ml/pyproject.toml` (or `requirements.txt`) with dependencies: fastapi, uvicorn, scikit-learn, xgboost, pandas, numpy, joblib, mlflow, pyyaml, ta-lib, dvc, sqlalchemy, psycopg2-binary, pydantic
- [x] 1.3 Create `services/ml/Dockerfile` with Python 3.11, TA-Lib C library installation (`libta-lib-dev`), and pip install of dependencies
- [x] 1.4 Create `config/pipeline.yaml` with the full pipeline configuration (all stages, default hyperparameters, MLflow/DVC settings)
- [x] 1.5 Add PostgreSQL, ml-service, and mlflow containers to `docker-compose.yml` with shared data volume
- [ ] 1.6 Initialize DVC in `services/ml/` with local remote storage backend
- [ ] 1.7 Create PostgreSQL database schema: `training_runs` table (run_id, model_type, experiment_name, pipeline_config_hash, dataset_version, metrics_summary JSON, status, created_at, completed_at)
- [ ] 1.8 Create `services/ml/app/db.py` — SQLAlchemy engine and session setup for PostgreSQL connection

## 2. Pipeline Config & Entry Point

- [ ] 2.1 Create `services/ml/app/config.py` — Pydantic model for pipeline YAML config with validation (stages, data paths, hyperparameters)
- [ ] 2.2 Create `services/ml/pipeline.py` — main orchestrator that reads config and runs enabled stages in sequence
- [ ] 2.3 Add CLI argument parsing: `--config`, `--stage` (run individual stage), support for `python pipeline.py --config config/pipeline.yaml`
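
The orchestrator and CLI tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the stage names in `STAGE_ORDER`, the `stages.<name>.enabled` config layout, and the `runners` mapping are all assumptions, and the config is taken as an already-parsed dict rather than loaded from YAML.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical stage names; the real list would come from config/pipeline.yaml.
STAGE_ORDER = ["features", "annotations", "training"]


def build_parser() -> argparse.ArgumentParser:
    """CLI with the two flags named in task 2.3."""
    parser = argparse.ArgumentParser(description="Run the ML pipeline")
    parser.add_argument("--config", default="config/pipeline.yaml")
    parser.add_argument(
        "--stage",
        choices=STAGE_ORDER,
        default=None,
        help="Run a single stage instead of all enabled stages",
    )
    return parser


def select_stages(config: dict, only_stage: Optional[str] = None) -> List[str]:
    """Return the stages to run, in pipeline order."""
    if only_stage is not None:
        return [only_stage]
    stages_cfg = config.get("stages", {})
    # Assumed layout: {"stages": {"features": {"enabled": True}, ...}}
    return [
        name
        for name in STAGE_ORDER
        if stages_cfg.get(name, {}).get("enabled", False)
    ]


def run_pipeline(
    config: dict,
    runners: Dict[str, Callable[[dict], None]],
    only_stage: Optional[str] = None,
) -> List[str]:
    """Run each selected stage in sequence and return the names executed."""
    executed = []
    for name in select_stages(config, only_stage):
        runners[name](config)
        executed.append(name)
    return executed
```

Keeping stage selection separate from execution makes the `--stage` override easy to unit-test without touching any data files.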

## 3. Feature Engineering Stage

- [ ] 3.1 Create `services/ml/features/talib_features.py` — compute TA-Lib indicators from config list, append columns with `{indicator}_{param}` naming, fail with a clear error if TA-Lib is not installed
- [ ] 3.2 Create `services/ml/features/candle_features.py` — compute body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range with division-by-zero handling
- [ ] 3.3 Create `services/ml/features/custom_loader.py` — dynamic import of custom feature functions from config paths, call with DataFrame, append result as column
- [ ] 3.4 Implement NaN warmup row handling — drop rows with NaN in indicator columns, log count of dropped rows
- [ ] 3.5 Wire feature engineering into `pipeline.py` — read raw OHLCV CSV, run enabled feature steps, write enriched CSV to `data.enriched_path`
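
As a rough illustration of the candle-geometry features in 3.2, the sketch below computes them for a single candle in plain Python. The exact definitions are assumptions (for example, `wick_ratio` as upper wick over lower wick, and `gap` as open minus previous close), as are the helper names `safe_div` and `candle_features`. A real implementation would more likely operate on pandas columns.

```python
from typing import Dict, Optional


def safe_div(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Divide, returning `default` when the denominator is zero."""
    return numerator / denominator if denominator != 0 else default


def candle_features(
    candle: Dict[str, float], prev_close: Optional[float] = None
) -> Dict[str, float]:
    """Compute per-candle geometry features (definitions are assumptions)."""
    o, h, l, c = candle["open"], candle["high"], candle["low"], candle["close"]
    body_size = abs(c - o)
    candle_range = h - l
    upper_wick = h - max(o, c)
    lower_wick = min(o, c) - l
    return {
        "body_size": body_size,
        # +1 bullish, -1 bearish, 0 for a flat body
        "body_direction": 1 if c > o else (-1 if c < o else 0),
        "upper_wick": upper_wick,
        "lower_wick": lower_wick,
        # Falls back to 0.0 when there is no lower wick
        "wick_ratio": safe_div(upper_wick, lower_wick),
        # Falls back to 0.0 when high == low
        "body_to_range": safe_div(body_size, candle_range),
        # No previous candle means no gap can be measured
        "gap": 0.0 if prev_close is None else o - prev_close,
        "range": candle_range,
    }
```

Returning a fixed default on a zero denominator is only one option. Emitting NaN and letting the warmup-row handling in 3.4 drop those rows would be another.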

## 4. Annotation Ingestion Stage

- [ ] 4.1 Create `services/ml/app/annotation_ingestion.py` — load annotations JSON from `data.annotations_path`, filter by min_confidence
- [ ] 4.2 Implement windowed classification encoding — extract fixed-size windows centered on each annotation span, flatten into single rows, handle boundary padding
- [ ] 4.3 Implement BIO sequence labeling encoding — assign B-{label}/I-{label}/O tags per candle, handle overlapping annotations with multiple tag columns
- [ ] 4.4 Implement TA-Lib CDL* programmatic labeling — run configured CDL functions, convert +100/-100 to label names (bullish_/bearish_ prefix)
- [ ] 4.5 Implement human/programmatic label merge strategies — human_priority, programmatic_priority, both (separate columns)
- [ ] 4.6 Implement context padding — include N candles before/after each annotation span
- [ ] 4.7 Add dataset statistics logging — counts per label, class distribution %, avg span length, human/programmatic agreement rate
- [ ] 4.8 Wire annotation ingestion into `pipeline.py` — read enriched CSV + annotations JSON, run encoding, write labeled CSV to `data.labeled_path`
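
The BIO encoding in 4.3 and the CDL value conversion in 4.4 might look something like the sketch below. The span format (inclusive `start`/`end` candle indices plus a `label`) and both function names are assumptions. For brevity, overlapping spans simply overwrite earlier tags here, whereas the task calls for separate tag columns.

```python
from typing import Dict, List


def bio_encode(num_candles: int, spans: List[Dict]) -> List[str]:
    """Assign one B-/I-/O tag per candle.

    Each span is a dict with inclusive 'start' and 'end' candle indices
    and a 'label' (an assumed format). Later spans overwrite earlier ones.
    """
    tags = ["O"] * num_candles
    for span in spans:
        # Clip spans that extend past the available candles
        start = max(span["start"], 0)
        end = min(span["end"], num_candles - 1)
        for i in range(start, end + 1):
            prefix = "B" if i == start else "I"
            tags[i] = f"{prefix}-{span['label']}"
    return tags


def cdl_to_label(pattern_name: str, value: int) -> str:
    """Map a TA-Lib CDL* output value to a label name.

    Positive values become bullish_<name>, negative values become
    bearish_<name>, and zero (no pattern) becomes "O".
    """
    if value > 0:
        return f"bullish_{pattern_name}"
    if value < 0:
        return f"bearish_{pattern_name}"
    return "O"
```

For example, `bio_encode(6, [{"start": 1, "end": 3, "label": "flag"}])` tags candle 1 as `B-flag`, candles 2 and 3 as `I-flag`, and the rest as `O`.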

## 5. Training Stage

- [ ] 5.1 Create `services/ml/training/train.py` — main training entry point: load labeled CSV, split, train, evaluate, log to MLflow
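- [ ] 5.2 Implement temporal train/validation/test splitting with configurable ratios; log a warning if a random (shuffled) split is configured instead
- [ ] 5.3 Create `services/ml/training/models/random_forest.py` — RandomForestClassifier wrapper with class_weights support (mapped to scikit-learn's `class_weight` parameter)
- [ ] 5.4 Create `services/ml/training/models/xgboost_model.py` — XGBClassifier wrapper with class_weights support (XGBClassifier has no `class_weight` parameter, so apply the weights as per-sample `sample_weight` in `fit`)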
- [ ] 5.5 Implement model dispatch — select model class based on `model_type` config, fail with supported types list for unknown types
- [ ] 5.6 Implement MLflow experiment tracking — create run, log config artifact, dataset params, per-class sample counts, all hyperparameters
- [ ] 5.7 Implement metrics logging — accuracy, f1_macro, f1_weighted, per-class precision/recall/F1
- [ ] 5.8 Create `services/ml/training/evaluation.py` — generate confusion matrix plot, feature importance plot, classification report text
- [ ] 5.9 Implement MLflow artifact logging — log confusion_matrix.png, feature_importance.png, classification_report.txt, pipeline_config.yaml
- [ ] 5.10 Implement MLflow model registration — log model with sklearn/xgboost flavor, register in registry if configured
- [ ] 5.11 Store training run metadata in PostgreSQL `training_runs` table
- [ ] 5.12 Wire training into `pipeline.py`
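
The temporal split in 5.2 can be sketched as a chronological slice with no shuffling, assuming the rows are already sorted by time. The function name, the default ratios, and the convention that the test set takes whatever remains are illustrative assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    rows: Sequence[T], train_ratio: float = 0.7, val_ratio: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered rows into train/validation/test without shuffling.

    The test set receives everything after the validation slice, so the
    three parts always cover the input exactly once.
    """
    if not 0 < train_ratio < 1 or not 0 <= val_ratio < 1:
        raise ValueError("ratios must lie in [0, 1), with train_ratio > 0")
    if train_ratio + val_ratio >= 1:
        raise ValueError("train_ratio + val_ratio must leave room for a test set")
    n = len(rows)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    return list(rows[:train_end]), list(rows[train_end:val_end]), list(rows[val_end:])
```

Because every validation and test row comes after every training row, the model is never evaluated on candles that precede its training data.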

## 6. Inference Service (FastAPI)

- [ ] 6.1 Create `services/ml/app/main.py` — FastAPI app with CORS, startup event to load model
- [ ] 6.2 Implement model loading — from MLflow registry (by name + stage) or from local .pkl file via joblib
- [ ] 6.3 Implement preprocessing parity — load pipeline config from MLflow artifact, apply same feature engineering as training
- [ ] 6.4 Create `POST /predict` endpoint — accept candles array, run preprocessing, predict, return per-candle labels + confidence + spans + model_info
- [ ] 6.5 Implement prediction span grouping — group consecutive same-label non-"O" predictions into spans with avg_confidence
- [ ] 6.6 Create `POST /predict/batch` endpoint — accept pair/timeframe/date range, load data, process in batch_size chunks, return predictions
- [ ] 6.7 Create `GET /model/info` endpoint — return model metadata, per-class metrics from MLflow
- [ ] 6.8 Create `GET /model/labels` endpoint — return label names and colors
- [ ] 6.9 Create `GET /health` endpoint — check model loaded status, MLflow connection, PostgreSQL connection
- [ ] 6.10 Add Pydantic request/response models for all endpoints (PredictRequest, PredictResponse, BatchPredictRequest, ModelInfoResponse)
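
The span grouping in 6.5 amounts to run-length grouping over per-candle predictions. A minimal sketch follows, assuming the labels are plain class names (not BIO tags) and that spans are reported as inclusive candle indices. The output field names other than `avg_confidence` are assumptions.

```python
from typing import Dict, List, Optional


def _close(span: Dict) -> Dict:
    """Replace the collected confidences with their average."""
    confs = span.pop("_confs")
    span["avg_confidence"] = sum(confs) / len(confs)
    return span


def group_spans(labels: List[str], confidences: List[float]) -> List[Dict]:
    """Group consecutive identical non-"O" labels into spans."""
    spans: List[Dict] = []
    current: Optional[Dict] = None
    for i, (label, conf) in enumerate(zip(labels, confidences)):
        if current is not None and label == current["label"]:
            # Same label as the open span: extend it
            current["end"] = i
            current["_confs"].append(conf)
            continue
        if current is not None:
            # Label changed: finish the open span
            spans.append(_close(current))
            current = None
        if label != "O":
            current = {"label": label, "start": i, "end": i, "_confs": [conf]}
    if current is not None:
        spans.append(_close(current))
    return spans
```

With labels `["O", "flag", "flag", "O", "wedge"]`, this yields one `flag` span covering indices 1 to 2 and one single-candle `wedge` span at index 4.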

## 7. Next.js API Proxy Routes

- [ ] 7.1 Create `src/app/api/predict/route.ts` — POST proxy to `${INFERENCE_API_URL}/predict` with timeout handling
- [ ] 7.2 Create `src/app/api/predict/batch/route.ts` — POST proxy to `${INFERENCE_API_URL}/predict/batch` with INFERENCE_BATCH_TIMEOUT
- [ ] 7.3 Create `src/app/api/model/info/route.ts` — GET proxy to `${INFERENCE_API_URL}/model/info`
- [ ] 7.4 Add environment variables to `.env.local`: INFERENCE_API_URL, INFERENCE_API_TIMEOUT, INFERENCE_BATCH_TIMEOUT, NEXT_PUBLIC_PREDICTIONS_ENABLED

## 8. Span Annotation Export & Feedback

- [ ] 8.1 Create `src/app/api/span-annotations/export/route.ts` — GET endpoint exporting span annotations as JSON in ML pipeline format
- [ ] 8.2 Add `source` and `model_prediction` fields to span annotation schema (Drizzle migration) — source defaults to "human", model_prediction is nullable JSON
- [ ] 8.3 Update span annotation POST endpoint to accept optional `source` and `model_prediction` fields
- [ ] 8.4 Support negative annotations — span with label "O", source "human_correction", and model_prediction metadata

## 9. Prediction UI — State & Controls

- [ ] 9.1 Create `src/types/predictions.ts` — PredictionSpan, PredictionState, ModelInfoResponse interfaces
- [ ] 9.2 Create prediction state management in page.tsx (or dedicated context) — spans, isLoading, error, modelInfo, visible, confidenceThreshold, selectedLabels, autoPredict
- [ ] 9.3 Create `src/components/PredictionPanel.tsx` — controls panel with master toggle, model info display, action buttons, confidence slider, label checkboxes with metrics
- [ ] 9.4 Implement on-demand prediction fetching — "Run on Visible" sends visible candles to /api/predict, "Predict All" sends batch request
- [ ] 9.5 Implement prediction caching — Map keyed by pair_timeframe_range_modelVersion, invalidate on model version change

## 10. Prediction UI — Chart Rendering

- [ ] 10.1 Add histogram series to CandleChart for prediction rendering — per-bar colors from label config at 10-20% opacity
- [ ] 10.2 Add series markers for prediction span labels — show `{label} ({confidence}%)` below bars at span start
- [ ] 10.3 Implement confidence threshold filtering — only render predictions above threshold
- [ ] 10.4 Implement label type filtering — toggle visibility per label from PredictionPanel checkboxes
- [ ] 10.5 Implement prediction layer visibility toggle — show/hide histogram series and markers

## 11. Prediction UI — Disagreements & Feedback

- [ ] 11.1 Implement disagreement detection — compare human spans vs prediction spans with >50% overlap, classify as missed_by_model, missed_by_human, label_mismatch
- [ ] 11.2 Render disagreement highlights — red dashed border (missed_by_model), yellow highlight (missed_by_human), orange border (label_mismatch)
- [ ] 11.3 Add "Show only disagreements" filter toggle in PredictionPanel
- [ ] 11.4 Implement prediction-to-annotation feedback — click missed_by_human prediction opens span annotation dialog pre-filled with predicted label/times
- [ ] 11.5 Add "Not a pattern" dismiss action — saves negative annotation with label "O" and model_prediction metadata
- [ ] 11.6 Display prediction summary in PredictionPanel — prediction count, agreement count, disagreement count
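
The disagreement classification in 11.1 is sketched below in Python, to match the other sketches in this document. The UI itself would implement the same logic in TypeScript. Several details are assumptions: spans use inclusive `start`/`end` indices, overlap is measured relative to the shorter of the two spans, and the result shape is hypothetical.

```python
from typing import Dict, List


def overlap_fraction(a: Dict, b: Dict) -> float:
    """Shared candles divided by the length of the shorter span."""
    inter = min(a["end"], b["end"]) - max(a["start"], b["start"]) + 1
    if inter <= 0:
        return 0.0
    shorter = min(a["end"] - a["start"], b["end"] - b["start"]) + 1
    return inter / shorter


def find_disagreements(
    human: List[Dict], predicted: List[Dict], min_overlap: float = 0.5
) -> List[Dict]:
    """Classify disagreements between human and predicted spans."""
    results: List[Dict] = []
    matched = set()  # indices of predictions that overlap some human span
    for h in human:
        hits = [
            j for j, p in enumerate(predicted)
            if overlap_fraction(h, p) > min_overlap
        ]
        if not hits:
            results.append({"type": "missed_by_model", "human": h, "prediction": None})
            continue
        matched.update(hits)
        # Only a mismatch if no overlapping prediction carries the human label
        if all(predicted[j]["label"] != h["label"] for j in hits):
            results.append(
                {"type": "label_mismatch", "human": h, "prediction": predicted[hits[0]]}
            )
    for j, p in enumerate(predicted):
        if j not in matched:
            results.append({"type": "missed_by_human", "human": None, "prediction": p})
    return results
```

Measuring overlap against the shorter span means a short prediction fully inside a long human span still counts as a match. Using intersection-over-union instead would be stricter.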

## 12. Inference API Connection & Error Handling

- [ ] 12.1 Implement inference API health polling — poll /api/model/info every 30 seconds when API unavailable, auto-reconnect
- [ ] 12.2 Show "Model server offline" banner when inference API unavailable, disable prediction controls
- [ ] 12.3 Ensure annotation tools work independently — prediction API errors never block human annotation
- [ ] 12.4 Add loading states for prediction fetching — skeleton/shimmer overlay during prediction requests

## 13. Documentation & Deployment

- [ ] 13.1 Update docker-compose.yml with all service environment variables and health checks
- [ ] 13.2 Update DEPLOYMENT.md with Python service setup instructions, TA-Lib installation, MLflow server, PostgreSQL, DVC init
- [ ] 13.3 Update README.md with ML pipeline overview, architecture diagram, and usage instructions
- [ ] 13.4 Update CLAUDE_DESCRIPTION.md with new ML service capabilities and file structure