## Context

The Candle Annotator is a Next.js app with SQLite storage that lets users annotate candlestick charts with pattern labels. It currently has no ML capabilities — annotations are created manually and exported as CSV/JSON, but there's no way to train models or get predictions back into the UI.

The existing stack is: Next.js 16 (App Router), React 19, lightweight-charts v4, SQLite via Drizzle ORM, Docker deployment. The app runs as a single container on port 3000.

We need to add a Python ML service that runs alongside the Next.js app and communicates with it over HTTP. The Python ecosystem (scikit-learn, XGBoost, TA-Lib, MLflow) is the right tool for this job; Node.js has no comparably mature equivalents for these libraries.

## Goals / Non-Goals

**Goals:**

- Stand up a Python FastAPI service at `services/ml/` that handles feature engineering, annotation ingestion, training, and inference
- Use TA-Lib for programmatic candlestick pattern detection (CDL* functions)
- Train tree-based models (RandomForest, XGBoost) with MLflow tracking
- Serve predictions via REST API on port 8001
- Proxy inference requests through Next.js API routes to avoid CORS
- Render model predictions on the chart as a distinct visual layer
- Version datasets with DVC

**Non-Goals:**

- Deep learning models (LSTM, GRU, transformer) — architecture should accommodate them later, but not implemented now
- Multi-user or multi-tenant support
- Real-time streaming predictions (batch/on-demand only)
- Automated retraining pipelines or CI/CD for model deployment
- GPU inference or training optimization

## Decisions

### 1. Separate Python service vs. embedded in Next.js

**Decision**: Standalone Python FastAPI service in `services/ml/`, communicating via HTTP.

**Alternatives considered**:
- Python subprocess spawned by Next.js — fragile process management, no independent scaling
- Python WASM in browser — TA-Lib and scikit-learn don't work in WASM
- Shared SQLite access from Python — SQLite doesn't handle concurrent writers well

**Rationale**: Clean separation of concerns. The Next.js app owns the UI and annotation data; the Python service owns ML. They communicate through well-defined REST APIs. Each can be developed, tested, and deployed independently.

### 2. Directory structure: `services/ml/` in the monorepo

**Decision**: Place the Python service at `services/ml/` within the existing repo.

**Alternatives considered**:
- Separate repository — adds overhead for a single-developer project
- Top-level `ml/` directory — `services/` namespace leaves room for future services

**Rationale**: Monorepo keeps everything together. The `services/` prefix signals it's a separate deployable unit, not part of the Next.js app.

### 3. Pipeline config via YAML

**Decision**: Single `config/pipeline.yaml` controls all pipeline stages (feature engineering, annotation ingestion, training, inference). Each stage has an `enabled` flag.

**Rationale**: Makes experiments reproducible — the full config is logged as an MLflow artifact with each training run. Stages can be toggled independently (e.g., skip feature engineering, use only programmatic labels).
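
As a rough sketch of how the per-stage `enabled` flags might be consumed: the code below assumes the YAML has already been parsed into a plain dict (for example with `yaml.safe_load`). The stage keys, the `StageConfig` class, and the "missing stage defaults to enabled" rule are illustrative assumptions, not part of this design.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical stage keys; the real keys in config/pipeline.yaml may differ.
STAGE_ORDER = ("feature_engineering", "annotation_ingestion", "training", "inference")


@dataclass(frozen=True)
class StageConfig:
    name: str
    enabled: bool = True
    params: dict = field(default_factory=dict)


def parse_pipeline(raw: dict[str, Any]) -> list[StageConfig]:
    """Turn the parsed YAML mapping into an ordered list of stage configs."""
    stages = []
    for name in STAGE_ORDER:
        section = dict(raw.get(name, {}))
        # Assumption: a stage without an explicit flag is treated as enabled.
        enabled = bool(section.pop("enabled", True))
        stages.append(StageConfig(name=name, enabled=enabled, params=section))
    return stages


def enabled_stages(raw: dict[str, Any]) -> list[str]:
    """Names of the stages that would run, in pipeline order."""
    return [stage.name for stage in parse_pipeline(raw) if stage.enabled]


# Equivalent to a YAML file that switches off feature engineering only.
example = {
    "feature_engineering": {"enabled": False},
    "training": {"enabled": True, "model": "random_forest"},
}
print(enabled_stages(example))
```

Keeping the parsed config as plain data makes it straightforward to serialize the same structure when logging it alongside a training run.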

### 4. MLflow for experiment tracking, DVC for data versioning

**Decision**: MLflow tracks experiments, metrics, and models. DVC versions datasets.

**Alternatives considered**:
- Weights & Biases — heavier, cloud-dependent
- Plain file logging — loses queryability and model registry
- Git LFS for data — doesn't handle dataset lineage

**Rationale**: MLflow runs locally (no cloud dependency), provides a model registry, and has native integrations with scikit-learn and XGBoost. DVC handles data versioning without bloating the git repo.

### 5. Annotation export format: JSON from existing API

**Decision**: The Python pipeline reads annotation data by calling the existing Next.js API endpoints (`GET /api/annotations`, span annotation exports) or from exported JSON/CSV files in `data/annotations/`.

**Alternatives considered**:
- Direct SQLite read from Python — concurrent access issues
- Shared PostgreSQL — overkill for single-user tool

**Rationale**: Using the existing API or file exports keeps the services decoupled. The annotation tool already has export functionality. For training, batch export to `data/annotations/` is sufficient.
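
A minimal sketch of the two ingestion paths, using only the standard library. The base URL, the assumption that `GET /api/annotations` returns a JSON array, and the `load_annotations` helper itself are hypothetical; CSV exports are left out for brevity.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Assumed address of the Next.js app inside docker-compose; not specified by this design.
ANNOTATOR_URL = "http://candle-annotator:3000"


def load_annotations(export_dir: Path, use_api: bool = False) -> list[dict]:
    """Return annotation records from exported JSON files, or from the API."""
    if use_api:
        # The response body is assumed to be a JSON array of annotation objects.
        with urlopen(f"{ANNOTATOR_URL}/api/annotations", timeout=10) as resp:
            return json.load(resp)

    records: list[dict] = []
    for path in sorted(export_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        # Accept either a list of records or a single record per file.
        records.extend(payload if isinstance(payload, list) else [payload])
    return records
```

For training runs, calling `load_annotations(Path("data/annotations"))` would read the batch export without touching the running Next.js app.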

### 6. Label encoding: windowed classification first, BIO later

**Decision**: Start with fixed-window classification (each annotation span → one training sample of N candles). The design leaves room for BIO sequence labeling, but v1 does not implement it.

**Rationale**: Window classification works with tree-based models (RandomForest, XGBoost), which are the initial model types. BIO encoding is only needed for sequence models (e.g., BiLSTM-CRF), which are a non-goal for now.
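
The span-to-sample mapping could look roughly like the sketch below. The `Span` fields, the choice to align each window so that it ends on the span's last candle, and the use of raw close prices as features are all assumptions made for illustration; the actual alignment and feature set are not fixed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    start: int  # index of the first candle in the annotated span
    end: int    # index of the last candle in the span (inclusive)
    label: str


def window_for_span(span: Span, n_candles: int, window: int) -> Optional[tuple[int, int]]:
    """Return [start, stop) of a fixed-size window ending on the span's last candle.

    Returns None when there is not enough history to fill the window.
    """
    stop = span.end + 1
    start = stop - window
    if start < 0 or stop > n_candles:
        return None
    return start, stop


def build_samples(closes: list[float], spans: list[Span], window: int) -> list[tuple[list[float], str]]:
    """Produce one (features, label) pair per annotation span."""
    samples = []
    for span in spans:
        bounds = window_for_span(span, len(closes), window)
        if bounds is None:
            continue  # skip spans too close to the start of the series
        start, stop = bounds
        samples.append((closes[start:stop], span.label))
    return samples
```

Because every sample has the same length `window`, the result can be stacked into a fixed-width feature matrix, which is the input shape tree-based classifiers expect.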

### 7. Next.js proxy routes for inference

**Decision**: Next.js API routes at `/api/predict`, `/api/predict/batch`, `/api/model/info` proxy to the Python service.

**Rationale**: Avoids CORS configuration. Lets us add auth or rate-limiting on the Next.js side later. The frontend only talks to one origin.

### 8. Prediction rendering: histogram series overlay

**Decision**: Use a lightweight-charts histogram series to render predictions as colored bars behind candles. Each bar's color maps to a predicted pattern label.

**Alternatives considered**:
- Custom canvas plugin — more control but significantly more code
- Series markers only — no area highlighting, just point markers

**Rationale**: Histogram series is the simplest approach that gives visual area coverage. Can upgrade to a canvas plugin later for hatched/dashed styling. Markers are added for label text with confidence scores.
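
To illustrate the label-to-color mapping, here is a sketch that expands span-level predictions into one point per covered candle, using the `time`, `value`, and `color` fields that a lightweight-charts histogram series accepts. The prediction shape (`start`, `end`, `label`), the palette, and the idea of doing this expansion in Python rather than in the frontend are assumptions for illustration only.

```python
# Hypothetical palette; the actual labels and colors are not fixed by this design.
LABEL_COLORS = {
    "hammer": "rgba(38, 166, 154, 0.35)",
    "engulfing": "rgba(239, 83, 80, 0.35)",
}
FALLBACK_COLOR = "rgba(120, 120, 120, 0.25)"


def to_histogram_points(candle_times: list[int], predictions: list[dict]) -> list[dict]:
    """Expand span-level predictions into one histogram point per covered candle.

    `candle_times` is assumed to be sorted ascending, so the output is too.
    """
    points = []
    for t in candle_times:
        for pred in predictions:
            if pred["start"] <= t <= pred["end"]:
                color = LABEL_COLORS.get(pred["label"], FALLBACK_COLOR)
                # A constant height gives uniform area coverage behind the candles.
                points.append({"time": t, "value": 1, "color": color})
                break  # at most one bar per candle; the first matching prediction wins
    return points
```

Candles without a prediction produce no point at all, so the overlay only appears where the model reported a pattern.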

### 9. Docker: multi-container with docker-compose

**Decision**: Add three containers to the existing docker-compose: `ml-service` (FastAPI), `mlflow` (tracking server), and `postgres` (state for the ML service). The services share a volume for `data/`.

```yaml
services:
  candle-annotator:  # existing
  ml-service:        # new - FastAPI on 8001
  mlflow:            # new - tracking server on 5000
  postgres:          # new - PostgreSQL for ML service state
```

**Rationale**: Each service has its own Dockerfile and dependencies. The shared `data/` volume lets both services access OHLCV and annotation files.

## Risks / Trade-offs

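**[TA-Lib C library dependency]** → TA-Lib requires installing a system-level C library before the Python wrapper works. Mitigated by installing it in the ML service's Dockerfile (for example `apt-get install libta-lib-dev` if the base image's repositories provide that package, otherwise by building the C library from source) and providing clear setup instructions for local development.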

**[MLflow storage growth]** → MLflow artifacts (models, plots, configs) accumulate over time. Mitigated by using a local `mlruns/` directory with periodic manual cleanup. Not a concern at single-user scale.

**[Preprocessing parity]** → Feature engineering during inference must exactly match training. If the pipeline config changes between training and inference, predictions are invalid. Mitigated by logging the full pipeline config as an MLflow artifact and loading it during inference to replicate preprocessing.
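
One possible extra guard on top of loading the logged config is to fingerprint it and compare at inference time. The sketch below is an assumption layered on the mitigation above, not part of the design; the function names are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Stable SHA-256 of a pipeline config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_same_preprocessing(training_config: dict, inference_config: dict) -> None:
    """Raise if inference would preprocess data differently from training."""
    if config_fingerprint(training_config) != config_fingerprint(inference_config):
        raise ValueError("pipeline config differs from the one logged at training time")
```

Serializing with sorted keys means two configs that differ only in key order hash identically, so the check fails only on a real change in settings.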

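**[Class imbalance]** → Pattern classes will be heavily imbalanced (mostly "no pattern"). Mitigated by using the `class_weights: balanced` setting (scikit-learn's `class_weight="balanced"`; XGBoost has no such parameter and needs equivalent per-sample weights passed via `sample_weight`) and by tracking per-class precision/recall rather than accuracy alone.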
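
For reference, the "balanced" heuristic documented by scikit-learn weights each class by `n_samples / (n_classes * class_count)`. The helper below is a standard-library sketch of that formula with a hypothetical name; it is meant to show the arithmetic, not to replace the library option.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative label distribution: 8 "none" windows and 2 "hammer" windows.
labels = ["none"] * 8 + ["hammer"] * 2
weights = balanced_class_weights(labels)
print(weights)

# Per-sample weights in label order, e.g. for models that take sample weights.
sample_weights = [weights[label] for label in labels]
```

With this distribution the rare class gets four times the weight of the common one, so both classes contribute equally to the total weighted loss.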

**[SQLite concurrent access]** → If both the Next.js app and the Python service access the SQLite DB simultaneously, writes can fail. Mitigated by never opening the Next.js SQLite DB from Python: the ML service only reads annotation data through API calls or file exports.

**[Temporal data leakage]** → Random train/test splits on time series data leak future information. Mitigated by defaulting to temporal splits; the split strategy is configurable, so any non-temporal split has to be chosen explicitly.
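
A temporal split can be as simple as sorting by timestamp and cutting once, as in this sketch. The `time` key and the `temporal_split` helper are assumptions for illustration.

```python
def temporal_split(rows: list[dict], test_fraction: float = 0.2, time_key: str = "time"):
    """Split rows so every test row is at or after every training row in time."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[time_key])
    # The most recent rows form the test set.
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]
```

With windowed samples, windows that straddle the cut can still share candles between the two sets, so a real implementation may also want to drop samples near the boundary.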

## Resolved Questions

- **Python service database**: PostgreSQL — the Python service uses its own Postgres instance for storing training run references, pipeline configs, and any service-specific state. Added to docker-compose.
- **DVC remote storage**: Local backend — datasets versioned on the local filesystem, simplest setup for single-developer workflow.
- **Prediction persistence**: Ephemeral — predictions are fetched on demand from the inference API, not persisted in any database. The frontend caches them in memory keyed by time range + model version.