candle-annotator/openspec/changes/candle-backend/design.md at 1a653c58663693c748fccf61f45b776b7ad83561

Marko Djordjevic 1a653c5866 feat: add ML service scaffolding with Python FastAPI, Docker, and MLflow setup

2026-02-15 11:58:31 +01:00

Context

The Candle Annotator is a Next.js app with SQLite storage that lets users annotate candlestick charts with pattern labels. It currently has no ML capabilities — annotations are created manually and exported as CSV/JSON, but there's no way to train models or get predictions back into the UI.

The existing stack is: Next.js 16 (App Router), React 19, lightweight-charts v4, SQLite via Drizzle ORM, Docker deployment. The app runs as a single container on port 3000.

We need to add a Python ML service that runs alongside the Next.js app and communicates with it over HTTP. The Python ecosystem (scikit-learn, XGBoost, TA-Lib, MLflow) is the right tool for this job. Node.js has no mature equivalents for these libraries.

Goals / Non-Goals

Goals:

Stand up a Python FastAPI service at services/ml/ that handles feature engineering, annotation ingestion, training, and inference
Use TA-Lib for programmatic candlestick pattern detection (CDL* functions)
Train tree-based models (RandomForest, XGBoost) with MLflow tracking
Serve predictions via REST API on port 8001
Proxy inference requests through Next.js API routes to avoid CORS
Render model predictions on the chart as a distinct visual layer
Version datasets with DVC

Non-Goals:

Deep learning models (LSTM, GRU, transformer) — architecture should accommodate them later, but not implemented now
Multi-user or multi-tenant support
Real-time streaming predictions (batch/on-demand only)
Automated retraining pipelines or CI/CD for model deployment
GPU inference or training optimization

Decisions

1. Separate Python service vs. embedded in Next.js

Decision: Standalone Python FastAPI service in services/ml/, communicating via HTTP.

Alternatives considered:

Python subprocess spawned by Next.js — fragile process management, no independent scaling
Python WASM in browser — TA-Lib and scikit-learn don't work in WASM
Shared SQLite access from Python — SQLite doesn't handle concurrent writers well

Rationale: Clean separation of concerns. The Next.js app owns the UI and annotation data; the Python service owns ML. They communicate through well-defined REST APIs. Each can be developed, tested, and deployed independently.

2. Directory structure: `services/ml/` in the monorepo

Decision: Place the Python service at services/ml/ within the existing repo.

Alternatives considered:

Separate repository — adds overhead for a single-developer project
Top-level ml/ directory — services/ namespace leaves room for future services

Rationale: Monorepo keeps everything together. The services/ prefix signals it's a separate deployable unit, not part of the Next.js app.

3. Pipeline config via YAML

Decision: A single config/pipeline.yaml file controls all pipeline stages (feature engineering, annotation ingestion, training, inference). Each stage has its own enabled flag.

Rationale: Makes experiments reproducible — the full config is logged as an MLflow artifact with each training run. Stages can be toggled independently (e.g., skip feature engineering, use only programmatic labels).
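The sketch below shows one way the per-stage enabled flags could drive execution. It assumes pipeline.yaml has one top-level key per stage. The stage keys and the `run_pipeline` helper are hypothetical. To keep the example stdlib-only, the config appears as the dict that PyYAML's `yaml.safe_load` would return.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stage keys, executed in this fixed order.
STAGE_ORDER = ["feature_engineering", "annotation_ingestion", "training", "inference"]

StageFn = Callable[[Dict[str, Any]], None]


def run_pipeline(config: Dict[str, Any], stages: Dict[str, StageFn]) -> List[str]:
    """Run every stage whose `enabled` flag is true; return the names that ran."""
    ran: List[str] = []
    for name in STAGE_ORDER:
        stage_cfg = config.get(name, {})
        # A missing flag counts as disabled, so stages are opt-in.
        if not stage_cfg.get("enabled", False):
            continue
        stages[name](stage_cfg)
        ran.append(name)
    return ran


# Parsed form of a pipeline.yaml that skips feature engineering and inference
# (keys and values are illustrative, not the project's actual schema).
config = {
    "feature_engineering": {"enabled": False},
    "annotation_ingestion": {"enabled": True, "source": "data/annotations/"},
    "training": {"enabled": True, "model": "random_forest"},
    "inference": {"enabled": False},
}

# Stand-in stages that just record the config section they received.
seen_configs: List[Dict[str, Any]] = []
stages = {name: seen_configs.append for name in STAGE_ORDER}
ran = run_pipeline(config, stages)
```

Treating a missing flag as disabled is a choice made for this sketch. Defaulting to enabled would be equally defensible as long as it is applied consistently.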

4. MLflow for experiment tracking, DVC for data versioning

Decision: MLflow tracks experiments, metrics, and models; DVC versions datasets.

Alternatives considered:

Weights & Biases — heavier, cloud-dependent
Plain file logging — loses queryability and model registry
Git LFS for data — doesn't handle dataset lineage

Rationale: MLflow runs locally (no cloud dependency), provides a model registry, and has native integrations with scikit-learn and XGBoost. DVC handles data versioning without bloating the git repo.

5. Annotation export format: JSON from existing API

Decision: The Python pipeline reads annotation data by calling the existing Next.js API endpoints (GET /api/annotations, span annotation exports) or from exported JSON/CSV files in data/annotations/.

Alternatives considered:

Direct SQLite read from Python — concurrent access issues
Shared PostgreSQL — overkill for single-user tool

Rationale: Using the existing API or file exports keeps the services decoupled. The annotation tool already has export functionality. For training, batch export to data/annotations/ is sufficient.
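The sketch below covers the file-based path. It assumes each export in data/annotations/ is a JSON array of objects with `start_time` and `end_time` fields (Unix seconds) and a `label` field. That schema and the `load_annotation_exports` helper are hypothetical, so the real export format should be checked against the annotation tool.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class AnnotationSpan:
    start_time: int  # Unix seconds (assumed field name)
    end_time: int    # Unix seconds, inclusive (assumed field name)
    label: str       # pattern label, e.g. "hammer"


def load_annotation_exports(directory: Path) -> List[AnnotationSpan]:
    """Read every *.json export in `directory`; return spans sorted by start time."""
    spans: List[AnnotationSpan] = []
    for path in sorted(directory.glob("*.json")):
        for record in json.loads(path.read_text(encoding="utf-8")):
            span = AnnotationSpan(int(record["start_time"]), int(record["end_time"]),
                                  str(record["label"]))
            # Reject malformed spans early instead of producing bad training windows.
            if span.end_time < span.start_time:
                raise ValueError(f"{path.name}: span ends before it starts: {record}")
            spans.append(span)
    spans.sort(key=lambda s: s.start_time)
    return spans


# Usage with a throwaway directory standing in for data/annotations/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "export.json").write_text(json.dumps([
        {"start_time": 1700003600, "end_time": 1700007200, "label": "doji"},
        {"start_time": 1700000000, "end_time": 1700001800, "label": "hammer"},
    ]), encoding="utf-8")
    spans = load_annotation_exports(Path(tmp))
```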

6. Label encoding: windowed classification first, BIO later

Decision: Start with fixed-window classification (each annotation span → one training sample of N candles). BIO sequence labeling is designed for but not implemented in v1.

Rationale: Window classification works with tree-based models (RandomForest, XGBoost) which are the initial model types. BIO encoding is needed for sequence models (BiLSTM-CRF) which are a non-goal for now.
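The sketch below illustrates fixed-window classification. It assumes spans have already been mapped to candle indices. Several choices here are this illustration's, not the design's:

- The window covers the N candles ending at the span's last candle.
- Spans longer than N are truncated to their last N candles.
- Spans shorter than N gain leading context.
- "no_pattern" negatives are drawn from non-overlapping windows that touch no annotated candle.

`make_window_samples` is a hypothetical helper.

```python
from typing import List, Sequence, Set, Tuple

Candle = Tuple[float, float, float, float]  # (open, high, low, close)
Span = Tuple[int, int, str]                 # (first candle index, last candle index, label)
Sample = Tuple[List[float], str]            # (flattened feature vector, label)


def _flatten(window: Sequence[Candle]) -> List[float]:
    # N candles x 4 prices -> one fixed-length vector a tree model can consume.
    return [value for candle in window for value in candle]


def make_window_samples(candles: Sequence[Candle], spans: Sequence[Span], n: int,
                        negative_label: str = "no_pattern") -> List[Sample]:
    """One sample per span (the n candles ending at its last candle), plus
    non-overlapping unannotated windows labeled `negative_label`."""
    covered: Set[int] = set()
    samples: List[Sample] = []
    for first, last, label in spans:
        covered.update(range(first, last + 1))
        if last + 1 < n:
            continue  # not enough history before this span for a full window
        samples.append((_flatten(candles[last + 1 - n: last + 1]), label))
    # Negative windows must not overlap any annotated candle.
    for end in range(n - 1, len(candles), n):
        if covered.isdisjoint(range(end + 1 - n, end + 1)):
            samples.append((_flatten(candles[end + 1 - n: end + 1]), negative_label))
    return samples


candles = [(float(i), i + 1.0, i - 1.0, i + 0.5) for i in range(10)]
samples = make_window_samples(candles, [(3, 5, "bullish_engulfing")], n=3)
```

A BIO encoder would instead emit one tag per candle. That is why this per-span representation suits tree models but would have to change for sequence models.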

7. Next.js proxy routes for inference

Decision: Next.js API routes at /api/predict, /api/predict/batch, /api/model/info proxy to the Python service.

Rationale: Avoids CORS configuration. Lets us add auth or rate-limiting on the Next.js side later. The frontend only talks to one origin.

8. Prediction rendering: histogram series overlay

Decision: Use a lightweight-charts histogram series to render predictions as colored bars behind candles. Each bar's color maps to a predicted pattern label.

Alternatives considered:

Custom canvas plugin — more control but significantly more code
Series markers only — no area highlighting, just point markers

Rationale: A histogram series is the simplest approach that gives visual area coverage, and it can be upgraded to a canvas plugin later for hatched/dashed styling. Series markers are added on top to show the label text with its confidence score.
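The sketch below shows the data shaping. It assumes the inference API returns one prediction per candle as `{"time", "label", "confidence"}`, which is a hypothetical response shape. The output dicts use the field names of lightweight-charts' histogram data (`time`, `value`, `color`) and series markers (`time`, `position`, `shape`, `color`, `text`). The palette, the constant bar height, and the one-marker-per-run rule are illustrative choices. The sketch is written in Python for consistency with the rest of this document, though the same logic could live in the frontend.

```python
from typing import Any, Dict, List, Mapping, Optional, Sequence, Tuple

# Hypothetical palette; translucent so the candles stay readable on top.
PALETTE = {
    "hammer": "rgba(38, 166, 154, 0.25)",
    "doji": "rgba(255, 193, 7, 0.25)",
}
FALLBACK_COLOR = "rgba(158, 158, 158, 0.25)"


def to_chart_layers(predictions: Sequence[Mapping[str, Any]],
                    palette: Mapping[str, str] = PALETTE,
                    background_label: str = "no_pattern",
                    ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Turn time-sorted per-candle predictions into histogram points and markers."""
    bars: List[Dict[str, Any]] = []
    markers: List[Dict[str, Any]] = []
    previous: Optional[str] = None
    for p in predictions:
        label = p["label"]
        if label == background_label:
            previous = None  # no bar; the next pattern starts a new run
            continue
        color = palette.get(label, FALLBACK_COLOR)
        # Constant height: on its own price scale the bars read as a band behind candles.
        bars.append({"time": p["time"], "value": 1, "color": color})
        if label != previous:  # one marker at the start of each run of the same label
            markers.append({
                "time": p["time"],
                "position": "aboveBar",
                "shape": "arrowDown",
                "color": color,
                "text": f"{label} {p['confidence']:.0%}",
            })
        previous = label
    return bars, markers


predictions = [
    {"time": 1700000000, "label": "hammer", "confidence": 0.87},
    {"time": 1700003600, "label": "hammer", "confidence": 0.81},
    {"time": 1700007200, "label": "no_pattern", "confidence": 0.95},
    {"time": 1700010800, "label": "doji", "confidence": 0.62},
]
bars, markers = to_chart_layers(predictions)
```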

9. Docker: multi-container with docker-compose

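Decision: Add an ml-service container to the existing docker-compose setup, along with an mlflow container for the tracking server and a postgres container for ML service state. Annotation data stays in the Next.js SQLite database. The candle-annotator and ml-service containers share a volume for data/.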

```yaml
services:
  candle-annotator:  # existing
  ml-service:        # new - FastAPI on 8001
  mlflow:            # new - tracking server on 5000
  postgres:          # new - PostgreSQL for ML service state
```

Rationale: The Next.js app and the ML service each have their own Dockerfile and dependencies. The shared data/ volume lets both access OHLCV and annotation files.

Risks / Trade-offs

[TA-Lib C library dependency] → TA-Lib requires a system-level C library to be installed before the Python wrapper works. Mitigated by installing it in the Dockerfile (apt-get install libta-lib-dev) and providing clear setup instructions for local development.

[MLflow storage growth] → MLflow artifacts (models, plots, configs) accumulate over time. Mitigated by using a local mlruns/ directory with periodic manual cleanup. Not a concern at single-user scale.

[Preprocessing parity] → Feature engineering during inference must exactly match training. If the pipeline config changes between training and inference, predictions are invalid. Mitigated by logging the full pipeline config as an MLflow artifact and loading it during inference to replicate preprocessing.
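The sketch below shows one way to make the parity check mechanical. It fingerprints the feature-engineering section of the config at training time, for example alongside the artifact logged with `mlflow.log_dict`. At inference time the current config is checked against that fingerprint before any prediction runs. The helper names and config keys are hypothetical. The check is stdlib-only so it can run regardless of how the training config is retrieved from MLflow.

```python
import hashlib
import json
from typing import Any, Dict


def config_fingerprint(section: Dict[str, Any]) -> str:
    """Stable SHA-256 of a config section; dict key order and whitespace don't matter."""
    canonical = json.dumps(section, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_preprocessing_parity(training_config: Dict[str, Any],
                               inference_config: Dict[str, Any],
                               section: str = "feature_engineering") -> None:
    """Refuse to predict if feature settings drifted since the model was trained."""
    trained = config_fingerprint(training_config.get(section, {}))
    current = config_fingerprint(inference_config.get(section, {}))
    if trained != current:
        raise ValueError(f"{section} differs from the training run "
                         f"({trained[:12]} != {current[:12]}); predictions would be invalid")


# Illustrative settings; CDLHAMMER and CDLDOJI are TA-Lib pattern functions.
trained_cfg = {"feature_engineering": {"window": 20, "talib_patterns": ["CDLHAMMER", "CDLDOJI"]}}
same_cfg = {"feature_engineering": {"talib_patterns": ["CDLHAMMER", "CDLDOJI"], "window": 20}}
drifted_cfg = {"feature_engineering": {"window": 30, "talib_patterns": ["CDLHAMMER", "CDLDOJI"]}}

check_preprocessing_parity(trained_cfg, same_cfg)  # passes: only key order differs
```

List order still affects the fingerprint, which is deliberate here: reordering TA-Lib patterns would reorder feature columns.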

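[Class imbalance] → Pattern classes will be heavily imbalanced (mostly "no pattern"). Mitigated by balanced class weighting (class_weights: balanced in the pipeline config) and by tracking per-class precision/recall rather than accuracy alone. scikit-learn models accept class_weight="balanced" directly, while XGBoost needs equivalent per-sample weights passed to fit.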
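The "balanced" heuristic weights each class by n_samples / (n_classes * count), which is the formula scikit-learn documents for class_weight="balanced". The sketch below implements it with the stdlib. It also derives the per-sample form, which could be passed as `sample_weight` to an estimator's `fit` (XGBoost's scikit-learn wrapper has no class_weight parameter). Both helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """weight(c) = n_samples / (n_classes * count(c)); rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def balanced_sample_weights(labels: Sequence[Hashable]) -> List[float]:
    """Expand class weights to one weight per sample, for APIs taking sample_weight."""
    weights = balanced_class_weights(labels)
    return [weights[label] for label in labels]


labels = ["no_pattern"] * 8 + ["hammer"] * 2
class_weights = balanced_class_weights(labels)    # no_pattern: 0.625, hammer: 2.5
sample_weights = balanced_sample_weights(labels)
```

With these weights, each class contributes the same total weight (5.0 per class in this example). As a result, the majority "no pattern" class no longer dominates the loss.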

[SQLite concurrent access] → If both the Next.js app and Python service try to access the SQLite DB simultaneously, writes can fail. Mitigated by keeping Python read-only on annotation data (via API calls or file exports), never writing to the Next.js SQLite DB directly.

[Temporal data leakage] → Random train/test splits on time series data leak future information into training. Mitigated by defaulting to temporal splits (train on earlier data, evaluate on strictly later data). The split strategy remains configurable.
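The sketch below shows a temporal hold-out split. It assumes samples are (timestamp, payload) pairs. The optional `gap` is a hypothetical embargo that drops samples right after the cut. This keeps training and test windows apart when consecutive samples share candles. scikit-learn's `TimeSeriesSplit`, which also has a `gap` parameter, covers the cross-validation variant of the same idea.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Sample = Tuple[int, T]  # (timestamp, payload)


def temporal_split(samples: Sequence[Tuple[int, T]], test_fraction: float = 0.2,
                   gap: int = 0) -> Tuple[List[Tuple[int, T]], List[Tuple[int, T]]]:
    """Train on the earliest samples and test on the latest; never shuffle.

    `gap` drops that many samples after the cut, so overlapping feature
    windows cannot straddle the train/test boundary.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(samples, key=lambda s: s[0])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut + gap:]


samples = [(t, f"window-{t}") for t in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]]
train, test = temporal_split(samples, test_fraction=0.2)
train_gap, test_gap = temporal_split(samples, test_fraction=0.2, gap=1)
```

With windows of N candles taken one candle apart, a gap of N - 1 samples would be the natural choice. That value is an assumption of this sketch rather than something the design specifies.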

Resolved Questions

Python service database: PostgreSQL — the Python service uses its own Postgres instance for storing training run references, pipeline configs, and any service-specific state. Added to docker-compose.
DVC remote storage: Local backend — datasets versioned on the local filesystem, simplest setup for single-developer workflow.
Prediction persistence: Ephemeral — predictions are fetched on demand from the inference API, not persisted in any database. The frontend caches them in memory keyed by time range + model version.
