candle-annotator/openspec/changes/candle-backend/specs/ml-inference/spec.md

## ADDED Requirements

### Requirement: Model loading from MLflow registry
When `stages.inference.model_source` is "mlflow", the system SHALL load the model from the MLflow model registry using the model name (`stages.inference.mlflow_model_name`) and stage (`stages.inference.mlflow_model_stage`).

#### Scenario: Load production model
- **WHEN** model_source is "mlflow", model name is "candlestick_pattern_v1", and stage is "Production"
- **THEN** the system loads the model registered as "candlestick_pattern_v1" at the "Production" stage from MLflow

#### Scenario: Model not found in registry
- **WHEN** the specified model name or stage does not exist in the MLflow registry
- **THEN** the system SHALL return an error that names the requested model and stage and states that the model was not found
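
A minimal sketch of the registry lookup, assuming the standard `models:/<name>/<stage>` URI scheme and `mlflow.pyfunc.load_model`. The helper names and the `ModelNotFoundError` type are hypothetical and not part of this spec.

```python
class ModelNotFoundError(RuntimeError):
    """Hypothetical error type for a missing registry entry."""


def registry_model_uri(name: str, stage: str) -> str:
    """Build an MLflow registry URI of the form models:/<name>/<stage>."""
    if not name or not stage:
        raise ValueError("model name and stage must be non-empty")
    return f"models:/{name}/{stage}"


def load_registry_model(name: str, stage: str):
    """Load a registered model, naming the model and stage on failure."""
    # Imported lazily so the URI helper works without MLflow installed.
    import mlflow.pyfunc
    from mlflow.exceptions import MlflowException

    uri = registry_model_uri(name, stage)
    try:
        return mlflow.pyfunc.load_model(uri)
    except MlflowException as exc:
        # Assumption: any registry failure is reported as "not found".
        # A real service may want to separate connectivity errors.
        raise ModelNotFoundError(
            f"model '{name}' at stage '{stage}' not found in MLflow registry"
        ) from exc
```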

### Requirement: Model loading from local file
When `stages.inference.model_source` is "local", the system SHALL load the model from the file path specified by `stages.inference.local_model_path` using joblib.

#### Scenario: Load local model
- **WHEN** model_source is "local" and local_model_path is "models/best_model.pkl"
- **THEN** the system loads the model from that file path

#### Scenario: Local model file missing
- **WHEN** the local_model_path does not exist
- **THEN** the system SHALL return an error indicating the model file was not found
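
The local path could be handled roughly as below. This is a sketch only: `load_local_model` is a hypothetical name, and raising `FileNotFoundError` is one possible way to surface the missing-file case.

```python
from pathlib import Path


def load_local_model(local_model_path: str):
    """Load a joblib-serialized model, failing early if the file is absent."""
    path = Path(local_model_path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    # Imported lazily so the path check runs even without joblib installed.
    import joblib

    return joblib.load(path)
```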

### Requirement: Preprocessing parity
The inference service SHALL replicate the exact preprocessing (feature engineering) used during training. The system SHALL load the pipeline config artifact from the MLflow run that produced the model and apply the same feature engineering steps (TA-Lib indicators, candle features) with the same parameters.

#### Scenario: Matching preprocessing
- **WHEN** the model was trained with RSI(14) and EMA(20) features
- **THEN** inference SHALL compute RSI(14) and EMA(20) on the input candles before running the model

#### Scenario: Config mismatch warning
- **WHEN** the current pipeline config differs from the config stored with the model
- **THEN** the system SHALL log a warning about the mismatch
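
One way to detect the mismatch is to compare the two configs as nested dictionaries and log the dotted keys that differ. The function names, logger name, and config layout below are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("inference")  # hypothetical logger name


def diff_config(stored: dict, current: dict, prefix: str = "") -> list:
    """Return dotted keys whose values differ between two nested dicts."""
    diffs = []
    for key in sorted(set(stored) | set(current)):
        path = f"{prefix}{key}"
        a, b = stored.get(key), current.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(diff_config(a, b, prefix=f"{path}."))
        elif a != b:
            diffs.append(path)
    return diffs


def warn_on_mismatch(stored: dict, current: dict) -> list:
    """Log a warning listing every differing key; return those keys."""
    diffs = diff_config(stored, current)
    if diffs:
        logger.warning(
            "pipeline config differs from config stored with the model: %s",
            ", ".join(diffs),
        )
    return diffs
```

Note that this sketch treats a missing key and a key set to `None` as equal.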

### Requirement: Predict endpoint
The system SHALL provide a `POST /predict` endpoint on the FastAPI service (port 8001). The endpoint SHALL accept a JSON body with `pair` (string), `timeframe` (string), and `candles` (array of objects with `time`, `open`, `high`, `low`, `close`, `volume`). It SHALL return predictions with per-candle labels and confidence scores, prediction spans (grouped continuous predictions), and model metadata.

#### Scenario: Successful prediction
- **WHEN** POST /predict is called with 100 valid candle objects
- **THEN** the system returns a JSON response with a `predictions` array (one entry per candle with `time`, `label`, `confidence`), a `spans` array (runs of consecutive same-label predictions, each with `start_time`, `end_time`, `label`, `avg_confidence`), and a `model_info` object

#### Scenario: Empty candles array
- **WHEN** POST /predict is called with an empty candles array
- **THEN** the system returns HTTP 400 with an error message

#### Scenario: Invalid candle data
- **WHEN** POST /predict is called with candle objects missing required fields
- **THEN** the system returns HTTP 422 with validation error details
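
The split between HTTP 400 and HTTP 422 can be illustrated with a framework-free check. In the FastAPI service the missing-field case would more likely come from request model validation; `check_predict_body` is a hypothetical helper that only demonstrates the status mapping.

```python
REQUIRED_CANDLE_FIELDS = ("time", "open", "high", "low", "close", "volume")


def check_predict_body(body: dict) -> tuple:
    """Return (http_status, detail) for a /predict request body."""
    candles = body.get("candles")
    if not isinstance(candles, list):
        return 422, "candles must be an array"
    if not candles:
        # Well-formed but empty input maps to 400 per the scenario above.
        return 400, "candles array is empty"
    for index, candle in enumerate(candles):
        if not isinstance(candle, dict):
            return 422, f"candle {index} must be an object"
        missing = [f for f in REQUIRED_CANDLE_FIELDS if f not in candle]
        if missing:
            return 422, f"candle {index} is missing: {', '.join(missing)}"
    return 200, "ok"
```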

### Requirement: Batch predict endpoint
The system SHALL provide a `POST /predict/batch` endpoint that accepts `pair`, `timeframe`, `start_date`, and `end_date`. The system SHALL load OHLCV data from its own data store for the specified range, process the data in chunks of `stages.inference.batch_size` rows, and return predictions for the full range.

#### Scenario: Batch prediction
- **WHEN** POST /predict/batch is called with pair "EURUSD", timeframe "1H", start_date and end_date spanning 6 months
- **THEN** the system loads the data, processes in batches, and returns predictions for the full range

#### Scenario: No data for range
- **WHEN** the requested date range has no OHLCV data available
- **THEN** the system returns HTTP 404 with a message indicating no data found for the range
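
Chunked processing might look like the generator below, where `batch_size` stands in for `stages.inference.batch_size`. The name `iter_chunks` is hypothetical.

```python
from typing import Iterator, Sequence


def iter_chunks(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

Indicators with a lookback window, such as RSI(14), depend on earlier candles, so a design along these lines would probably compute features over the full range before chunking, or overlap the chunks by the longest lookback.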

### Requirement: Model info endpoint
The system SHALL provide a `GET /model/info` endpoint that returns metadata about the currently loaded model: model_name, model_version, model_type, trained_at, dataset_version, whether feature engineering is enabled, the list of all labels the model knows, and per-class metrics (precision, recall, F1, and training sample count for each label).

#### Scenario: Get model info
- **WHEN** GET /model/info is called and a model is loaded
- **THEN** the system returns JSON with model metadata and per-class metrics

#### Scenario: No model loaded
- **WHEN** GET /model/info is called and no model has been loaded
- **THEN** the system returns HTTP 503 with a message indicating no model is available

### Requirement: Model labels endpoint
The system SHALL provide a `GET /model/labels` endpoint that returns the list of all pattern labels the current model can predict, along with their display colors.

#### Scenario: Get model labels
- **WHEN** GET /model/labels is called
- **THEN** the system returns a JSON array of label objects with `name` and `color` fields

### Requirement: Health check endpoint
The system SHALL provide a `GET /health` endpoint that returns the service status including whether a model is loaded, the MLflow connection status, and the PostgreSQL connection status.

#### Scenario: Healthy service
- **WHEN** GET /health is called and all dependencies are available
- **THEN** the system returns HTTP 200 with `{ "status": "healthy", "model_loaded": true, "mlflow": "connected", "database": "connected" }`

#### Scenario: Degraded service
- **WHEN** GET /health is called but the MLflow server is unreachable
- **THEN** the system returns HTTP 200 with `{ "status": "degraded", "model_loaded": true, "mlflow": "disconnected", "database": "connected" }`
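
The two payloads above could be produced by one function. The sketch assumes that any unavailable dependency, not only MLflow, yields `"degraded"`; the spec only defines the MLflow case, and `health_payload` is a hypothetical name.

```python
def health_payload(model_loaded: bool, mlflow_ok: bool, database_ok: bool) -> dict:
    """Build the /health response body from three dependency checks."""
    all_ok = model_loaded and mlflow_ok and database_ok
    return {
        # Assumption: anything short of fully available is "degraded".
        "status": "healthy" if all_ok else "degraded",
        "model_loaded": model_loaded,
        "mlflow": "connected" if mlflow_ok else "disconnected",
        "database": "connected" if database_ok else "disconnected",
    }
```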

### Requirement: Prediction confidence scores
Each prediction SHALL include a confidence score between 0.0 and 1.0 derived from the model's probability output. For tree-based models, this is the max class probability from `predict_proba()`.

#### Scenario: Confidence from predict_proba
- **WHEN** the model predicts class "bull_flag" with probability 0.87
- **THEN** the prediction confidence for that candle is 0.87
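
As an illustration, taking the max class probability from one `predict_proba()` row could look like this. The sketch uses plain lists; `classes` plays the role of a scikit-learn style `classes_` attribute, and the function name is hypothetical.

```python
def label_and_confidence(classes, probabilities):
    """Return (label, confidence) for one row of class probabilities."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have equal length")
    # Index of the largest probability; the first index wins on ties.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best], float(probabilities[best])
```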

### Requirement: Prediction span grouping
The system SHALL group consecutive candle predictions with the same non-"O" label into prediction spans. Each span SHALL have `start_time`, `end_time`, `label`, and `avg_confidence` (mean confidence of candles in the span).

#### Scenario: Group consecutive predictions
- **WHEN** candles at T1, T2, T3 are all predicted as "bull_flag" with confidences 0.85, 0.90, 0.80
- **THEN** the system creates one span: `{ "start_time": T1, "end_time": T3, "label": "bull_flag", "avg_confidence": 0.85 }`

#### Scenario: Break on label change
- **WHEN** candle T1 is "bull_flag" and candle T2 is "bear_flag"
- **THEN** the system creates two separate spans
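
The grouping rule can be sketched with `itertools.groupby`, assuming predictions arrive in time order as dictionaries with `time`, `label`, and `confidence` keys. `group_spans` is a hypothetical name.

```python
from itertools import groupby


def group_spans(predictions, outside_label="O"):
    """Group consecutive same-label predictions into spans, skipping "O"."""
    spans = []
    for label, run in groupby(predictions, key=lambda p: p["label"]):
        if label == outside_label:
            continue
        items = list(run)
        spans.append({
            "start_time": items[0]["time"],
            "end_time": items[-1]["time"],
            "label": label,
            # Mean confidence over the candles in this run.
            "avg_confidence": sum(p["confidence"] for p in items) / len(items),
        })
    return spans
```

Because `groupby` only merges adjacent items, a label change or an intervening "O" candle starts a new span.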