sync: migrate delta specs to main openspec/specs

- Added 5 new capabilities: feature-engineering, annotation-ingestion, ml-training, ml-inference, prediction-ui
- Updated 2 existing capabilities: backend-api, span-annotation
- All specs synced from openspec/changes/candle-backend/specs/
2026-02-16 11:44:34 +01:00 · 2026-02-16 11:44:34 +01:00 · 7e0579f65d
commit 7e0579f65d
parent 65f00e6ce7
7 changed files with 525 additions and 252 deletions
--- a/openspec/specs/annotation-ingestion/spec.md
+++ b/openspec/specs/annotation-ingestion/spec.md
@@ -0,0 +1,85 @@
+## ADDED Requirements
+
+### Requirement: Load annotations from JSON export
+The system SHALL load annotation data from JSON files exported by the annotation tool, located at `data.annotations_path`. The expected format is a JSON object with an `annotations` array where each annotation has: `id`, `start_time`, `end_time`, `label`, `confidence` (nullable), `outcome` (nullable), and `sub_spans` (nullable).
+
+#### Scenario: Load valid annotations JSON
+- **WHEN** `data.annotations_path` points to a valid JSON file with annotations
+- **THEN** the system loads all annotation objects into memory for processing
+
+#### Scenario: Missing annotations file
+- **WHEN** `data.annotations_path` points to a file that does not exist and annotation ingestion is enabled
+- **THEN** the system SHALL fail with an error message identifying the missing file path
+
+#### Scenario: Filter by confidence
+- **WHEN** `stages.annotation_ingestion.min_confidence` is set to 3
+- **THEN** annotations with confidence below 3 SHALL be excluded from the labeled dataset
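A minimal sketch of the loading and filtering rules above, in Python using only the standard library. The function name `load_annotations` is hypothetical. Keeping annotations whose `confidence` is null when a threshold is set is an assumption, since the requirement only excludes confidences below `min_confidence`.

```python
import json
from pathlib import Path


def load_annotations(annotations_path, min_confidence=None):
    """Load annotations from the exported JSON and apply the confidence filter.

    Assumption: annotations with a null `confidence` are kept, because the
    spec only excludes confidence values below the threshold.
    """
    path = Path(annotations_path)
    if not path.is_file():
        # Fail and name the missing path, per the "Missing annotations file" scenario.
        raise FileNotFoundError(f"Annotations file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        annotations = json.load(fh)["annotations"]
    if min_confidence is None:
        return annotations
    return [
        a for a in annotations
        if a.get("confidence") is None or a["confidence"] >= min_confidence
    ]
```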
+
+### Requirement: Windowed classification encoding
+When `stages.annotation_ingestion.label_encoding` is "window", the system SHALL convert each annotation span into a window of candles sized by `stages.annotation_ingestion.window_size`. If the annotation span is shorter than `window_size`, the system SHALL pad it with context candles so that the window is centered on the span. If the span is longer than `window_size`, the system SHALL use the full span without truncating it. Each window becomes one row in the output, with the OHLCV and feature columns of every candle in the window flattened into that row.
+
+#### Scenario: Span shorter than window
+- **WHEN** an annotation spans 10 candles and window_size is 30
+- **THEN** the system extracts 30 candles centered on the annotation (10 before, 10 span, 10 after) and flattens them into a single row
+
+#### Scenario: Span longer than window
+- **WHEN** an annotation spans 50 candles and window_size is 30
+- **THEN** the system uses all 50 candles and flattens them into a single row
+
+#### Scenario: Span near dataset boundary
+- **WHEN** an annotation is near the start or end of the dataset and there are not enough candles for padding
+- **THEN** the system SHALL pad with as many candles as are available (no error) and fill the missing positions with NaN
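The windowing rules can be sketched as follows, assuming each candle is already a tuple of OHLCV plus feature values and the span is given as inclusive candle indices. `extract_window` is a hypothetical helper. When the padding is odd, this sketch puts the extra candle after the span, which the spec does not specify.

```python
import math


def extract_window(candles, start, end, window_size):
    """Flatten the candles around one annotation span into a single row.

    candles: list of equal-length tuples (OHLCV + feature values).
    start, end: inclusive candle indices of the annotation span.
    """
    width = len(candles[0]) if candles else 0
    span_len = end - start + 1
    if span_len >= window_size:
        indices = range(start, end + 1)  # long span: keep all of it
    else:
        pad = window_size - span_len
        before = pad // 2  # assumption: an odd extra candle goes after the span
        after = pad - before
        indices = range(start - before, end + after + 1)
    row = []
    for i in indices:
        if 0 <= i < len(candles):
            row.extend(candles[i])
        else:
            row.extend([math.nan] * width)  # outside the dataset: NaN placeholders
    return row
```

For the "span shorter than window" scenario, a span at indices 40 to 49 with `window_size=30` yields candles 30 through 59, i.e. 10 before, 10 in the span, and 10 after.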
+
+### Requirement: BIO sequence labeling encoding
+When `stages.annotation_ingestion.label_encoding` is "bio", the system SHALL assign a BIO tag to each candle in the dataset based on annotations. The first candle of an annotation span gets `B-{label}`, subsequent candles in the span get `I-{label}`, and candles outside any annotation get `O`.
+
+#### Scenario: Single annotation BIO tags
+- **WHEN** a "bull_flag" annotation spans candles at times T5 through T8
+- **THEN** candle T5 gets tag `B-bull_flag`, candles T6-T8 get `I-bull_flag`, all other candles get `O`
+
+#### Scenario: Overlapping annotations
+- **WHEN** two annotations overlap in time range
+- **THEN** the system SHALL create multiple tag columns (`bio_tag_1`, `bio_tag_2`) to represent both annotations
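A sketch of BIO tagging that also covers overlaps, assuming spans are given as inclusive candle indices. The greedy rule of placing each annotation in the first `bio_tag_N` column where it does not collide with an earlier one, and naming the single column `bio_tag_1` when nothing overlaps, are both assumptions.

```python
def bio_tags(n_candles, spans):
    """Assign BIO tags from annotation spans.

    spans: (start_idx, end_idx, label) tuples with inclusive indices.
    Returns {"bio_tag_1": [...], "bio_tag_2": [...], ...}; extra columns
    appear only when annotations overlap.
    """
    columns = []
    for start, end, label in sorted(spans):
        # Reuse the first column that is still all "O" over this span's range.
        for col in columns:
            if all(tag == "O" for tag in col[start:end + 1]):
                break
        else:
            col = ["O"] * n_candles
            columns.append(col)
        col[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            col[i] = f"I-{label}"
    if not columns:
        columns.append(["O"] * n_candles)
    return {f"bio_tag_{k + 1}": col for k, col in enumerate(columns)}
```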
+
+### Requirement: Programmatic TA-Lib pattern labels
+When `stages.annotation_ingestion.programmatic_labels.enabled` is true, the system SHALL run TA-Lib CDL* pattern recognition functions listed in `talib_patterns` on the OHLC data. Each CDL function returns +100 (bullish), -100 (bearish), or 0 (no pattern). The system SHALL convert non-zero results to label names (e.g., `CDL_ENGULFING` with +100 → `bullish_engulfing`).
+
+#### Scenario: Detect engulfing pattern
+- **WHEN** `CDL_ENGULFING` is in the talib_patterns list and the OHLC data contains an engulfing pattern
+- **THEN** the system generates a label `bullish_engulfing` or `bearish_engulfing` for the corresponding candle
+
+#### Scenario: No pattern detected
+- **WHEN** a CDL function returns 0 for a candle
+- **THEN** no programmatic label is assigned to that candle
+
+### Requirement: Human and programmatic label merge
+When both human annotations and programmatic labels exist for the same candle, the system SHALL merge them using the strategy in `stages.annotation_ingestion.merge_strategy`: "human_priority" keeps the human label, "programmatic_priority" keeps the TA-Lib label, "both" keeps both as separate label columns.
+
+#### Scenario: Human priority merge
+- **WHEN** merge_strategy is "human_priority" and a candle has human label "bull_flag" and programmatic label "bullish_engulfing"
+- **THEN** the output label for that candle is "bull_flag"
+
+#### Scenario: Both labels merge
+- **WHEN** merge_strategy is "both" and a candle has both human and programmatic labels
+- **THEN** the output has two separate label columns: `label_human` and `label_programmatic`
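A sketch of the three merge strategies for a single candle. Returning the other source's label when only one source labeled the candle, and naming the single output column `label`, are assumptions; the spec only defines the case where both labels exist.

```python
def merge_labels(human, programmatic, strategy):
    """Merge one candle's human and programmatic labels (None means absent)."""
    if strategy == "both":
        return {"label_human": human, "label_programmatic": programmatic}
    if strategy == "human_priority":
        return {"label": human if human is not None else programmatic}
    if strategy == "programmatic_priority":
        return {"label": programmatic if programmatic is not None else human}
    raise ValueError(f"Unknown merge_strategy: {strategy!r}")
```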
+
+### Requirement: Context padding
+The system SHALL include `stages.annotation_ingestion.context_padding` candles before and after each annotation span in the labeled output. This provides trend context for models.
+
+#### Scenario: Add padding candles
+- **WHEN** context_padding is 20 and an annotation spans candles T10 to T15
+- **THEN** the output includes candles from T-10 (or dataset start) through T35 (or dataset end) associated with that annotation
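Expressed over candle indices rather than timestamps (an assumption for brevity), the padded range is the span widened by `context_padding` on each side and clipped to the dataset. `padded_range` is a hypothetical helper.

```python
def padded_range(span_start, span_end, context_padding, n_candles):
    """Return the inclusive index range of a span plus context padding,
    clipped to the dataset boundaries."""
    first = max(0, span_start - context_padding)
    last = min(n_candles - 1, span_end + context_padding)
    return first, last
```

With the scenario's values, `padded_range(10, 15, 20, 100)` returns `(0, 35)`: T-10 is clipped to the dataset start, and T35 is kept.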
+
+### Requirement: Dataset statistics logging
+After annotation ingestion completes, the system SHALL log: total annotations by label, class distribution percentages, average span length per label, and agreement rate between human and programmatic labels (when both are enabled).
+
+#### Scenario: Log class distribution
+- **WHEN** annotation ingestion completes with 50 "bull_flag", 30 "bear_flag", and 200 "O" labels
+- **THEN** the system logs the counts and percentages for each class
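A sketch of the class-distribution and span-length logging using `collections.Counter` and the standard `logging` module. The function name and the `(label, length)` input shape for spans are hypothetical, and the human/programmatic agreement rate is left out.

```python
import logging
from collections import Counter, defaultdict

logger = logging.getLogger("annotation_ingestion")


def log_label_stats(labels, spans=()):
    """Log per-class counts and percentages, plus average span length per label.

    labels: one label per output row (e.g. "bull_flag", "O").
    spans: optional (label, length_in_candles) pairs, one per annotation.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    distribution = {
        label: (count, round(100.0 * count / total, 2))
        for label, count in counts.most_common()
    }
    for label, (count, pct) in distribution.items():
        logger.info("class %s: %d (%.2f%%)", label, count, pct)
    lengths = defaultdict(list)
    for label, length in spans:
        lengths[label].append(length)
    avg_span = {label: sum(v) / len(v) for label, v in lengths.items()}
    for label, avg in avg_span.items():
        logger.info("avg span length %s: %.1f candles", label, avg)
    return distribution, avg_span
```

For the scenario's 50 "bull_flag", 30 "bear_flag", and 200 "O" labels, the logged percentages are 17.86%, 10.71%, and 71.43%.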
+
+### Requirement: Labeled CSV output
+The system SHALL write the labeled dataset to `data.labeled_path` in CSV format. The output SHALL contain all feature columns plus the target label column(s).
+
+#### Scenario: Write labeled CSV
+- **WHEN** annotation ingestion completes successfully
+- **THEN** the labeled CSV is written to `data.labeled_path` with all feature and label columns
--- a/openspec/specs/backend-api/spec.md
+++ b/openspec/specs/backend-api/spec.md
@@ -1,91 +1,38 @@
 ## ADDED Requirements

-### Requirement: Upload endpoint
-The system SHALL provide a `POST /api/upload` endpoint that accepts a CSV file via multipart form data. The endpoint SHALL create a new chart (named from the uploaded filename without extension), parse the CSV using papaparse, validate the format, and insert all candle records into the `candles` table with the new chart's `chart_id` within a single database transaction. On success, the endpoint SHALL return a JSON response with the chart `id`, chart `name`, and the count of inserted records. On failure, it SHALL return an appropriate error status and message without creating a chart.
+### Requirement: Predict proxy endpoint
+The system SHALL provide a `POST /api/predict` Next.js API route that proxies requests to the Python inference service at `${INFERENCE_API_URL}/predict`. The route SHALL forward the request body (pair, timeframe, candles array) and return the Python service's response. If the inference service is unreachable, the route SHALL return HTTP 503 with `{ "error": "Inference service unavailable" }`.

-#### Scenario: Successful upload
-- **WHEN** a valid CSV file named "BTC-1H.csv" is sent to POST /api/upload
-- **THEN** endpoint creates a chart named "BTC-1H", inserts candles with that chart_id, and returns `{ "success": true, "count": <number>, "chart": { "id": <number>, "name": "BTC-1H" } }` with HTTP 200
+#### Scenario: Successful prediction proxy
+- **WHEN** POST /api/predict is called with valid candle data and the Python service is running
+- **THEN** the route forwards the request to the inference service and returns the prediction response with HTTP 200

-#### Scenario: Invalid CSV upload
-- **WHEN** a CSV with missing or invalid headers is sent to POST /api/upload
-- **THEN** endpoint returns `{ "error": "<description>" }` with HTTP 400 and does not create a chart
+#### Scenario: Inference service down
+- **WHEN** POST /api/predict is called but the Python inference service is unreachable
+- **THEN** the route returns HTTP 503 with `{ "error": "Inference service unavailable" }`

-#### Scenario: No file provided
-- **WHEN** POST /api/upload is called without a file
-- **THEN** endpoint returns `{ "error": "No file provided" }` with HTTP 400
+#### Scenario: Inference service error
+- **WHEN** the Python inference service returns an error status (4xx or 5xx)
+- **THEN** the route forwards the error status and message to the client

-#### Scenario: Duplicate filename
-- **WHEN** a CSV named "BTC-1H.csv" is uploaded and a chart named "BTC-1H" already exists
-- **THEN** endpoint creates a chart named "BTC-1H-2" (or next available suffix) and inserts candles under that chart
+### Requirement: Batch predict proxy endpoint
+The system SHALL provide a `POST /api/predict/batch` Next.js API route that proxies batch prediction requests to `${INFERENCE_API_URL}/predict/batch`. The route SHALL forward pair, timeframe, start_date, and end_date.

-### Requirement: Get annotations endpoint
-The system SHALL provide a `GET /api/annotations` endpoint that accepts an optional `chartId` query parameter. When `chartId` is provided, the endpoint SHALL return only annotations belonging to that chart. When `chartId` is omitted, the endpoint SHALL return annotations for the most recently created chart. Each annotation object SHALL include: `id`, `chart_id`, `timestamp`, `label_type`, `geometry` (parsed from JSON string or null), and `created_at`.
+#### Scenario: Successful batch prediction
+- **WHEN** POST /api/predict/batch is called with valid parameters
+- **THEN** the route forwards to the inference service and returns the batch prediction response

-#### Scenario: Fetch annotations for specific chart
-- **WHEN** GET /api/annotations?chartId=3 is called
-- **THEN** endpoint returns a JSON array of annotations where chart_id equals 3, with HTTP 200
+#### Scenario: Timeout on large batch
+- **WHEN** the batch prediction takes longer than `INFERENCE_BATCH_TIMEOUT`
+- **THEN** the route returns HTTP 504 with `{ "error": "Batch prediction timed out" }`

-#### Scenario: Fetch annotations without chartId
-- **WHEN** GET /api/annotations is called without a chartId parameter
-- **THEN** endpoint returns annotations for the most recently created chart with HTTP 200
+### Requirement: Model info proxy endpoint
+The system SHALL provide a `GET /api/model/info` Next.js API route that proxies to `${INFERENCE_API_URL}/model/info`. This endpoint returns model metadata and per-class metrics.

-#### Scenario: No annotations exist for chart
-- **WHEN** GET /api/annotations?chartId=3 is called and no annotations exist for chart 3
-- **THEN** endpoint returns an empty JSON array `[]` with HTTP 200
+#### Scenario: Successful model info
+- **WHEN** GET /api/model/info is called and the inference service is running
+- **THEN** the route returns the model metadata JSON

-### Requirement: Create annotation endpoint
-The system SHALL provide a `POST /api/annotations` endpoint that accepts a JSON body with fields: `timestamp` (required, integer), `label_type` (required, string), `chart_id` (required, integer), and `geometry` (optional, object). The endpoint SHALL validate the input, verify the chart exists, serialize geometry to JSON string if present, and insert the record into the `annotations` table. On success, it SHALL return the created annotation object with its assigned `id`.
-
-#### Scenario: Create a marker annotation
-- **WHEN** POST /api/annotations is called with `{ "timestamp": 1700000000, "label_type": "break_up", "chart_id": 3 }`
-- **THEN** endpoint saves the annotation with chart_id 3 and returns the created object with `id` and HTTP 201
-
-#### Scenario: Create a line annotation
-- **WHEN** POST /api/annotations is called with `{ "timestamp": 1700000000, "label_type": "line", "chart_id": 3, "geometry": { "startTime": 1700000000, "startPrice": 1.05, "endTime": 1700100000, "endPrice": 1.06 } }`
-- **THEN** endpoint saves the annotation with chart_id 3 and serialized geometry JSON, returns the created object with HTTP 201
-
-#### Scenario: Invalid annotation data
-- **WHEN** POST /api/annotations is called with missing required fields (timestamp, label_type, or chart_id)
-- **THEN** endpoint returns `{ "error": "<description>" }` with HTTP 400
-
-#### Scenario: Annotation for non-existent chart
-- **WHEN** POST /api/annotations is called with a chart_id that does not exist
-- **THEN** endpoint returns `{ "error": "Chart not found" }` with HTTP 404
-
-### Requirement: Delete annotation endpoint
-The system SHALL provide a `DELETE /api/annotations/[id]` endpoint that removes an annotation by its ID. On success, it SHALL return HTTP 200. If the annotation does not exist, it SHALL return HTTP 404.
-
-#### Scenario: Delete existing annotation
-- **WHEN** DELETE /api/annotations/5 is called and annotation with id 5 exists
-- **THEN** endpoint removes the annotation and returns HTTP 200
-
-#### Scenario: Delete non-existent annotation
-- **WHEN** DELETE /api/annotations/999 is called and no annotation with that id exists
-- **THEN** endpoint returns HTTP 404
-
-### Requirement: Export annotations endpoint
-The system SHALL provide a `GET /api/export` endpoint that accepts an optional `chartId` query parameter. When `chartId` is provided, the endpoint SHALL export only annotations for that chart. When `chartId` is omitted, the endpoint SHALL export annotations for the most recently created chart. The CSV SHALL have columns: `timestamp`, `label_type`, `price`. The response SHALL set `Content-Type: text/csv` and `Content-Disposition: attachment; filename="annotations.csv"` headers.
-
-#### Scenario: Export for specific chart
-- **WHEN** GET /api/export?chartId=3 is called and annotations exist for chart 3
-- **THEN** endpoint returns a CSV file download with only annotations belonging to chart 3
-
-#### Scenario: Export without chartId
-- **WHEN** GET /api/export is called without a chartId parameter
-- **THEN** endpoint exports annotations for the most recently created chart
-
-### Requirement: Get candles endpoint
-The system SHALL provide a `GET /api/candles` endpoint that accepts an optional `chartId` query parameter. When `chartId` is provided, the endpoint SHALL return only candles belonging to that chart. When `chartId` is omitted, the endpoint SHALL return candles for the most recently created chart. Results SHALL be ordered by time ascending. Each object SHALL include: `time`, `open`, `high`, `low`, `close`.
-
-#### Scenario: Fetch candles for specific chart
-- **WHEN** GET /api/candles?chartId=3 is called
-- **THEN** endpoint returns a JSON array of candle objects where chart_id equals 3, ordered by time ascending with HTTP 200
-
-#### Scenario: Fetch candles without chartId
-- **WHEN** GET /api/candles is called without a chartId parameter
-- **THEN** endpoint returns candles for the most recently created chart with HTTP 200
-
-#### Scenario: No candles exist for chart
-- **WHEN** GET /api/candles?chartId=3 is called and no candles exist for chart 3
-- **THEN** endpoint returns an empty JSON array `[]` with HTTP 200
+#### Scenario: No model available
+- **WHEN** GET /api/model/info is called and the inference service returns 503
+- **THEN** the route returns HTTP 503 with `{ "error": "No model available" }`
--- a/openspec/specs/feature-engineering/spec.md
+++ b/openspec/specs/feature-engineering/spec.md
@@ -0,0 +1,60 @@
+## ADDED Requirements
+
+### Requirement: TA-Lib indicator computation
+The system SHALL compute technical indicators from raw OHLCV data using TA-Lib. The pipeline config's `stages.feature_engineering.talib_indicators` list defines which indicators to compute. Each indicator entry specifies a `name` (TA-Lib function name) and `params` (dictionary of function parameters). Computed indicators SHALL be appended as new columns to the output CSV using lowercase names: single-output indicators use `{indicator}_{param}` (e.g., `rsi_14`, `ema_20`), and multi-output indicators use the indicator name plus an output suffix where needed (e.g., `macd`, `macd_signal`, `macd_hist`, `bbands_upper`, `bbands_middle`, `bbands_lower`).
+
+#### Scenario: Compute RSI indicator
+- **WHEN** the config includes `{ name: "RSI", params: { timeperiod: 14 } }` and feature engineering is enabled
+- **THEN** the system computes RSI with period 14 and appends a `rsi_14` column to the enriched CSV
+
+#### Scenario: Compute multi-output indicator
+- **WHEN** the config includes `{ name: "MACD", params: { fastperiod: 12, slowperiod: 26, signalperiod: 9 } }`
+- **THEN** the system appends `macd`, `macd_signal`, and `macd_hist` columns to the enriched CSV
+
+#### Scenario: TA-Lib not installed
+- **WHEN** feature engineering is enabled but the TA-Lib C library is not installed on the system
+- **THEN** the system SHALL fail with a clear error message including installation instructions for the user's platform, and SHALL NOT silently skip the stage
+
+#### Scenario: Feature engineering disabled
+- **WHEN** `stages.feature_engineering.enabled` is false
+- **THEN** the system SHALL skip indicator computation entirely and pass raw OHLCV data to the next stage
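Because the `{indicator}_{param}` pattern only fits single-output indicators, a naming helper needs a separate rule for multi-output ones. This sketch hard-codes suffixes for `MACD` and `BBANDS` to reproduce the example column names and assumes `timeperiod` is the parameter that goes into single-output names; both are assumptions, and the TA-Lib calls themselves are omitted.

```python
# Hypothetical suffix table for multi-output TA-Lib indicators, chosen to
# reproduce the example column names in the requirement.
MULTI_OUTPUT_SUFFIXES = {
    "MACD": ["", "_signal", "_hist"],
    "BBANDS": ["_upper", "_middle", "_lower"],
}


def indicator_columns(name, params):
    """Return the lowercase output column name(s) for one indicator config entry."""
    base = name.lower()
    if name in MULTI_OUTPUT_SUFFIXES:
        return [base + suffix for suffix in MULTI_OUTPUT_SUFFIXES[name]]
    if "timeperiod" in params:
        return [f"{base}_{params['timeperiod']}"]
    return [base]
```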
+
+### Requirement: Candle feature extraction
+When `stages.feature_engineering.candle_features` is true, the system SHALL compute derived candle features for each row: `body_size` (abs(close - open)), `body_direction` (1 if close >= open, else -1), `upper_wick` (high - max(open, close)), `lower_wick` (min(open, close) - low), `wick_ratio` (upper_wick / lower_wick), `body_to_range` (body_size / (high - low)), `gap` (open - previous close), and `range` (high - low).
+
+#### Scenario: Compute candle features
+- **WHEN** `candle_features` is true and feature engineering is enabled
+- **THEN** the system appends columns `body_size`, `body_direction`, `upper_wick`, `lower_wick`, `wick_ratio`, `body_to_range`, `gap`, `range` to the enriched CSV
+
+#### Scenario: Division by zero handling
+- **WHEN** a candle has `lower_wick` equal to 0 (for `wick_ratio`) or `high` equal to `low` (for `body_to_range`)
+- **THEN** the system SHALL set the result to 0.0 instead of raising an error
+
+#### Scenario: Gap for first candle
+- **WHEN** computing `gap` for the first candle in the dataset (no previous close)
+- **THEN** the system SHALL set gap to 0.0
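A plain-Python sketch of the candle features, using a list of OHLC dicts instead of a DataFrame so it runs without pandas (an assumption for illustration). The zero-denominator and first-candle cases return 0.0 as the scenarios require. With pandas, `gap` would typically be `df["open"] - df["close"].shift(1)` with the first value filled as 0.0.

```python
def candle_features(candles):
    """Compute the derived candle features for a list of OHLC dicts."""
    rows = []
    prev_close = None
    for c in candles:
        o, h, l, cl = c["open"], c["high"], c["low"], c["close"]
        body_size = abs(cl - o)
        upper_wick = h - max(o, cl)
        lower_wick = min(o, cl) - l
        rng = h - l
        rows.append({
            "body_size": body_size,
            "body_direction": 1 if cl >= o else -1,
            "upper_wick": upper_wick,
            "lower_wick": lower_wick,
            # Zero denominators map to 0.0 instead of raising.
            "wick_ratio": upper_wick / lower_wick if lower_wick != 0 else 0.0,
            "body_to_range": body_size / rng if rng != 0 else 0.0,
            # The first candle has no previous close, so its gap is 0.0.
            "gap": o - prev_close if prev_close is not None else 0.0,
            "range": rng,
        })
        prev_close = cl
    return rows
```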
+
+### Requirement: Custom feature functions
+When `stages.feature_engineering.custom_features` is configured, the system SHALL resolve each listed dotted path (a Python module path followed by a function name), import the module, and call the named function. Each custom feature function SHALL accept a pandas DataFrame (the full OHLCV data plus the features computed so far) and return a pandas Series. The returned Series SHALL be appended as a new column named after the function.
+
+#### Scenario: Load custom feature
+- **WHEN** the config includes `custom_features: ["features.custom.trend_slope"]`
+- **THEN** the system imports `features.custom.trend_slope`, calls it with the DataFrame, and appends the result as a `trend_slope` column
+
+#### Scenario: Custom feature import error
+- **WHEN** a custom feature module path cannot be imported
+- **THEN** the system SHALL fail with an error message naming the unresolvable module path
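A sketch of resolving a dotted custom-feature path with `importlib`, assuming the last component is the function name (and therefore the column name). `load_custom_feature` is hypothetical; wrapping lookup failures in an `ImportError` that names the path follows the import-error scenario.

```python
import importlib


def load_custom_feature(path):
    """Resolve "package.module.func" to (column_name, callable)."""
    module_path, _, func_name = path.rpartition(".")
    try:
        module = importlib.import_module(module_path)  # "" raises ValueError
        func = getattr(module, func_name)
    except (ImportError, AttributeError, ValueError) as exc:
        raise ImportError(f"Cannot resolve custom feature {path!r}") from exc
    return func_name, func
```

Usage would then look like `name, fn = load_custom_feature("features.custom.trend_slope")` followed by `df[name] = fn(df)`.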
+
+### Requirement: NaN handling for warmup periods
+After computing all indicators, the system SHALL handle NaN values introduced by indicator warmup periods. Rows with NaN values in indicator columns SHALL be dropped from the output. The system SHALL log how many rows were dropped.
+
+#### Scenario: Drop warmup rows
+- **WHEN** RSI with period 14 produces NaN for the first 14 rows
+- **THEN** those rows are dropped from the enriched CSV and a log message reports "Dropped 14 rows due to indicator warmup"
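A sketch of the warmup-row drop with the log message from the scenario, written over a list of row dicts so it runs without pandas. With a DataFrame, `df.dropna(subset=indicator_columns)` does the same filtering.

```python
import logging
import math

logger = logging.getLogger("feature_engineering")


def drop_warmup_rows(rows, indicator_columns):
    """Drop rows with NaN in any indicator column and log how many were dropped."""
    kept = [
        r for r in rows
        if not any(isinstance(r[col], float) and math.isnan(r[col]) for col in indicator_columns)
    ]
    dropped = len(rows) - len(kept)
    logger.info("Dropped %d rows due to indicator warmup", dropped)
    return kept, dropped
```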
+
+### Requirement: Enriched CSV output
+The system SHALL write the enriched dataset (original OHLCV columns + all computed feature columns) to the path specified by `data.enriched_path` in CSV format. The output SHALL preserve the original column order with new feature columns appended.
+
+#### Scenario: Write enriched CSV
+- **WHEN** feature engineering completes successfully
+- **THEN** the enriched CSV is written to `data.enriched_path` with all original and computed columns
--- a/openspec/specs/ml-inference/spec.md
+++ b/openspec/specs/ml-inference/spec.md
@@ -0,0 +1,107 @@
+## ADDED Requirements
+
+### Requirement: Model loading from MLflow registry
+When `stages.inference.model_source` is "mlflow", the system SHALL load the model from the MLflow model registry using the model name (`stages.inference.mlflow_model_name`) and stage (`stages.inference.mlflow_model_stage`).
+
+#### Scenario: Load production model
+- **WHEN** model_source is "mlflow", model name is "candlestick_pattern_v1", and stage is "Production"
+- **THEN** the system loads the model registered as "candlestick_pattern_v1" at the "Production" stage from MLflow
+
+#### Scenario: Model not found in registry
+- **WHEN** the specified model name or stage does not exist in the MLflow registry
+- **THEN** the system SHALL return a clear error indicating the model was not found
+
+### Requirement: Model loading from local file
+When `stages.inference.model_source` is "local", the system SHALL load the model from the file path specified by `stages.inference.local_model_path` using joblib.
+
+#### Scenario: Load local model
+- **WHEN** model_source is "local" and local_model_path is "models/best_model.pkl"
+- **THEN** the system loads the model from that file path
+
+#### Scenario: Local model file missing
+- **WHEN** the local_model_path does not exist
+- **THEN** the system SHALL return an error indicating the model file was not found
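A sketch of the source dispatch that stops short of loading anything, so it runs without MLflow or joblib installed. For the MLflow branch it builds a `models:/<name>/<stage>` URI, the registry URI form that loaders such as `mlflow.pyfunc.load_model` accept; for the local branch it checks the file before it would be handed to `joblib.load`. The function name and return shape are hypothetical. Recent MLflow releases deprecate registry stages in favor of aliases (`models:/<name>@<alias>`), which may matter when choosing `mlflow_model_stage` values.

```python
from pathlib import Path


def resolve_model_source(inference_cfg):
    """Return ("mlflow", model_uri) or ("local", file_path) for stages.inference."""
    source = inference_cfg["model_source"]
    if source == "mlflow":
        name = inference_cfg["mlflow_model_name"]
        stage = inference_cfg["mlflow_model_stage"]
        # Registry URI form accepted by MLflow model loaders.
        return "mlflow", f"models:/{name}/{stage}"
    if source == "local":
        path = Path(inference_cfg["local_model_path"])
        if not path.is_file():
            raise FileNotFoundError(f"Model file not found: {path}")
        return "local", str(path)  # would then be passed to joblib.load
    raise ValueError(f"Unknown model_source: {source!r}")
```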
+
+### Requirement: Preprocessing parity
+The inference service SHALL replicate the exact preprocessing (feature engineering) used during training. The system SHALL load the pipeline config artifact from the MLflow run that produced the model and apply the same feature engineering steps (TA-Lib indicators, candle features) with the same parameters.
+
+#### Scenario: Matching preprocessing
+- **WHEN** the model was trained with RSI(14) and EMA(20) features
+- **THEN** inference SHALL compute RSI(14) and EMA(20) on the input candles before running the model
+
+#### Scenario: Config mismatch warning
+- **WHEN** the current pipeline config differs from the config stored with the model
+- **THEN** the system SHALL log a warning about the mismatch
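A sketch of the mismatch check, assuming both configs are already parsed into nested dicts and that only the `stages.feature_engineering` section matters for parity. Preprocessing itself would still use the stored config; this function only produces the warning. `check_config_parity` is hypothetical.

```python
import logging

logger = logging.getLogger("inference")


def check_config_parity(stored_config, current_config, section="feature_engineering"):
    """Return the sorted keys that differ in stages.<section>, warning if any do."""
    stored = stored_config.get("stages", {}).get(section, {})
    current = current_config.get("stages", {}).get(section, {})
    differing = sorted(
        key for key in stored.keys() | current.keys()
        if stored.get(key) != current.get(key)
    )
    if differing:
        logger.warning(
            "Pipeline config differs from the model's stored config in: %s",
            ", ".join(differing),
        )
    return differing
```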
+
+### Requirement: Predict endpoint
+The system SHALL provide a `POST /predict` endpoint on the FastAPI service (port 8001). The endpoint SHALL accept a JSON body with `pair` (string), `timeframe` (string), and `candles` (array of objects with `time`, `open`, `high`, `low`, `close`, `volume`). It SHALL return per-candle predictions with labels and confidence scores, prediction spans (consecutive same-label predictions grouped together), and model metadata.
+
+#### Scenario: Successful prediction
+- **WHEN** POST /predict is called with 100 valid candle objects
+- **THEN** the system returns a JSON response with a `predictions` array (one entry per candle with `time`, `label`, `confidence`), a `spans` array (consecutive same-label predictions grouped with `start_time`, `end_time`, `label`, `avg_confidence`), and a `model_info` object
+
+#### Scenario: Empty candles array
+- **WHEN** POST /predict is called with an empty candles array
+- **THEN** the system returns HTTP 400 with an error message
+
+#### Scenario: Invalid candle data
+- **WHEN** POST /predict is called with candle objects missing required fields
+- **THEN** the system returns HTTP 422 with validation error details
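In FastAPI, declaring the body as a Pydantic model already returns HTTP 422 for missing fields, but an empty `candles` list passes a plain `list[Candle]` annotation, so the 400 case needs an explicit check. The standard-library sketch below shows the same status mapping without FastAPI; `validate_predict_body` and the shape of the error details are hypothetical.

```python
REQUIRED_CANDLE_FIELDS = ("time", "open", "high", "low", "close", "volume")


def validate_predict_body(body):
    """Map a /predict request body to (status_code, detail); 200 means valid."""
    missing = [key for key in ("pair", "timeframe", "candles") if key not in body]
    if missing:
        return 422, {"detail": [{"missing": missing}]}
    candles = body["candles"]
    if not isinstance(candles, list):
        return 422, {"detail": [{"field": "candles", "error": "must be an array"}]}
    if not candles:
        return 400, {"error": "candles array must not be empty"}
    problems = []
    for index, candle in enumerate(candles):
        if not isinstance(candle, dict):
            problems.append({"index": index, "error": "must be an object"})
            continue
        absent = [f for f in REQUIRED_CANDLE_FIELDS if f not in candle]
        if absent:
            problems.append({"index": index, "missing": absent})
    if problems:
        return 422, {"detail": problems}
    return 200, None
```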
+
+### Requirement: Batch predict endpoint
+The system SHALL provide a `POST /predict/batch` endpoint that accepts `pair`, `timeframe`, `start_date`, and `end_date`. The system SHALL load OHLCV data from its own data store for the specified range, process in chunks of `stages.inference.batch_size`, and return predictions for the full range.
+
+#### Scenario: Batch prediction
+- **WHEN** POST /predict/batch is called with pair "EURUSD", timeframe "1H", start_date and end_date spanning 6 months
+- **THEN** the system loads the data, processes in batches, and returns predictions for the full range
+
+#### Scenario: No data for range
+- **WHEN** the requested date range has no OHLCV data available
+- **THEN** the system returns HTTP 404 with a message indicating no data found for the range
+
+### Requirement: Model info endpoint
+The system SHALL provide a `GET /model/info` endpoint that returns metadata about the currently loaded model: model_name, model_version, model_type, trained_at, dataset_version, feature_engineering enabled status, list of all labels the model knows, and per-class metrics (precision, recall, F1, training sample count for each label).
+
+#### Scenario: Get model info
+- **WHEN** GET /model/info is called and a model is loaded
+- **THEN** the system returns JSON with model metadata and per-class metrics
+
+#### Scenario: No model loaded
+- **WHEN** GET /model/info is called and no model has been loaded
+- **THEN** the system returns HTTP 503 with a message indicating no model is available
+
+### Requirement: Model labels endpoint
+The system SHALL provide a `GET /model/labels` endpoint that returns the list of all pattern labels the current model can predict, along with their display colors.
+
+#### Scenario: Get model labels
+- **WHEN** GET /model/labels is called
+- **THEN** the system returns a JSON array of label objects with `name` and `color` fields
+
+### Requirement: Health check endpoint
+The system SHALL provide a `GET /health` endpoint that returns the service status including whether a model is loaded, the MLflow connection status, and the PostgreSQL connection status.
+
+#### Scenario: Healthy service
+- **WHEN** GET /health is called and all dependencies are available
+- **THEN** the system returns HTTP 200 with `{ "status": "healthy", "model_loaded": true, "mlflow": "connected", "database": "connected" }`
+
+#### Scenario: Degraded service
+- **WHEN** GET /health is called but the MLflow server is unreachable
+- **THEN** the system returns HTTP 200 with `{ "status": "degraded", "model_loaded": true, "mlflow": "disconnected", "database": "connected" }`
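A sketch of how the `status` field might be derived. Treating any unavailable dependency (including no loaded model) as "degraded" while still returning HTTP 200 is an assumption extrapolated from the MLflow-down scenario; `health_status` is hypothetical.

```python
def health_status(model_loaded, mlflow_ok, database_ok):
    """Build the /health payload; returns (http_status, body)."""
    healthy = model_loaded and mlflow_ok and database_ok
    body = {
        "status": "healthy" if healthy else "degraded",
        "model_loaded": model_loaded,
        "mlflow": "connected" if mlflow_ok else "disconnected",
        "database": "connected" if database_ok else "disconnected",
    }
    # Assumption: the service itself is up, so it answers 200 even when degraded.
    return 200, body
```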
+
+### Requirement: Prediction confidence scores
+Each prediction SHALL include a confidence score between 0.0 and 1.0 derived from the model's probability output. For tree-based models, this is the max class probability from `predict_proba()`.
+
+#### Scenario: Confidence from predict_proba
+- **WHEN** the model predicts class "bull_flag" with probability 0.87
+- **THEN** the prediction confidence for that candle is 0.87
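For scikit-learn style classifiers, the columns of `predict_proba()` follow `model.classes_`, so the label and its confidence come from the same argmax. A minimal sketch over one probability row, with a hypothetical function name:

```python
def label_and_confidence(classes, probabilities):
    """Pick the most likely class and use its probability as the confidence.

    classes: labels in predict_proba column order (e.g. model.classes_).
    probabilities: one row of predict_proba output.
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best], float(probabilities[best])
```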
+
+### Requirement: Prediction span grouping
+The system SHALL group consecutive candle predictions with the same non-"O" label into prediction spans. Each span SHALL have `start_time`, `end_time`, `label`, and `avg_confidence` (mean confidence of candles in the span).
+
+#### Scenario: Group consecutive predictions
+- **WHEN** candles at T1, T2, T3 are all predicted as "bull_flag" with confidences 0.85, 0.90, 0.80
+- **THEN** the system creates one span: `{ start_time: T1, end_time: T3, label: "bull_flag", avg_confidence: 0.85 }`
+
+#### Scenario: Break on label change
+- **WHEN** candle T1 is "bull_flag" and candle T2 is "bear_flag"
+- **THEN** the system creates two separate spans
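Span grouping maps naturally onto `itertools.groupby`, which groups consecutive items with equal keys. A sketch, assuming predictions arrive sorted by time as dicts with `time`, `label`, and `confidence`:

```python
from itertools import groupby


def group_spans(predictions):
    """Group consecutive same-label, non-"O" predictions into spans."""
    spans = []
    for label, group in groupby(predictions, key=lambda p: p["label"]):
        if label == "O":
            continue  # "O" (outside any pattern) never forms a span
        items = list(group)
        spans.append({
            "start_time": items[0]["time"],
            "end_time": items[-1]["time"],
            "label": label,
            "avg_confidence": sum(p["confidence"] for p in items) / len(items),
        })
    return spans
```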
--- a/openspec/specs/ml-training/spec.md
+++ b/openspec/specs/ml-training/spec.md
@@ -0,0 +1,92 @@
+## ADDED Requirements
+
+### Requirement: Temporal train/test splitting
+The system SHALL split the labeled dataset into train, validation, and test sets using temporal ordering. Data SHALL be sorted by time; the earliest portion is used for training, the middle portion for validation, and the most recent portion for testing. Split ratios are defined by `stages.training.test_split` and `stages.training.validation_split`, each expressed as a fraction of the full dataset. The system SHALL NOT shuffle financial time series data.
+
+#### Scenario: Temporal split
+- **WHEN** test_split is 0.2, validation_split is 0.1, and the dataset has 1000 rows sorted by time
+- **THEN** the first 700 rows are training, next 100 are validation, last 200 are test
+
+#### Scenario: Random split option
+- **WHEN** split_method is "random"
+- **THEN** the system uses standard random splitting (sklearn train_test_split) but logs a warning that this is not recommended for financial data
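A sketch of the temporal split, assuming rows are dicts with a sortable `time` value and that both ratios are fractions of the full dataset, as in the 1000-row scenario. Rounding the split sizes with `round()` is an assumption.

```python
def temporal_split(rows, test_split, validation_split, time_key="time"):
    """Split rows into (train, validation, test) by time, without shuffling."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    n_test = round(n * test_split)
    n_val = round(n * validation_split)
    n_train = n - n_val - n_test
    return (
        ordered[:n_train],                 # earliest rows
        ordered[n_train:n_train + n_val],  # middle rows
        ordered[n_train + n_val:],         # most recent rows
    )
```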
+
+### Requirement: Class weight balancing
+The system SHALL apply class weighting to handle imbalanced pattern labels. When `stages.training.class_weights` is "balanced", the system SHALL compute inverse-frequency weights so rare pattern classes receive higher training weight.
+
+#### Scenario: Balanced weights
+- **WHEN** class_weights is "balanced" and the dataset has 500 "O" labels and 50 "bull_flag" labels
+- **THEN** the model trains with class weights inversely proportional to class frequency
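scikit-learn's `class_weight="balanced"` gives each class the weight `n_samples / (n_classes * count)`. A pure-Python sketch of that formula, with a hypothetical function name:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}
```

For the scenario's 500 "O" and 50 "bull_flag" labels this gives 0.55 and 5.5. `RandomForestClassifier` accepts `class_weight="balanced"` directly; `XGBClassifier` has no `class_weight` parameter, so for XGBoost these weights would typically be expanded into per-sample weights and passed as `fit(..., sample_weight=...)`.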
+
+### Requirement: Model training dispatch
+The system SHALL train the model type specified in `stages.training.model_type` using the hyperparameters in `stages.training.hyperparameters`. Supported model types for v1: "random_forest" (scikit-learn RandomForestClassifier) and "xgboost" (XGBClassifier).
+
+#### Scenario: Train XGBoost model
+- **WHEN** model_type is "xgboost" with hyperparameters n_estimators=500, max_depth=6, learning_rate=0.01
+- **THEN** the system trains an XGBClassifier with those parameters on the training set
+
+#### Scenario: Train RandomForest model
+- **WHEN** model_type is "random_forest"
+- **THEN** the system trains a RandomForestClassifier with the configured hyperparameters
+
+#### Scenario: Unsupported model type
+- **WHEN** model_type is a value not supported in v1 (e.g., "lstm", "transformer")
+- **THEN** the system SHALL fail with an error message listing the supported model types
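A sketch of the dispatch table with lazy imports, so an environment without XGBoost can still train a random forest. The table and function names are hypothetical; `build_model` imports the library only when that model type is requested.

```python
import importlib

# Hypothetical registry: model_type -> (module, class name).
SUPPORTED_MODELS = {
    "random_forest": ("sklearn.ensemble", "RandomForestClassifier"),
    "xgboost": ("xgboost", "XGBClassifier"),
}


def resolve_model_class(model_type):
    """Look up a supported model type or fail with the list of supported ones."""
    try:
        return SUPPORTED_MODELS[model_type]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(
            f"Unsupported model_type {model_type!r}; supported types: {supported}"
        ) from None


def build_model(model_type, hyperparameters):
    """Instantiate the configured classifier with its hyperparameters."""
    module_name, class_name = resolve_model_class(model_type)
    model_cls = getattr(importlib.import_module(module_name), class_name)
    return model_cls(**hyperparameters)
```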
+
+### Requirement: MLflow experiment tracking
+The system SHALL log all training runs to MLflow. Each run SHALL log: the full pipeline YAML config as an artifact, dataset version (DVC hash if available), total samples, number of classes, model type, window size, per-class sample counts, and all hyperparameters.
+
+#### Scenario: Log training run
+- **WHEN** a training run starts
+- **THEN** the system creates an MLflow run under the experiment name from `stages.training.mlflow.experiment_name` and logs all parameters
+
+#### Scenario: MLflow server unavailable
+- **WHEN** the MLflow tracking URI is unreachable
+- **THEN** the system SHALL fail with an error message indicating the MLflow server cannot be reached at the configured URI
+
+### Requirement: Training metrics logging
+After training, the system SHALL evaluate the model on the test set and log metrics to MLflow: overall accuracy, macro F1, weighted F1, and per-class precision, recall, and F1 for each label.
+
+#### Scenario: Log overall metrics
+- **WHEN** model evaluation completes
+- **THEN** the system logs accuracy, f1_macro, and f1_weighted to MLflow
+
+#### Scenario: Log per-class metrics
+- **WHEN** model evaluation completes with labels "bull_flag", "bear_flag", and "O"
+- **THEN** the system logs precision_bull_flag, recall_bull_flag, f1_bull_flag (and same for each other label) to MLflow
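A pure-Python sketch of the evaluation metrics with the `precision_<label>`, `recall_<label>`, `f1_<label>` naming from the scenario. It follows the usual definitions: macro F1 is the unweighted mean of per-class F1, weighted F1 weights each class by its true count, and zero denominators yield 0.0. In practice scikit-learn's `precision_recall_fscore_support` computes these values, and the resulting dict could be passed to `mlflow.log_metrics`. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, macro/weighted F1, and per-class precision/recall/F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {"accuracy": sum(t == p for t, p in pairs) / len(pairs)}
    f1s, supports = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[f"precision_{label}"] = precision
        metrics[f"recall_{label}"] = recall
        metrics[f"f1_{label}"] = f1
        f1s.append(f1)
        supports.append(tp + fn)  # number of true samples of this class
    metrics["f1_macro"] = sum(f1s) / len(f1s)
    metrics["f1_weighted"] = sum(f * s for f, s in zip(f1s, supports)) / sum(supports)
    return metrics
```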
+
+### Requirement: Training artifact logging
+When `stages.training.mlflow.log_artifacts` is true, the system SHALL log to MLflow: a confusion matrix plot (PNG), a feature importance plot (PNG, for tree-based models), and a classification report (text).
+
+#### Scenario: Log confusion matrix
+- **WHEN** log_artifacts is true and training completes
+- **THEN** the system generates and logs a confusion matrix plot as "confusion_matrix.png" to MLflow
+
+#### Scenario: Log feature importance
+- **WHEN** log_artifacts is true and the model has a `feature_importances_` attribute
+- **THEN** the system generates and logs a feature importance plot as "feature_importance.png" to MLflow
+
+### Requirement: Model registration
+When `stages.training.mlflow.register_model` is true, the system SHALL register the trained model in the MLflow model registry under the name specified by `stages.inference.mlflow_model_name`.
+
+#### Scenario: Register model
+- **WHEN** register_model is true and training completes
+- **THEN** the system registers the model in MLflow registry with the configured model name
+
+### Requirement: PostgreSQL training metadata storage
+The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: run_id (MLflow run ID), model_type, experiment_name, pipeline_config_hash, dataset_version, metrics summary (JSON), status, and timestamps (created_at, completed_at).
+
+#### Scenario: Store training run record
+- **WHEN** a training run completes successfully
+- **THEN** the system inserts a record into the PostgreSQL `training_runs` table with the run metadata
+
+#### Scenario: Query training history
+- **WHEN** the system queries training runs
+- **THEN** it returns records from PostgreSQL ordered by created_at descending
+
+### Requirement: Pipeline config logging
+The system SHALL log the full pipeline YAML config as an MLflow artifact with each training run. This config SHALL be used during inference to replicate the exact preprocessing steps.
+
+#### Scenario: Config artifact logged
+- **WHEN** a training run starts
+- **THEN** the full pipeline.yaml content is logged as the "pipeline_config.yaml" artifact in the MLflow run
--- a/openspec/specs/prediction-ui/spec.md
+++ b/openspec/specs/prediction-ui/spec.md
@@ -0,0 +1,130 @@
+## ADDED Requirements
+
+### Requirement: Prediction state management
+The system SHALL maintain a separate prediction state alongside the existing annotation state. The prediction state SHALL include: spans (array of prediction spans), isLoading, error, modelInfo, visible (toggle), confidenceThreshold (filter), selectedLabels (filter), and autoPredict (toggle). Prediction state SHALL be independent from annotation state.
+
+#### Scenario: Initial prediction state
+- **WHEN** the app loads
+- **THEN** predictions are empty, visible is true, confidenceThreshold defaults to 0.70, autoPredict is false, and selectedLabels includes all labels
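One possible TypeScript shape for the prediction state, using the field names from the requirement above. Several details are assumptions rather than spec:

- the types of `spans`, `modelInfo`, `error` and `selectedLabels`;
- the initial values of `isLoading`, `error` and `modelInfo`.

The spec itself fixes only the initial `spans`, `visible`, `confidenceThreshold`, `autoPredict` and `selectedLabels` values. `PredictionSpan`, `ModelInfo` and `initialPredictionState` are hypothetical names.

```typescript
// Hypothetical shapes; the spec names these fields but not their types.
interface PredictionSpan {
  start_time: number; // Unix timestamp of the first candle
  end_time: number; // Unix timestamp of the last candle
  label: string;
  confidence: number; // 0.0 to 1.0
}

interface ModelInfo {
  name: string;
  version: string;
  type: string;
  trained_at: string;
}

interface PredictionState {
  spans: PredictionSpan[];
  isLoading: boolean;
  error: string | null;
  modelInfo: ModelInfo | null;
  visible: boolean;
  confidenceThreshold: number;
  selectedLabels: string[];
  autoPredict: boolean;
}

// Builds the state described in "Initial prediction state".
function initialPredictionState(allLabels: string[]): PredictionState {
  return {
    spans: [],
    isLoading: false, // assumed
    error: null, // assumed
    modelInfo: null, // assumed: filled in after /api/model/info responds
    visible: true,
    confidenceThreshold: 0.7,
    selectedLabels: [...allLabels], // every label starts selected
    autoPredict: false,
  };
}
```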
+
+### Requirement: On-demand prediction fetching
+The system SHALL fetch predictions on demand. When the user clicks "Run on Visible", the system SHALL send the currently visible candles to `/api/predict` and update the prediction state with the results. Predictions are ephemeral: they are not persisted and are re-fetched on demand.
+
+#### Scenario: Run on visible candles
+- **WHEN** user clicks "Run on Visible" button
+- **THEN** the system sends the visible candle range to /api/predict, shows a loading state, and renders returned predictions on the chart
+
+#### Scenario: Batch predict all
+- **WHEN** user clicks "Predict All" button
+- **THEN** the system sends a batch request to /api/predict/batch for the full dataset and renders all returned predictions
+
+### Requirement: Prediction caching
+The system SHALL cache predictions in memory keyed by `${pair}_${timeframe}_${startTime}_${endTime}_${modelVersion}`. When the user scrolls to a range with cached predictions, the system SHALL use the cache instead of re-fetching. The cache SHALL be invalidated when the model version changes.
+
+#### Scenario: Cache hit
+- **WHEN** user scrolls back to a previously predicted range with the same model version
+- **THEN** the system renders cached predictions without making an API call
+
+#### Scenario: Cache invalidation on model change
+- **WHEN** the model version changes (detected via /api/model/info)
+- **THEN** all cached predictions are cleared
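A minimal sketch of the cache. It assumes a plain `Map` held in memory. It also assumes a hypothetical `syncModelVersion` hook, called whenever `/api/model/info` reports a version.

Because the key already ends in the model version, a stale entry could never be hit. Clearing the cache on a version change mainly keeps memory bounded. `PredictionCache` is a hypothetical name.

```typescript
interface PredictionSpan {
  start_time: number;
  end_time: number;
  label: string;
  confidence: number;
}

class PredictionCache {
  private entries = new Map<string, PredictionSpan[]>();
  private modelVersion: string | null = null;

  // Key format taken from the requirement above.
  static key(pair: string, timeframe: string, startTime: number, endTime: number, modelVersion: string): string {
    return `${pair}_${timeframe}_${startTime}_${endTime}_${modelVersion}`;
  }

  // Call with the version reported by /api/model/info; clears all entries when it changes.
  syncModelVersion(version: string): void {
    if (this.modelVersion !== null && this.modelVersion !== version) {
      this.entries.clear();
    }
    this.modelVersion = version;
  }

  get(key: string): PredictionSpan[] | undefined {
    return this.entries.get(key);
  }

  set(key: string, spans: PredictionSpan[]): void {
    this.entries.set(key, spans);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The key matches only exact `startTime`/`endTime` ranges. Under this scheme, scrolling to a range that merely overlaps a cached one is a cache miss.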
+
+### Requirement: Prediction rendering on chart
+The system SHALL render model predictions as a visual layer on the lightweight-charts instance, visually distinct from human annotations. Predictions SHALL use a histogram series with per-bar colors mapped to predicted pattern labels at reduced opacity (10-20%). Series markers SHALL be added at the start of each prediction span showing `{label} ({confidence}%)` positioned below bars.
+
+#### Scenario: Render prediction spans
+- **WHEN** predictions are loaded and visible is true
+- **THEN** colored histogram bars appear behind candles for predicted patterns, with markers showing labels and confidence
+
+#### Scenario: Predictions hidden
+- **WHEN** the user toggles predictions off (visible = false)
+- **THEN** the prediction histogram series and markers are removed from the chart
+
+#### Scenario: Visual distinction from annotations
+- **WHEN** both human annotations and model predictions exist for the same range
+- **THEN** human annotations render as solid colored rectangles (above bars) and predictions render as low-opacity histogram bars (below bars), so the two layers are visually distinguishable
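The rendering requirement reduces to building two plain arrays: one histogram point per candle covered by a prediction span, and one marker per span. This sketch makes several assumptions:

- timestamps are Unix seconds;
- each label has a `#RRGGBB` color;
- bars use 15% opacity, inside the required 10-20% range;
- unknown labels fall back to grey.

The objects mirror the `{ time, value, color }` histogram data and `{ time, position, color, shape, text }` marker shapes of lightweight-charts. Attaching them to a series differs between library versions and is not shown. `buildPredictionLayer` and `withOpacity` are hypothetical names.

```typescript
interface PredictionSpan {
  start_time: number;
  end_time: number;
  label: string;
  confidence: number;
}
interface HistogramPoint { time: number; value: number; color: string; }
interface SeriesMarker { time: number; position: "belowBar"; color: string; shape: "circle"; text: string; }

// Converts "#RRGGBB" into an rgba() string with the given alpha.
function withOpacity(hex: string, alpha: number): string {
  const n = parseInt(hex.slice(1), 16);
  return `rgba(${(n >> 16) & 255}, ${(n >> 8) & 255}, ${n & 255}, ${alpha})`;
}

function buildPredictionLayer(
  spans: PredictionSpan[],
  candleTimes: number[], // timestamps of the loaded candles
  labelColors: Record<string, string>,
  opacity = 0.15,
): { bars: HistogramPoint[]; markers: SeriesMarker[] } {
  const byTime = new Map<number, HistogramPoint>();
  const markers: SeriesMarker[] = [];
  for (const span of spans) {
    const color = labelColors[span.label] ?? "#9E9E9E"; // grey fallback is an arbitrary choice
    for (const t of candleTimes) {
      if (t >= span.start_time && t <= span.end_time) {
        // One bar per candle; if spans overlap, the later span's color wins.
        byTime.set(t, { time: t, value: 1, color: withOpacity(color, opacity) });
      }
    }
    // Marker at the span start showing "{label} ({confidence}%)" below the bar.
    markers.push({
      time: span.start_time,
      position: "belowBar",
      color,
      shape: "circle",
      text: `${span.label} (${Math.round(span.confidence * 100)}%)`,
    });
  }
  // Series data must be sorted by time.
  const bars = Array.from(byTime.values()).sort((a, b) => a.time - b.time);
  markers.sort((a, b) => a.time - b.time);
  return { bars, markers };
}
```

The constant `value` of 1 assumes the histogram sits on its own price scale, so each bar fills the pane behind the candles.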
+
+### Requirement: Confidence threshold filter
+The system SHALL filter displayed predictions by confidence. Only predictions with confidence >= `confidenceThreshold` SHALL be rendered. The threshold is adjustable via a slider in the controls panel (range 0.0 to 1.0).
+
+#### Scenario: Filter low confidence
+- **WHEN** confidenceThreshold is 0.70 and a prediction has confidence 0.55
+- **THEN** that prediction is not rendered on the chart
+
+#### Scenario: Adjust threshold
+- **WHEN** user moves the confidence slider from 0.70 to 0.50
+- **THEN** previously hidden predictions with confidence between 0.50 and 0.70 become visible
+
+### Requirement: Label type filter
+The system SHALL allow users to toggle visibility of individual pattern labels via checkboxes in the controls panel. Only predictions for checked labels are rendered.
+
+#### Scenario: Hide specific label
+- **WHEN** user unchecks "double_bottom" in the label filter
+- **THEN** all "double_bottom" predictions are hidden from the chart
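The confidence and label filters above reduce to a single predicate applied before rendering. This minimal sketch assumes `selectedLabels` is held as a `Set`, which the spec does not specify. `visiblePredictions` is a hypothetical name.

```typescript
interface PredictionSpan {
  start_time: number;
  end_time: number;
  label: string;
  confidence: number;
}

// Keeps only predictions at or above the threshold whose label is checked.
function visiblePredictions(
  spans: PredictionSpan[],
  confidenceThreshold: number,
  selectedLabels: ReadonlySet<string>,
): PredictionSpan[] {
  return spans.filter(
    (s) => s.confidence >= confidenceThreshold && selectedLabels.has(s.label),
  );
}
```

The comparison is `>=`, so a prediction exactly at the threshold is shown, as the requirement states.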
+
+### Requirement: Prediction controls panel
+The system SHALL display a prediction controls panel in the sidebar with: master on/off toggle, model info (name, version, type, training date), action buttons ("Run on Visible", "Predict All"), auto-predict toggle, confidence threshold slider, label checkboxes with per-class precision/recall metrics, prediction count, agreement count, and a "Show only disagreements" filter.
+
+#### Scenario: Display model info
+- **WHEN** the prediction panel loads and the inference API is available
+- **THEN** the panel fetches /api/model/info and displays model name, version, type, and training date
+
+#### Scenario: Inference API unavailable
+- **WHEN** the prediction panel loads and /api/model/info returns an error
+- **THEN** the panel shows "Model server offline — predictions unavailable" and all controls are disabled
+
+#### Scenario: Per-class metrics display
+- **WHEN** model info includes per-class metrics
+- **THEN** each label checkbox shows precision and recall values (e.g., "bull_flag (P:0.89 R:0.76)")
+
+### Requirement: Disagreement detection
+The system SHALL compare human annotation spans with model prediction spans to identify disagreements. For each human annotation, the system SHALL check whether any prediction span overlaps it (>50% time overlap). The disagreement types are: "missed_by_model" (human annotated a pattern, model predicted "O"), "missed_by_human" (model predicted a pattern, no human annotation), and "label_mismatch" (both see a pattern but assign different labels).
+
+#### Scenario: Missed by model
+- **WHEN** a human annotation exists at T10-T20 but no prediction span overlaps it
+- **THEN** the system identifies this as "missed_by_model"
+
+#### Scenario: Missed by human
+- **WHEN** a prediction span exists at T30-T40 with no overlapping human annotation
+- **THEN** the system identifies this as "missed_by_human"
+
+#### Scenario: Label mismatch
+- **WHEN** a human annotation labels T10-T20 as "bull_flag" and the prediction labels the same range as "wedge_up"
+- **THEN** the system identifies this as "label_mismatch"
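A minimal sketch of the matching step, under three assumptions:

- ">50% time overlap" is the fraction of the human annotation's time range covered by the prediction;
- "O" prediction spans mean "no pattern";
- a pattern prediction left unmatched by every human annotation is "missed_by_human".

`detectDisagreements` and `overlapRatio` are hypothetical names.

```typescript
interface Span { start_time: number; end_time: number; label: string; }

type DisagreementType = "missed_by_model" | "missed_by_human" | "label_mismatch";

interface Disagreement {
  type: DisagreementType;
  human?: Span;
  prediction?: Span;
}

// Fraction of `base` covered by `other`. A zero-length base span counts as
// fully covered when `other` contains that instant.
function overlapRatio(base: Span, other: Span): number {
  const start = Math.max(base.start_time, other.start_time);
  const end = Math.min(base.end_time, other.end_time);
  if (end < start) return 0;
  const length = base.end_time - base.start_time;
  return length === 0 ? 1 : (end - start) / length;
}

function detectDisagreements(humans: Span[], predictions: Span[]): Disagreement[] {
  // "O" predictions mean "no pattern", so only pattern predictions can match.
  const patternPreds = predictions.filter((p) => p.label !== "O");
  const matched = new Set<Span>();
  const result: Disagreement[] = [];

  for (const human of humans) {
    if (human.label === "O") continue; // negative annotations are not patterns
    // Best-overlapping prediction, required to exceed 50%.
    let best: Span | undefined;
    let bestRatio = 0.5;
    for (const pred of patternPreds) {
      const ratio = overlapRatio(human, pred);
      if (ratio > bestRatio) {
        best = pred;
        bestRatio = ratio;
      }
    }
    if (!best) {
      result.push({ type: "missed_by_model", human });
    } else {
      matched.add(best);
      if (best.label !== human.label) {
        result.push({ type: "label_mismatch", human, prediction: best });
      }
    }
  }

  // Pattern predictions that no human annotation matched.
  for (const pred of patternPreds) {
    if (!matched.has(pred)) result.push({ type: "missed_by_human", prediction: pred });
  }
  return result;
}
```

The spec does not say whether a dismissed prediction (a negative "O" annotation) should suppress "missed_by_human". This sketch does not suppress it.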
+
+### Requirement: Disagreement rendering
+The system SHALL render disagreements with distinct visual styles: "missed_by_model" shows a red dashed border around the human annotation, "missed_by_human" shows a yellow highlight around the prediction, "label_mismatch" shows an orange border with both labels displayed.
+
+#### Scenario: Render missed_by_human highlight
+- **WHEN** a "missed_by_human" disagreement is detected and disagreement rendering is enabled
+- **THEN** the prediction span is highlighted with a yellow border/glow to draw attention
+
+#### Scenario: Show only disagreements
+- **WHEN** user enables the "Show only disagreements" filter
+- **THEN** only prediction spans involved in disagreements are rendered, hiding agreement spans
+
+### Requirement: Prediction-to-annotation feedback
+When a user clicks on a "missed_by_human" prediction, the system SHALL open the span annotation dialog pre-filled with the prediction's start_time, end_time, and label. The user can confirm (save as new annotation), correct (change label, then save), or dismiss.
+
+#### Scenario: Confirm prediction as annotation
+- **WHEN** user clicks a "missed_by_human" prediction and clicks Save in the pre-filled dialog
+- **THEN** the system creates a new span annotation with the model's suggested label and timestamps
+
+#### Scenario: Correct and save
+- **WHEN** user clicks a "missed_by_human" prediction, changes the label in the dialog, and clicks Save
+- **THEN** the system creates a new span annotation with the corrected label
+
+#### Scenario: Dismiss as not-a-pattern
+- **WHEN** user clicks a "missed_by_human" prediction and clicks "Not a pattern"
+- **THEN** the system saves a negative annotation with label "O", source "human_correction", and records the model's original prediction and confidence
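The three outcomes above map onto the `source` and `model_prediction` fields from the span-annotation requirements. This sketch of the mapping makes two assumptions:

- the negative annotation reuses the prediction's time range, which the spec does not state;
- a "correction" that keeps the original label counts as a confirmation.

`feedbackPayload` and `FeedbackAction` are hypothetical names.

```typescript
interface PredictionSpan {
  start_time: number;
  end_time: number;
  label: string;
  confidence: number;
}

type Source = "human" | "model_confirmed" | "model_corrected" | "human_correction";

interface SpanAnnotationPayload {
  start_time: number;
  end_time: number;
  label: string;
  source: Source;
  model_prediction: { label: string; confidence: number };
}

type FeedbackAction =
  | { kind: "confirm" }
  | { kind: "correct"; label: string }
  | { kind: "dismiss" };

// Builds the POST body for a user's response to a "missed_by_human" prediction.
function feedbackPayload(pred: PredictionSpan, action: FeedbackAction): SpanAnnotationPayload {
  const base = {
    start_time: pred.start_time,
    end_time: pred.end_time,
    // Always record what the model originally predicted.
    model_prediction: { label: pred.label, confidence: pred.confidence },
  };
  if (action.kind === "dismiss") {
    // "Not a pattern": negative annotation.
    return { ...base, label: "O", source: "human_correction" };
  }
  if (action.kind === "correct" && action.label !== pred.label) {
    return { ...base, label: action.label, source: "model_corrected" };
  }
  return { ...base, label: pred.label, source: "model_confirmed" };
}
```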
+
+### Requirement: Inference API connection monitoring
+The system SHALL poll `/api/model/info` every 30 seconds when the inference API is unavailable. When the API becomes available, the system SHALL auto-reconnect and enable prediction controls. Human annotation SHALL never be blocked by inference API availability.
+
+#### Scenario: Auto-reconnect
+- **WHEN** the inference API was unavailable and becomes reachable
+- **THEN** the prediction panel re-enables controls and shows "Model server online"
+
+#### Scenario: Annotation independence
+- **WHEN** the inference API is unavailable
+- **THEN** all human annotation tools continue to work normally
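A minimal sketch of the reconnect loop. It assumes a hypothetical `checkModelInfo` callback that resolves `true` when `/api/model/info` answers successfully. `InferenceMonitor` is a hypothetical name.

The monitor polls every 30 seconds only while offline and stops once the server is back. The prediction panel would enable or disable its controls from `onStatusChange`; annotation tools never consult the monitor.

```typescript
type Status = "online" | "offline";

class InferenceMonitor {
  private status: Status = "offline";
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    private readonly checkModelInfo: () => Promise<boolean>,
    private readonly onStatusChange: (status: Status) => void,
    private readonly intervalMs = 30_000,
  ) {}

  get current(): Status {
    return this.status;
  }

  // A single check; call it on startup, and the interval calls it while offline.
  async poll(): Promise<void> {
    let ok = false;
    try {
      ok = await this.checkModelInfo();
    } catch {
      ok = false; // network errors count as offline
    }
    this.setStatus(ok ? "online" : "offline");
  }

  // For callers that see a prediction request fail; resumes polling.
  markOffline(): void {
    this.setStatus("offline");
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  private setStatus(next: Status): void {
    if (next === "offline" && this.timer === undefined) {
      this.timer = setInterval(() => void this.poll(), this.intervalMs);
    } else if (next === "online") {
      this.stop(); // no polling needed while the server is reachable
    }
    if (next !== this.status) {
      this.status = next;
      this.onStatusChange(next);
    }
  }
}
```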
--- a/openspec/specs/span-annotation/spec.md
+++ b/openspec/specs/span-annotation/spec.md
@@ -1,182 +1,34 @@
 ## ADDED Requirements

-### Requirement: Span annotation data model
-The system SHALL store span annotations in a `span_annotations` database table with columns: `id` (integer primary key, auto-increment), `chart_id` (integer, FK to charts.id), `start_time` (integer, Unix timestamp of first candle), `end_time` (integer, Unix timestamp of last candle), `label` (text, pattern name referencing span_label_types.name), `confidence` (integer, nullable, 1-5 scale), `outcome` (text, nullable, one of 'win'|'loss'|'breakeven'), `notes` (text, nullable), `sub_spans` (text, nullable, JSON array), `color` (text, default '#2196F3'), `created_at` (integer, Unix timestamp).
+### Requirement: Span annotation JSON export for ML pipeline
+The system SHALL provide a `GET /api/span-annotations/export` endpoint that exports all span annotations for a given chart as JSON in the format expected by the ML pipeline. The output SHALL be a JSON object with an `annotations` array where each entry has: `id`, `start_time` (Unix timestamp), `end_time` (Unix timestamp), `label`, `confidence` (nullable), `outcome` (nullable), and `sub_spans` (nullable). The endpoint SHALL accept an optional `chartId` query parameter.

-#### Scenario: Schema structure
- **WHEN** the database is initialized or migrated
- **THEN** the `span_annotations` table exists with all required columns and the foreign key constraint on `chart_id`
+#### Scenario: Export span annotations as JSON
+- **WHEN** GET /api/span-annotations/export?chartId=3 is called
+- **THEN** the system returns a JSON object with all span annotations for chart 3 in the ML pipeline format

-#### Scenario: start_time before end_time
- **WHEN** a span annotation is created where the user clicked the later candle first
- **THEN** the system SHALL swap the values so `start_time` is always less than or equal to `end_time`
+#### Scenario: Export without chartId
+- **WHEN** GET /api/span-annotations/export is called without chartId
+- **THEN** the system exports span annotations for the most recently created chart
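A sketch of the export shaping step. It assumes the rows come from the `span_annotations` columns listed in the removed data-model requirement, with `sub_spans` stored as JSON text. It also assumes chart selection (the `chartId` parameter, or the most recently created chart when it is absent) happens in the query before this function runs. `SpanAnnotationRow` and `exportForPipeline` are hypothetical names.

```typescript
// Hypothetical DB row following the span_annotations columns.
interface SpanAnnotationRow {
  id: number;
  chart_id: number;
  start_time: number;
  end_time: number;
  label: string;
  confidence: number | null;
  outcome: string | null;
  sub_spans: string | null; // JSON text in the database
  notes: string | null;
}

interface ExportedAnnotation {
  id: number;
  start_time: number;
  end_time: number;
  label: string;
  confidence: number | null;
  outcome: string | null;
  sub_spans: unknown[] | null;
}

// Shapes one chart's rows into { annotations: [...] }, dropping fields the pipeline does not read.
function exportForPipeline(rows: SpanAnnotationRow[]): { annotations: ExportedAnnotation[] } {
  return {
    annotations: rows.map((r) => ({
      id: r.id,
      start_time: r.start_time,
      end_time: r.end_time,
      label: r.label,
      confidence: r.confidence,
      outcome: r.outcome,
      sub_spans: r.sub_spans === null ? null : (JSON.parse(r.sub_spans) as unknown[]),
    })),
  };
}
```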

-### Requirement: Two-click span creation
-When the "span" tool is active, the system SHALL implement a two-click interaction to define a span. The first click sets the start candle. The second click sets the end candle. Both clicks SHALL snap to the nearest candle timestamp from the loaded candle data using `chart.timeScale().coordinateToTime()` followed by finding the nearest candle in the data array.
+### Requirement: Prediction-sourced span annotation creation
+The system SHALL support creating span annotations with a `source` field indicating whether the annotation was created by a human ("human"), confirmed from a model prediction ("model_confirmed"), corrected from a model prediction ("model_corrected"), or saved as a negative annotation when a prediction is dismissed ("human_correction"). The existing POST endpoint for span annotations SHALL accept an optional `source` field (default: "human") and an optional `model_prediction` field (an object with the `label` and `confidence` of the original prediction).

-#### Scenario: First click sets start candle
- **WHEN** "span" tool is active and user clicks on the chart
- **THEN** the system identifies the nearest candle to the click position and highlights it as the span start point
+#### Scenario: Create human annotation
+- **WHEN** a span annotation is created without a source field
+- **THEN** the source defaults to "human"

-#### Scenario: Second click completes span selection
- **WHEN** "span" tool is active and user has already clicked a start candle, then clicks a second position
- **THEN** the system identifies the nearest candle to the second click and defines the span range from start to end candle
+#### Scenario: Confirm model prediction
+- **WHEN** a user confirms a model prediction as an annotation
+- **THEN** the span annotation is created with source "model_confirmed" and model_prediction containing the original predicted label and confidence

-#### Scenario: Clicking same candle twice
- **WHEN** user clicks the same candle for both start and end
- **THEN** the system creates a single-candle span (start_time equals end_time)
+#### Scenario: Correct model prediction
+- **WHEN** a user changes the label of a model prediction before saving
+- **THEN** the span annotation is created with source "model_corrected" and model_prediction containing the original predicted label and confidence
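On the server side, the optional `source` field could be defaulted and validated as below. This sketch accepts "human", "model_confirmed" and "model_corrected", plus "human_correction" from the negative-annotation requirement.

As an additional assumption based on the scenarios (which always pair the two), it rejects a non-human source that arrives without `model_prediction`. `normalizeSpanBody` is a hypothetical name.

```typescript
const SOURCES = ["human", "model_confirmed", "model_corrected", "human_correction"] as const;
type Source = (typeof SOURCES)[number];

interface CreateSpanBody {
  start_time: number;
  end_time: number;
  label: string;
  source?: string;
  model_prediction?: { label: string; confidence: number };
}

interface NormalizedSpan extends Omit<CreateSpanBody, "source"> {
  source: Source;
}

function normalizeSpanBody(body: CreateSpanBody): NormalizedSpan {
  const source = body.source ?? "human"; // default from the requirement
  if (!(SOURCES as readonly string[]).includes(source)) {
    throw new Error(`invalid source "${source}"; expected one of ${SOURCES.join(", ")}`);
  }
  if (source !== "human" && !body.model_prediction) {
    throw new Error(`source "${source}" requires model_prediction`);
  }
  return { ...body, source: source as Source };
}
```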

-#### Scenario: Clicking end before start
- **WHEN** user clicks a candle that is earlier in time than the start candle
- **THEN** the system SHALL swap start and end so the span is always in chronological order
+### Requirement: Negative annotation for dismissed predictions
+The system SHALL support saving negative annotations when a user dismisses a model prediction as "not a pattern". A negative annotation SHALL have label "O", source "human_correction", and a `model_prediction` field recording what the model originally predicted.

-### Requirement: Live preview rectangle during span selection
-After the first click, the system SHALL display a preview rectangle that stretches from the start candle to the candle currently under the cursor. The preview rectangle height SHALL cover the price range (min low to max high) of all candles within the preview span. The preview SHALL update as the mouse moves and SHALL snap to candle positions.
-
-#### Scenario: Preview follows cursor
- **WHEN** user has clicked the first candle and moves the mouse
- **THEN** a semi-transparent preview rectangle renders from the start candle to the candle nearest to the cursor, with height spanning the high/low range of candles in that range
-
-#### Scenario: Preview updates on mouse move
- **WHEN** cursor moves to a different candle position
- **THEN** the preview rectangle resizes to include the new candle range with updated price bounds
-
-#### Scenario: Preview disappears on cancel
- **WHEN** user presses Escape during span selection (after first click)
- **THEN** the preview rectangle disappears and the span selection is cancelled without saving
-
-### Requirement: Label assignment popover
-After the second click completes the span selection, the system SHALL display a popover near the click position containing: a label dropdown populated from `span_label_types`, an optional confidence slider (1-5), an optional outcome select (win/loss/breakeven/none), an optional notes textarea, and Save/Cancel buttons.
-
-#### Scenario: Popover appears after span selection
- **WHEN** user completes the two-click span selection
- **THEN** a popover appears near the end-click position with label dropdown, confidence, outcome, notes fields, and Save/Cancel buttons
-
-#### Scenario: Save span annotation
- **WHEN** user selects a label and clicks Save in the popover
- **THEN** the system sends a POST request to create the span annotation, renders the span rectangle on the chart, updates the sidebar list, and closes the popover
-
-#### Scenario: Cancel span annotation
- **WHEN** user clicks Cancel in the popover or presses Escape
- **THEN** the popover closes, the preview rectangle disappears, and no annotation is saved
-
-#### Scenario: Label is required
- **WHEN** user clicks Save without selecting a label
- **THEN** the Save button SHALL be disabled or the system SHALL show a validation message requiring a label
-
-### Requirement: Span rectangle rendering via ISeriesPrimitive
-The system SHALL render saved span annotations as semi-transparent colored rectangles on the chart using the lightweight-charts `ISeriesPrimitive` API. Each span annotation SHALL have one `SpanRectanglePrimitive` instance attached to the candlestick series via `series.attachPrimitive()`. The rectangle SHALL be defined by data coordinates: top-left `{time: start_time, price: max_high}` and bottom-right `{time: end_time, price: min_low}` where max_high and min_low are the highest high and lowest low of candles within the span range.
-
-#### Scenario: Rectangle renders for saved span
- **WHEN** a span annotation exists for the active chart
- **THEN** a semi-transparent colored rectangle renders behind the candles in the span range, with color from the span's label type configuration
-
-#### Scenario: Rectangle z-order
- **WHEN** span rectangles render on the chart
- **THEN** they SHALL render behind (below) the candlestick series so candles remain fully visible on top
-
-#### Scenario: Rectangle updates on zoom/pan
- **WHEN** user zooms or pans the chart
- **THEN** span rectangles automatically reposition and resize to stay aligned with their candle range (handled natively by the ISeriesPrimitive lifecycle)
-
-#### Scenario: Label tag display
- **WHEN** a span rectangle renders on the chart
- **THEN** a small label tag showing the pattern name SHALL be displayed above the rectangle
-
-#### Scenario: Rectangle color from label config
- **WHEN** a span annotation has label "bull_flag" and the bull_flag label type has color "#4CAF50"
- **THEN** the rectangle renders with fill color "#4CAF50" at reduced opacity (semi-transparent)
-
-### Requirement: Span selection on chart
-The system SHALL allow users to select an existing span annotation by clicking within its rectangle on the chart.
-
-#### Scenario: Click to select span
- **WHEN** user clicks within a span rectangle on the chart (detected via `hitTest()`)
- **THEN** the system sets the selectedSpanId state, highlights the span rectangle (e.g., thicker border or increased opacity), and scrolls to it in the sidebar list
-
-#### Scenario: Click to deselect span
- **WHEN** user clicks on an already-selected span rectangle
- **THEN** the system deselects it by clearing selectedSpanId and removing the highlight
-
-#### Scenario: Click outside any span
- **WHEN** user clicks on the chart outside any span rectangle while a span is selected
- **THEN** the system deselects the currently selected span
-
-### Requirement: Span editing
-The system SHALL allow users to edit an existing span annotation's metadata (label, confidence, outcome, notes).
-
-#### Scenario: Open edit popover
- **WHEN** user double-clicks on an existing span rectangle or selects it and presses Enter
- **THEN** a popover appears pre-populated with the span's current label, confidence, outcome, and notes
-
-#### Scenario: Save edited span
- **WHEN** user modifies fields in the edit popover and clicks Save
- **THEN** the system sends a PATCH request to update the span annotation, re-renders the rectangle with any color changes, and updates the sidebar list
-
-#### Scenario: Cancel edit
- **WHEN** user clicks Cancel or presses Escape in the edit popover
- **THEN** changes are discarded and the popover closes
-
-### Requirement: Span deletion
-The system SHALL allow users to delete span annotations via keyboard shortcut, sidebar button, or the delete tool.
-
-#### Scenario: Delete selected span with keyboard
- **WHEN** a span is selected (selectedSpanId is set) and user presses Delete or Backspace
- **THEN** the system sends a DELETE request, removes the rectangle primitive from the chart, clears the selection, and updates the sidebar list
-
-#### Scenario: Delete span from sidebar
- **WHEN** user clicks the trash icon on a span entry in the sidebar list
- **THEN** the system sends a DELETE request, removes the rectangle from the chart, and updates the sidebar list
-
-#### Scenario: Delete span with delete tool
- **WHEN** the "delete" tool is active and user clicks within a span rectangle
- **THEN** the system deletes that span annotation via API, removes the rectangle, and updates the sidebar
-
-### Requirement: Span annotation sidebar list
-The system SHALL display a "Span Annotations" section in the sidebar showing all span annotations for the active chart.
-
-#### Scenario: Display span list
- **WHEN** span annotations exist for the active chart
- **THEN** the sidebar displays a scrollable list of span annotations sorted by start_time (newest first), with each entry showing: start/end time range, label name with colored badge, and delete button
-
-#### Scenario: Empty span list
- **WHEN** no span annotations exist for the active chart
- **THEN** the section displays "No span annotations yet. Use the Span tool to select candle ranges."
-
-#### Scenario: Click span in list to select
- **WHEN** user clicks a span entry in the sidebar list
- **THEN** the system sets selectedSpanId, highlights the corresponding rectangle on the chart, and scrolls the chart to center on the span's time range
-
-#### Scenario: Selected span highlight in list
- **WHEN** a span is selected
- **THEN** the corresponding entry in the sidebar list has a highlighted background and border
-
-#### Scenario: Span list updates on chart switch
- **WHEN** user switches to a different chart
- **THEN** the span list clears and reloads with span annotations belonging to the newly selected chart
-
-### Requirement: Span count display
-The sidebar SHALL display a count summary of span annotations grouped by label type.
-
-#### Scenario: Display span counts
- **WHEN** span annotations exist for the active chart
- **THEN** the sidebar displays counts per label type (e.g., "Bull Flag: 3 | Bear Flag: 2")
-
-#### Scenario: Counts update after changes
- **WHEN** user adds or deletes a span annotation
- **THEN** the count summary updates immediately
-
-### Requirement: Hotkey label assignment
-When the span tool is active and the user has completed a two-click selection, pressing a configured hotkey SHALL assign the corresponding label and save the span immediately without showing the popover.
-
-#### Scenario: Hotkey assigns label after span selection
- **WHEN** span tool is active, user has selected start and end candles, and presses hotkey "1" (mapped to "bull_flag")
- **THEN** the system saves the span with label "bull_flag", default confidence/outcome/notes, renders the rectangle, and updates the sidebar
-
-#### Scenario: Hotkey ignored when no span selected
- **WHEN** span tool is active but no span range has been selected yet, and user presses a label hotkey
- **THEN** the system takes no action
-
-#### Scenario: Hotkey ignored when span tool is not active
- **WHEN** a different tool (e.g., break_up, line) is active and user presses a span label hotkey
- **THEN** the system takes no action (hotkeys only work when span tool is active)
+#### Scenario: Save negative annotation
+- **WHEN** user dismisses a "bull_flag" prediction with confidence 0.72
+- **THEN** the system creates a span annotation with label "O", source "human_correction", and model_prediction `{ "label": "bull_flag", "confidence": 0.72 }`