## MODIFIED Requirements

### Requirement: PostgreSQL training metadata storage

The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: run_id (MLflow run ID), model_type, experiment_name, pipeline_config_hash, dataset_version, metrics summary (JSON), status, and timestamps (created_at, completed_at).

#### Scenario: Store training run record

- **WHEN** a training run completes successfully
- **THEN** the system inserts a record into the PostgreSQL `training_runs` table with the run metadata

#### Scenario: Query training history

- **WHEN** the system queries training runs
- **THEN** it returns records from PostgreSQL ordered by created_at descending

#### Scenario: Database name updated

- **WHEN** the ML service connects to PostgreSQL
- **THEN** it connects to the `candle_annotator` database (not `ml_db`)

## ADDED Requirements

### Requirement: Direct annotation data access

The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.
#### Scenario: Query candle data for training

- **WHEN** the ML training pipeline needs OHLC data for a chart
- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`

#### Scenario: Query span annotations for labels

- **WHEN** the ML training pipeline needs labeled spans for training
- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`

#### Scenario: No CSV/JSON export required

- **WHEN** the ML training pipeline starts
- **THEN** it does not require pre-exported CSV or JSON files; all data is read from PostgreSQL

#### Scenario: Shared database connection

- **WHEN** the ML service reads candle/annotation data
- **THEN** it uses the same PostgreSQL connection (same database, same credentials) as for `training_runs`
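
#### Illustrative sketch: building the training-data queries

The two query scenarios above (candles filtered by `chart_id` and ordered by `time`; span annotations filtered by `chart_id` and optionally by `source`) could be expressed roughly as below. This is a minimal sketch, not the service's actual code. The function names, the `ChartId` type, the use of `SELECT *`, and the `%s` placeholder style (as used by psycopg-style drivers) are all assumptions; only the table names and filter/order columns come from the requirements.

```python
from typing import Optional, Tuple, Union

# Assumption: the spec does not state the key type of chart_id.
ChartId = Union[int, str]
# A parameterized query: SQL text plus the values bound to its placeholders.
Query = Tuple[str, tuple]


def candles_query(chart_id: ChartId) -> Query:
    """Build the query for one chart's candles, ordered by time."""
    # "time" is quoted because it is also a SQL type name.
    sql = 'SELECT * FROM candles WHERE chart_id = %s ORDER BY "time"'
    return sql, (chart_id,)


def span_annotations_query(
    chart_id: ChartId, source: Optional[str] = None
) -> Query:
    """Build the query for one chart's labeled spans, optionally by source."""
    sql = "SELECT * FROM span_annotations WHERE chart_id = %s"
    params: tuple = (chart_id,)
    if source is not None:
        # The source filter is added only when the caller asks for it.
        sql += " AND source = %s"
        params += (source,)
    return sql, params
```

With a cursor obtained from the same connection that serves `training_runs`, a caller might run `cur.execute(*candles_query(chart_id))`. Keeping values in the parameter tuple, rather than formatting them into the SQL string, leaves escaping to the driver.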