chore: archive ml-db-consolidation change and sync specs

- Archived change to openspec/changes/archive/2026-02-17-ml-db-consolidation/
- Created new postgres-data-layer spec with PostgreSQL connection, schema definitions, Drizzle migrations, npm deps, and SQLite migration requirements
- Updated docker-deployment spec: Docker Compose now PostgreSQL-based (postgres dependency, ml-data volume, DATABASE_URL); env vars updated (DATABASE_URL added, DATABASE_PATH removed); database persistence updated to PostgreSQL volumes; health check updated to PostgreSQL
- Updated ml-training spec: added database name scenario (candle_annotator) and new direct annotation data access requirement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 18:22:28 +01:00
commit 38df874255
parent 0e8dcc6707
10 changed files with 532 additions and 31 deletions
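The commit replaces `DATABASE_PATH` with `DATABASE_URL`, and the ml-training spec in the diff pins the ML service to the `candle_annotator` database. A minimal sketch of a startup check the ML service could run, assuming it is written in Python (it records MLflow run IDs) and that `DATABASE_URL` is a standard `postgresql://` URL. `parse_database_url`, `check_database_name`, and `EXPECTED_DB_NAME` are hypothetical names, not part of the spec.

```python
# Hypothetical startup check for the ML service (names are illustrative).
from urllib.parse import unquote, urlparse

# Database name required by the ml-training spec (replaces the old ml_db).
EXPECTED_DB_NAME = "candle_annotator"


def parse_database_url(url: str) -> dict:
    """Split a postgresql:// URL into libpq-style connection keywords."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    dbname = unquote(parsed.path.lstrip("/"))
    if not dbname:
        raise ValueError("DATABASE_URL has no database name")
    settings = {
        "host": parsed.hostname,
        "port": parsed.port,
        "user": unquote(parsed.username) if parsed.username else None,
        "password": unquote(parsed.password) if parsed.password else None,
        "dbname": dbname,
    }
    # Drop absent fields so the driver's own defaults apply.
    return {key: value for key, value in settings.items() if value is not None}


def check_database_name(settings: dict) -> None:
    """Fail fast if the service is configured for any other database."""
    if settings["dbname"] != EXPECTED_DB_NAME:
        raise RuntimeError(
            f"expected database {EXPECTED_DB_NAME!r}, got {settings['dbname']!r}"
        )
```

At startup this would be called as `check_database_name(parse_database_url(os.environ["DATABASE_URL"]))`. libpq, and drivers built on it such as psycopg, accept the URL itself as a connection string, so the parsing here only serves the name check; the dictionary keys mirror libpq's `host`, `port`, `user`, `password`, and `dbname` parameters.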
--- a/openspec/changes/archive/2026-02-17-ml-db-consolidation/specs/ml-training/spec.md
+++ b/openspec/changes/archive/2026-02-17-ml-db-consolidation/specs/ml-training/spec.md
@@ -0,0 +1,37 @@
+## MODIFIED Requirements
+
+### Requirement: PostgreSQL training metadata storage
+The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: run_id (MLflow run ID), model_type, experiment_name, pipeline_config_hash, dataset_version, metrics summary (JSON), status, and timestamps (created_at, completed_at).
+
+#### Scenario: Store training run record
+- **WHEN** a training run completes successfully
+- **THEN** the system inserts a record into the PostgreSQL `training_runs` table with the run metadata
+
+#### Scenario: Query training history
+- **WHEN** the system queries training runs
+- **THEN** it returns records from PostgreSQL ordered by created_at descending
+
+#### Scenario: Database name updated
+- **WHEN** the ML service connects to PostgreSQL
+- **THEN** it connects to the `candle_annotator` database (not `ml_db`)
+
+## ADDED Requirements
+
+### Requirement: Direct annotation data access
+The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.
+
+#### Scenario: Query candle data for training
+- **WHEN** the ML training pipeline needs OHLC data for a chart
+- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`
+
+#### Scenario: Query span annotations for labels
+- **WHEN** the ML training pipeline needs labeled spans for training
+- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`
+
+#### Scenario: No CSV/JSON export required
+- **WHEN** the ML training pipeline starts
+- **THEN** it does not require pre-exported CSV or JSON files; all data is read from PostgreSQL
+
+#### Scenario: Shared database connection
+- **WHEN** the ML service reads candle/annotation data
+- **THEN** it uses the same PostgreSQL connection (same database, same credentials) that it uses for the `training_runs` table
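The `training_runs` requirement can be sketched as a record type plus two parameterized statements, assuming a Python ML service. Only the field list and the `created_at` descending order come from the spec; the column names (including `metrics` for the JSON summary), psycopg-style `%s` placeholders, JSON-text serialization, and the names `TrainingRun`, `INSERT_TRAINING_RUN`, `SELECT_TRAINING_HISTORY`, and `insert_params` are assumptions.

```python
# Hypothetical sketch of the training_runs record; column names are assumed.
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class TrainingRun:
    run_id: str                  # MLflow run ID
    model_type: str
    experiment_name: str
    pipeline_config_hash: str
    dataset_version: str
    metrics: dict[str, Any]      # metrics summary, stored as JSON
    status: str
    created_at: datetime
    completed_at: Optional[datetime] = None


# psycopg-style %s placeholders; one per column, in the same order as insert_params().
INSERT_TRAINING_RUN = """
INSERT INTO training_runs (
    run_id, model_type, experiment_name, pipeline_config_hash,
    dataset_version, metrics, status, created_at, completed_at
) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
"""

# Training history, newest first ("Query training history" scenario).
SELECT_TRAINING_HISTORY = "SELECT * FROM training_runs ORDER BY created_at DESC"


def insert_params(run: TrainingRun) -> tuple:
    """Return the values for INSERT_TRAINING_RUN in column order."""
    return (
        run.run_id,
        run.model_type,
        run.experiment_name,
        run.pipeline_config_hash,
        run.dataset_version,
        json.dumps(run.metrics, sort_keys=True),  # serialize the summary as JSON text
        run.status,
        run.created_at,
        run.completed_at,
    )
```

With a DB-API driver this would run as `cursor.execute(INSERT_TRAINING_RUN, insert_params(run))`. The placeholder count and the tuple length must stay in step, which is why `insert_params` lists the values in exactly the column order of the statement.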
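The direct-access scenarios can be sketched as parameterized query builders behind a reader that holds a single connection, which is one way to meet the shared-connection scenario. Assumptions: a Python service with a DB-API 2.0 driver using `%s` placeholders (psycopg, for example), integer chart IDs, and the OHLC column names `open`, `high`, `low`, `close`; `candles_query`, `span_annotations_query`, and `TrainingDataReader` are hypothetical names. Only the table names, the `chart_id` filter, the `time` ordering, and the optional `source` filter come from the spec.

```python
# Hypothetical reader for training data; only table names and filters come from the spec.
from typing import Any, Optional


def candles_query(chart_id: int) -> tuple[str, tuple]:
    """OHLC rows for one chart, ordered by time (ascending, the SQL default)."""
    sql = (
        'SELECT "time", open, high, low, close FROM candles '  # OHLC column names assumed
        'WHERE chart_id = %s ORDER BY "time"'  # "time" quoted because TIME is an SQL keyword
    )
    return sql, (chart_id,)


def span_annotations_query(
    chart_id: int, source: Optional[str] = None
) -> tuple[str, tuple]:
    """Labeled spans for one chart, optionally restricted to one source."""
    sql = "SELECT * FROM span_annotations WHERE chart_id = %s"
    params: tuple = (chart_id,)
    if source is not None:
        sql += " AND source = %s"
        params += (source,)
    return sql, params


class TrainingDataReader:
    """Routes every query through one shared DB-API connection."""

    def __init__(self, conn: Any) -> None:
        self.conn = conn  # e.g. a psycopg connection opened from DATABASE_URL

    def _fetch(self, sql: str, params: tuple) -> list:
        # try/finally instead of "with" because DB-API 2.0 does not require
        # cursors to be context managers.
        cur = self.conn.cursor()
        try:
            cur.execute(sql, params)
            return cur.fetchall()
        finally:
            cur.close()

    def candles(self, chart_id: int) -> list:
        return self._fetch(*candles_query(chart_id))

    def span_annotations(self, chart_id: int, source: Optional[str] = None) -> list:
        return self._fetch(*span_annotations_query(chart_id, source))
```

Passing `chart_id` and `source` as parameters rather than formatting them into the SQL string leaves quoting to the driver, and the same `conn` object could also serve the writes to `training_runs`.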