candle-annotator/openspec/changes/ml-db-consolidation/specs/ml-training/spec.md at 5f70f13da3090870f253b6ea22119b65a9a5a837

Marko Djordjevic 5f70f13da3 feat: migrate from SQLite to PostgreSQL - complete schema and API updates

- Remove better-sqlite3, add pg driver
- Convert schema to PostgreSQL types (serial, timestamp, boolean, jsonb)
- Generate fresh PostgreSQL migrations
- Update database connection layer with pg.Pool
- Fix all API routes: remove JSON.parse/stringify, use native timestamps and booleans
- Update drizzle.config.ts and .env.example for PostgreSQL

2026-02-17 13:43:06 +01:00


MODIFIED Requirements

Requirement: PostgreSQL training metadata storage

The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: run_id (MLflow run ID), model_type, experiment_name, pipeline_config_hash, dataset_version, metrics summary (JSON), status, and timestamps (created_at, completed_at).

Scenario: Store training run record

WHEN a training run completes successfully
THEN the system inserts a record into the PostgreSQL training_runs table with the run metadata
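
A minimal sketch of how such a record might be assembled before insertion. The field names follow the requirement above; the `metrics_summary` column name, the `%s` placeholder style (as used by psycopg-style drivers), and the helper names `TrainingRun` and `build_insert` are assumptions for illustration, not part of this spec.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Column order for the training_runs table. Names mirror the requirement,
# except metrics_summary, which is an assumed column name.
COLUMNS = (
    "run_id", "model_type", "experiment_name", "pipeline_config_hash",
    "dataset_version", "metrics_summary", "status", "created_at", "completed_at",
)


@dataclass
class TrainingRun:
    run_id: str                      # MLflow run ID
    model_type: str
    experiment_name: str
    pipeline_config_hash: str
    dataset_version: str
    metrics_summary: dict            # serialized to JSON on insert
    status: str
    created_at: datetime
    completed_at: Optional[datetime] = None


def build_insert(run: TrainingRun) -> tuple:
    """Return (sql, params) for a parameterized INSERT; nothing is executed here."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO training_runs ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    params = tuple(
        json.dumps(run.metrics_summary) if col == "metrics_summary" else getattr(run, col)
        for col in COLUMNS
    )
    return sql, params
```

The returned pair can be passed to a driver's `execute(sql, params)` call, which keeps values out of the SQL text.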

Scenario: Query training history

WHEN the system queries training runs
THEN it returns records from PostgreSQL ordered by created_at descending

Scenario: Database name updated

WHEN the ML service connects to PostgreSQL
THEN it connects to the candle_annotator database (not ml_db)
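
As an illustration, the connection URL could be built so that the database name is fixed to `candle_annotator`. This is a sketch under assumptions: it reads the standard libpq variable names (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`) from a dict, and the helper name `build_database_url` and the fallback defaults are hypothetical.

```python
from urllib.parse import quote

# Database name from the scenario above; it replaces the former ml_db.
DATABASE_NAME = "candle_annotator"


def build_database_url(env: dict) -> str:
    """Build a postgresql:// URL from environment-style settings."""
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    # Percent-encode credentials so characters such as "@" do not break the URL.
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{DATABASE_NAME}"
```

In a running service the dict would typically be `os.environ`.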

ADDED Requirements

Requirement: Direct annotation data access

The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the candles, annotations, span_annotations, and charts tables for training data.

Scenario: Query candle data for training

WHEN the ML training pipeline needs OHLC data for a chart
THEN it queries the candles table in PostgreSQL filtered by chart_id, ordered by time
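
The query in this scenario might look like the sketch below. The `chart_id` and `time` columns come from the scenario; the `open`, `high`, `low`, and `close` column names, the `%s` placeholder style, and the helper names are assumptions.

```python
# Parameterized query: filter by chart_id, order by time (ascending assumed).
CANDLES_SQL = (
    "SELECT time, open, high, low, close "
    "FROM candles WHERE chart_id = %s ORDER BY time ASC"
)


def candles_query(chart_id: int) -> tuple:
    """Return (sql, params) for fetching one chart's candles."""
    return CANDLES_SQL, (chart_id,)


def rows_to_ohlc(rows) -> dict:
    """Transpose row tuples into per-column lists for feature extraction."""
    columns = {"time": [], "open": [], "high": [], "low": [], "close": []}
    for time, open_, high, low, close in rows:
        columns["time"].append(time)
        columns["open"].append(open_)
        columns["high"].append(high)
        columns["low"].append(low)
        columns["close"].append(close)
    return columns
```

`rows_to_ohlc` accepts whatever iterable of 5-tuples the driver's cursor returns for the query above.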

Scenario: Query span annotations for labels

WHEN the ML training pipeline needs labeled spans for training
THEN it queries the span_annotations table in PostgreSQL filtered by chart_id and optionally by source
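
One way to express the optional `source` filter is to append the clause only when a value is given. This is a sketch: `SELECT *` is used because the column list of `span_annotations` is not specified here, and the helper name and `%s` placeholder style are assumptions.

```python
from typing import Optional


def span_annotations_query(chart_id: int, source: Optional[str] = None) -> tuple:
    """Return (sql, params); the source filter is added only when provided."""
    clauses = ["chart_id = %s"]
    params = [chart_id]
    if source is not None:
        clauses.append("source = %s")
        params.append(source)
    sql = "SELECT * FROM span_annotations WHERE " + " AND ".join(clauses)
    return sql, tuple(params)
```

Calling it with only a `chart_id` returns every labeled span for the chart; passing `source` narrows the result to spans from that source.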

Scenario: No CSV/JSON export required

WHEN the ML training pipeline starts
THEN it does not require pre-exported CSV or JSON files; all data is read from PostgreSQL

Scenario: Shared database connection

WHEN the ML service reads candle/annotation data
THEN it uses the same PostgreSQL connection (same database, same credentials) that it uses for the training_runs table
