chore: archive ml-db-consolidation change and sync specs

- Archived change to openspec/changes/archive/2026-02-17-ml-db-consolidation/
- Created new postgres-data-layer spec with PostgreSQL connection, schema definitions, Drizzle migrations, npm deps, and SQLite migration requirements
- Updated docker-deployment spec: Docker Compose now PostgreSQL-based (postgres dependency, ml-data volume, DATABASE_URL); env vars updated (DATABASE_URL added, DATABASE_PATH removed); database persistence updated to PostgreSQL volumes; health check updated to PostgreSQL
- Updated ml-training spec: added database name scenario (candle_annotator) and new direct annotation data access requirement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Marko Djordjevic 2026-02-17 18:22:28 +01:00
parent 0e8dcc6707
commit 38df874255
10 changed files with 532 additions and 31 deletions

@@ -32,27 +32,31 @@ The project SHALL include docker-compose.yml for simplified deployment orchestra
 #### Scenario: Service definition
 - **WHEN** docker-compose.yml is parsed
-- **THEN** defines single service named 'candle-annotator' using Dockerfile from current directory
+- **THEN** defines service named 'candle-annotator' using Dockerfile from current directory
 #### Scenario: Port mapping
 - **WHEN** docker-compose up runs
-- **THEN** maps host port 3000 to container port 3000 (configurable via PORT environment variable)
+- **THEN** maps host port 3000 to container port 3000
-#### Scenario: Volume mounting for database
+#### Scenario: Volume mounting for ML data
 - **WHEN** docker-compose up runs
-- **THEN** mounts named volume 'candle-data' to /app/data in container for SQLite database persistence
+- **THEN** mounts named volume 'ml-data' to /app/ml-data in the candle-annotator container
-#### Scenario: Environment variable configuration
-- **WHEN** docker-compose.yml is used
-- **THEN** supports loading environment variables from .env file for NODE_ENV, PORT, and other configs
+#### Scenario: Frontend depends on PostgreSQL
+- **WHEN** docker-compose up runs
+- **THEN** the candle-annotator service starts only after the postgres service is healthy (`depends_on: postgres: condition: service_healthy`)
+#### Scenario: Frontend DATABASE_URL
+- **WHEN** the candle-annotator service starts
+- **THEN** the `DATABASE_URL` environment variable is set to `postgresql://ml_user:ml_password@postgres:5432/candle_annotator`
 #### Scenario: Restart policy
 - **WHEN** container crashes or stops
 - **THEN** docker-compose automatically restarts container unless explicitly stopped (restart: unless-stopped)
-#### Scenario: Volume declaration
+#### Scenario: No SQLite volume
 - **WHEN** docker-compose.yml is parsed
-- **THEN** declares 'candle-data' as named volume in volumes section
+- **THEN** there is no `candle-data` volume defined or mounted
 ### Requirement: Environment variable configuration
 The project SHALL use environment variables for runtime configuration.
@@ -61,6 +65,14 @@ The project SHALL use environment variables for runtime configuration.
 - **WHEN** repository is cloned
 - **THEN** includes .env.example file documenting all configurable environment variables with example values
+#### Scenario: DATABASE_URL configuration
+- **WHEN** `DATABASE_URL` environment variable is set
+- **THEN** the Next.js application connects to the PostgreSQL database at the specified URL
+#### Scenario: No DATABASE_PATH variable
+- **WHEN** environment variables are inspected
+- **THEN** there is no `DATABASE_PATH` variable (SQLite path is removed)
 #### Scenario: PORT configuration
 - **WHEN** PORT environment variable is set
 - **THEN** Next.js server listens on specified port (default: 3000)
@@ -69,14 +81,6 @@ The project SHALL use environment variables for runtime configuration.
 - **WHEN** NODE_ENV environment variable is set to 'production'
 - **THEN** Next.js runs in production mode with optimizations enabled
-#### Scenario: Database path configuration
-- **WHEN** DATABASE_PATH environment variable is set (optional)
-- **THEN** SQLite database file is created at specified path (default: ./data/candles.db)
-#### Scenario: HOSTNAME configuration
-- **WHEN** HOSTNAME environment variable is set
-- **THEN** Next.js server binds to specified hostname (default: 0.0.0.0 for containers)
 ### Requirement: Health check endpoint
 The API SHALL provide a health check endpoint for container orchestration.
@@ -86,7 +90,7 @@ The API SHALL provide a health check endpoint for container orchestration.
 #### Scenario: Database connection check
 - **WHEN** GET request sent to `/api/health?check=db`
-- **THEN** system attempts simple database query and returns 200 if successful, 503 if database unavailable
+- **THEN** system attempts a PostgreSQL query and returns 200 if successful, 503 if database unavailable
 #### Scenario: Health check in Dockerfile
 - **WHEN** Dockerfile defines HEALTHCHECK
@@ -138,23 +142,19 @@ The Docker image SHALL be optimized for production use with minimal size.
 - **THEN** final image size is under 200MB (excluding data volume)
 ### Requirement: Database persistence
-The deployment SHALL ensure SQLite database persists across container restarts.
+The deployment SHALL ensure PostgreSQL data persists across container restarts.
-#### Scenario: Volume mounting
-- **WHEN** container runs with volume mount
-- **THEN** /app/data directory is mounted from host or Docker volume
-#### Scenario: Database file location
-- **WHEN** application initializes database
-- **THEN** SQLite file is created at /app/data/candles.db (inside mounted volume)
+#### Scenario: PostgreSQL volume
+- **WHEN** docker-compose up runs
+- **THEN** the `postgres-data` named volume is mounted to `/var/lib/postgresql/data` in the postgres container
 #### Scenario: Container restart preserves data
-- **WHEN** container is stopped and restarted
-- **THEN** existing database file is reused and all annotations and candles remain intact
+- **WHEN** the postgres container is stopped and restarted
+- **THEN** all database tables and data remain intact
-#### Scenario: File permissions
-- **WHEN** container creates database file
-- **THEN** appuser has read/write permissions to /app/data directory
+#### Scenario: PostgreSQL database name
+- **WHEN** the postgres service starts
+- **THEN** the `POSTGRES_DB` environment variable is set to `candle_annotator`
 ### Requirement: Deployment documentation
 DEPLOYMENT.md SHALL include comprehensive Docker deployment instructions.

@@ -84,6 +84,29 @@ The system SHALL store training run metadata in the PostgreSQL database. Each tr
 - **WHEN** the system queries training runs
 - **THEN** it returns records from PostgreSQL ordered by created_at descending
+#### Scenario: Database name updated
+- **WHEN** the ML service connects to PostgreSQL
+- **THEN** it connects to the `candle_annotator` database (not `ml_db`)
+### Requirement: Direct annotation data access
+The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.
+#### Scenario: Query candle data for training
+- **WHEN** the ML training pipeline needs OHLC data for a chart
+- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`
+#### Scenario: Query span annotations for labels
+- **WHEN** the ML training pipeline needs labeled spans for training
+- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`
+#### Scenario: No CSV/JSON export required
+- **WHEN** the ML training pipeline starts
+- **THEN** it does not require pre-exported CSV or JSON files — all data is read from PostgreSQL
+#### Scenario: Shared database connection
+- **WHEN** the ML service reads candle/annotation data
+- **THEN** it uses the same PostgreSQL connection (same database, same credentials) as for `training_runs`
 ### Requirement: Pipeline config logging
 The system SHALL log the full pipeline YAML config as an MLflow artifact with each training run. This config SHALL be used during inference to replicate the exact preprocessing steps.
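
The two query scenarios under "Direct annotation data access" above could be sketched as parameterized SQL builders. This is only an illustration, written in TypeScript. The commit does not show the ML service's own code or language. The function names and the `SqlQuery` type are hypothetical, the selected column names are taken from the postgres-data-layer schema scenarios, and the `{ text, values }` shape mirrors the query config object accepted by node-postgres.

```typescript
// Hypothetical shape; mirrors the { text, values } query config used by node-postgres.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Candles for one chart, ordered by time ("Query candle data for training").
function candlesQuery(chartId: number): SqlQuery {
  return {
    text:
      'SELECT "time", "open", "high", "low", "close" ' +
      'FROM candles WHERE chart_id = $1 ORDER BY "time"',
    values: [chartId],
  };
}

// Span annotations for one chart, optionally narrowed by source
// ("Query span annotations for labels").
function spanAnnotationsQuery(chartId: number, source?: string): SqlQuery {
  const values: unknown[] = [chartId];
  let text =
    "SELECT start_time, end_time, label, source " +
    "FROM span_annotations WHERE chart_id = $1";
  if (source !== undefined) {
    values.push(source);
    text += " AND source = $2";
  }
  return { text, values };
}
```

Using placeholders (`$1`, `$2`) rather than string interpolation keeps the chart id and source out of the SQL text. For example, `spanAnnotationsQuery(3, "human")` would restrict labels to the schema's default `source` value.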

@@ -0,0 +1,80 @@
## ADDED Requirements
### Requirement: PostgreSQL connection via Drizzle ORM
The Next.js application SHALL connect to PostgreSQL using Drizzle ORM with the `node-postgres` (`pg`) driver. The connection SHALL use a pool with a configurable maximum number of connections (default: 10). The connection string SHALL be read from the `DATABASE_URL` environment variable.
#### Scenario: Successful connection
- **WHEN** the application starts with a valid `DATABASE_URL` pointing to a running PostgreSQL instance
- **THEN** Drizzle ORM establishes a connection pool and the `db` export is ready for queries
#### Scenario: Missing DATABASE_URL
- **WHEN** the `DATABASE_URL` environment variable is not set
- **THEN** the application SHALL fail to start with an error message indicating the missing variable
#### Scenario: Database unreachable
- **WHEN** the PostgreSQL instance is not reachable at the configured URL
- **THEN** the application SHALL fail to start with a connection error
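
As a rough illustration of the "Missing DATABASE_URL" scenario and the configurable pool size, a startup check might look like the sketch below. The helper names and the `DATABASE_POOL_MAX` variable are assumptions made for this example. The spec only states that the maximum is configurable with a default of 10, not how it is configured. The environment is passed in as a plain object so the sketch does not depend on Node typings.

```typescript
type Env = Record<string, string | undefined>;

// Fail fast when DATABASE_URL is missing or blank (hypothetical helper).
function requireDatabaseUrl(env: Env): string {
  const url = env.DATABASE_URL;
  if (url === undefined || url.trim() === "") {
    throw new Error("DATABASE_URL is not set; cannot connect to PostgreSQL");
  }
  return url;
}

// Resolve the pool size, defaulting to 10 as the requirement states.
// DATABASE_POOL_MAX is an assumed variable name, not taken from the spec.
function resolvePoolMax(env: Env): number {
  const raw = env.DATABASE_POOL_MAX;
  if (raw === undefined || raw.trim() === "") {
    return 10;
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 1) {
    throw new Error(`DATABASE_POOL_MAX must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```

In an application these values would presumably be passed to the `pg` pool as its connection string and `max` option before handing the pool to Drizzle.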
### Requirement: PostgreSQL schema definitions
The Drizzle schema SHALL define all frontend tables using `pgTable` from `drizzle-orm/pg-core`. The following tables SHALL be defined: `charts`, `candles`, `annotation_types`, `annotations`, `span_label_types`, `span_annotations`.
#### Scenario: Charts table schema
- **WHEN** the schema is loaded
- **THEN** the `charts` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `created_at` (timestamp, not null, default now)
#### Scenario: Candles table schema
- **WHEN** the schema is loaded
- **THEN** the `candles` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `time` (timestamp, not null), `open` (double precision, not null), `high` (double precision, not null), `low` (double precision, not null), `close` (double precision, not null), with a unique index on `(chart_id, time)`
#### Scenario: Annotation types table schema
- **WHEN** the schema is loaded
- **THEN** the `annotation_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `category` (text, not null), `icon` (text, nullable), `is_active` (boolean, not null, default true), `created_at` (timestamp, not null, default now)
#### Scenario: Annotations table schema
- **WHEN** the schema is loaded
- **THEN** the `annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `timestamp` (timestamp, not null), `label_type` (text, not null), `geometry` (jsonb, nullable), `color` (text, default '#3b82f6'), `created_at` (timestamp, not null, default now)
#### Scenario: Span label types table schema
- **WHEN** the schema is loaded
- **THEN** the `span_label_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `hotkey` (text, nullable), `is_active` (boolean, not null, default true), `sort_order` (integer, not null, default 0), `created_at` (timestamp, not null, default now)
#### Scenario: Span annotations table schema
- **WHEN** the schema is loaded
- **THEN** the `span_annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `start_time` (timestamp, not null), `end_time` (timestamp, not null), `label` (text, not null), `confidence` (integer, nullable), `outcome` (text, nullable), `notes` (text, nullable), `sub_spans` (jsonb, nullable), `color` (text, not null, default '#2196F3'), `source` (text, not null, default 'human'), `model_prediction` (jsonb, nullable), `created_at` (timestamp, not null, default now)
### Requirement: PostgreSQL migrations via Drizzle Kit
The project SHALL use Drizzle Kit to generate and apply PostgreSQL migrations. The `drizzle.config.ts` SHALL target the `postgresql` dialect. Existing SQLite migrations SHALL be removed.
#### Scenario: Generate migrations
- **WHEN** `drizzle-kit generate` is executed
- **THEN** a new SQL migration file is created in the `drizzle/` directory with PostgreSQL-dialect DDL
#### Scenario: Apply migrations at startup
- **WHEN** the application starts (not during build phase)
- **THEN** Drizzle runs pending migrations against the PostgreSQL database
#### Scenario: Skip migrations during build
- **WHEN** `NEXT_PHASE` is `phase-production-build` or `phase-development-build`
- **THEN** migration execution is skipped
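
The build-phase guard from the two scenarios above can be expressed as a small predicate. This is a sketch under assumptions: the function name is hypothetical, and the phase strings are copied from the scenario text rather than imported from Next.js.

```typescript
// Phase values copied verbatim from the "Skip migrations during build" scenario.
const BUILD_PHASES: readonly string[] = [
  "phase-production-build",
  "phase-development-build",
];

// Returns true when migrations should run, i.e. when NEXT_PHASE is unset
// or names a phase that is not one of the build phases above.
function shouldRunMigrations(env: Record<string, string | undefined>): boolean {
  const phase = env.NEXT_PHASE;
  if (phase === undefined) {
    return true;
  }
  return BUILD_PHASES.indexOf(phase) === -1;
}
```

A startup module could call this predicate before invoking Drizzle's migrator, so that builds never need a reachable database.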
### Requirement: npm dependency changes
The project SHALL remove `better-sqlite3` and `@types/better-sqlite3` from dependencies and add `pg` and `@types/pg`.
#### Scenario: Dependencies updated
- **WHEN** `package.json` is inspected
- **THEN** `better-sqlite3` and `@types/better-sqlite3` are absent, and `pg` and `@types/pg` are present in dependencies
### Requirement: Data migration from SQLite to PostgreSQL
The project SHALL include a one-time migration script at `scripts/migrate-sqlite-to-postgres.ts` that reads all data from the SQLite database and inserts it into PostgreSQL with appropriate type conversions.
#### Scenario: Migrate all tables
- **WHEN** the migration script is executed with both databases accessible
- **THEN** all rows from charts, candles, annotation_types, annotations, span_label_types, and span_annotations are transferred to PostgreSQL
#### Scenario: Type conversions applied
- **WHEN** data is migrated
- **THEN** SQLite integer timestamps are converted to PostgreSQL timestamps, integer booleans (0/1) are converted to PostgreSQL booleans, and text JSON fields are inserted as jsonb
#### Scenario: Idempotent execution
- **WHEN** the migration script is run a second time on an already-migrated database
- **THEN** the script either skips existing data or clears and re-inserts (with a flag), without creating duplicates
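
The conversions listed in the "Type conversions applied" scenario might be implemented with helpers along these lines. This is a minimal sketch, not the contents of `scripts/migrate-sqlite-to-postgres.ts`. The helper names are hypothetical, and the default unit of seconds for SQLite integer timestamps is an assumption that should be checked against the actual SQLite schema.

```typescript
// Integer timestamp -> Date. Assumes Unix seconds by default; pass "ms"
// if the SQLite column stored milliseconds instead.
function toTimestamp(value: number, unit: "s" | "ms" = "s"): Date {
  return new Date(unit === "s" ? value * 1000 : value);
}

// SQLite integer boolean (0/1) -> boolean; rejects anything else so bad
// rows surface during migration instead of being silently coerced.
function toBoolean(value: number): boolean {
  if (value !== 0 && value !== 1) {
    throw new Error(`expected 0 or 1, got ${value}`);
  }
  return value === 1;
}

// Text JSON -> parsed value suitable for a jsonb column; null stays null.
function toJsonb(value: string | null): unknown {
  return value === null ? null : JSON.parse(value);
}
```

Throwing on unexpected boolean values and letting `JSON.parse` raise on malformed text is a deliberate choice in this sketch: a one-time migration is easier to trust when bad rows stop the run.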