feat: migrate from SQLite to PostgreSQL - complete schema and API updates

- Remove better-sqlite3, add pg driver
- Convert schema to PostgreSQL types (serial, timestamp, boolean, jsonb)
- Generate fresh PostgreSQL migrations
- Update database connection layer with pg.Pool
- Fix all API routes: remove JSON.parse/stringify, use native timestamps and booleans
- Update drizzle.config.ts and .env.example for PostgreSQL

Parent: 4605283d2b, commit: 5f70f13da3

37 changed files with 1164 additions and 1825 deletions
@@ -0,0 +1,81 @@

## MODIFIED Requirements

### Requirement: Docker Compose configuration

The project SHALL include a `docker-compose.yml` file for simplified deployment orchestration.

#### Scenario: Service definition

- **WHEN** `docker-compose.yml` is parsed
- **THEN** it defines a service named `candle-annotator` that is built from the Dockerfile in the current directory

#### Scenario: Port mapping

- **WHEN** `docker-compose up` runs
- **THEN** host port 3000 is mapped to container port 3000

#### Scenario: Volume mounting for ML data

- **WHEN** `docker-compose up` runs
- **THEN** the named volume `ml-data` is mounted at `/app/ml-data` in the `candle-annotator` container

#### Scenario: Frontend depends on PostgreSQL

- **WHEN** `docker-compose up` runs
- **THEN** the `candle-annotator` service starts only after the `postgres` service is healthy (`depends_on: postgres: condition: service_healthy`), which requires the `postgres` service to define a `healthcheck`

#### Scenario: Frontend DATABASE_URL

- **WHEN** the `candle-annotator` service starts
- **THEN** the `DATABASE_URL` environment variable is set to `postgresql://ml_user:ml_password@postgres:5432/candle_annotator`

#### Scenario: Restart policy

- **WHEN** the container exits, for example after a crash
- **THEN** Docker restarts it automatically unless it was explicitly stopped (`restart: unless-stopped`)

#### Scenario: No SQLite volume

- **WHEN** `docker-compose.yml` is parsed
- **THEN** there is no `candle-data` volume defined or mounted
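
The compose scenarios above can be turned into an automated conformance check. The sketch below is one way to do it. It assumes the YAML has already been parsed into a plain object (parser not shown). It also assumes that `ports` and `volumes` use the short string syntax and that `environment` uses map syntax. The function name `checkCandleAnnotatorService` and its messages are illustrative and not part of the project.

```typescript
// Minimal shape of a parsed docker-compose.yml; only the fields checked below.
// Assumes short syntax for ports/volumes and map syntax for `environment`.
interface ComposeService {
  build?: string | { context?: string; dockerfile?: string };
  ports?: string[];
  volumes?: string[];
  environment?: Record<string, string>;
  depends_on?: Record<string, { condition?: string }>;
  restart?: string;
  healthcheck?: unknown;
}

interface ComposeFile {
  services: Record<string, ComposeService>;
  volumes?: Record<string, unknown>;
}

const EXPECTED_DATABASE_URL =
  "postgresql://ml_user:ml_password@postgres:5432/candle_annotator";

// Returns one message per violated scenario; an empty array means the file conforms.
function checkCandleAnnotatorService(compose: ComposeFile): string[] {
  const svc = compose.services["candle-annotator"];
  if (!svc) return ["missing service 'candle-annotator'"];
  const errors: string[] = [];

  // Service definition: built from the Dockerfile in the current directory.
  const context = typeof svc.build === "string" ? svc.build : svc.build?.context;
  if (context !== ".") errors.push("build context must be '.'");

  // Port mapping and ML data volume.
  if (!svc.ports?.includes("3000:3000")) errors.push("host port 3000 must map to container port 3000");
  if (!svc.volumes?.includes("ml-data:/app/ml-data")) errors.push("ml-data must be mounted at /app/ml-data");

  // Startup ordering: service_healthy only works if postgres declares a healthcheck.
  if (svc.depends_on?.postgres?.condition !== "service_healthy") errors.push("must wait for a healthy postgres service");
  if (!compose.services["postgres"]?.healthcheck) errors.push("postgres service needs a healthcheck");

  if (svc.environment?.DATABASE_URL !== EXPECTED_DATABASE_URL) errors.push("unexpected DATABASE_URL");
  if (svc.restart !== "unless-stopped") errors.push("restart policy must be unless-stopped");

  // No SQLite volume: neither declared at the top level nor mounted by any service.
  const declared = compose.volumes !== undefined && "candle-data" in compose.volumes;
  const mounted = Object.values(compose.services).some((s) =>
    (s.volumes ?? []).some((v) => v.startsWith("candle-data:")),
  );
  if (declared || mounted) errors.push("legacy candle-data volume must not be defined or mounted");
  return errors;
}
```

Each failing scenario yields one message, so a test can simply assert that the result is empty. Compose also accepts long-form port and volume entries and list-form `environment`. A real check would normalize those forms first.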

### Requirement: Environment variable configuration

The project SHALL use environment variables for runtime configuration.

#### Scenario: .env.example file

- **WHEN** the repository is cloned
- **THEN** it includes a `.env.example` file that documents every configurable environment variable with an example value

#### Scenario: DATABASE_URL configuration

- **WHEN** the `DATABASE_URL` environment variable is set
- **THEN** the Next.js application connects to the PostgreSQL database at the specified URL

#### Scenario: No DATABASE_PATH variable

- **WHEN** the environment variables are inspected
- **THEN** there is no `DATABASE_PATH` variable (the SQLite file path setting has been removed)

#### Scenario: PORT configuration

- **WHEN** the `PORT` environment variable is set
- **THEN** the Next.js server listens on the specified port (default: 3000)

#### Scenario: NODE_ENV configuration

- **WHEN** the `NODE_ENV` environment variable is set to `production`
- **THEN** Next.js runs in production mode with optimizations enabled
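
A startup-time reader for the variables this requirement lists might look like the following sketch. Two choices are assumptions: the `loadConfig` name, and the decision to reject non-PostgreSQL URLs. The environment is passed in as a plain record (for example `process.env`) so the function stays testable. `DATABASE_PATH` is never read.

```typescript
type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  port: number;
  isProduction: boolean;
}

// Reads the variables named in the requirement. Throws on missing or malformed
// values so that misconfiguration surfaces at startup, not on the first request.
function loadConfig(env: Env): AppConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is not set");
  }
  // Assumption: only postgres:// or postgresql:// URLs are accepted.
  const protocol = new URL(databaseUrl).protocol;
  if (protocol !== "postgresql:" && protocol !== "postgres:") {
    throw new Error(`DATABASE_URL must be a PostgreSQL URL, got ${protocol}`);
  }

  // PORT defaults to 3000, as the requirement states.
  const port = env.PORT === undefined ? 3000 : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got ${env.PORT}`);
  }

  return { databaseUrl, port, isProduction: env.NODE_ENV === "production" };
}
```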

### Requirement: Database persistence

The deployment SHALL ensure PostgreSQL data persists across container restarts.

#### Scenario: PostgreSQL volume

- **WHEN** `docker-compose up` runs
- **THEN** the `postgres-data` named volume is mounted to `/var/lib/postgresql/data` in the `postgres` container

#### Scenario: Container restart preserves data

- **WHEN** the `postgres` container is stopped and restarted
- **THEN** all database tables and data remain intact

#### Scenario: PostgreSQL database name

- **WHEN** the `postgres` service starts
- **THEN** the `POSTGRES_DB` environment variable is set to `candle_annotator`
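
The persistence scenarios can be checked against the `postgres` service of a parsed compose file. This sketch assumes the YAML is already parsed into a plain object, with short volume syntax and map-style `environment`. `checkPostgresPersistence` is a hypothetical helper name.

```typescript
// Minimal shape of the postgres service in a parsed compose file.
interface PostgresServiceSpec {
  volumes?: string[];
  environment?: Record<string, string>;
}

interface ComposeWithPostgres {
  services: Record<string, PostgresServiceSpec>;
  volumes?: Record<string, unknown>;
}

// Returns one message per violated persistence scenario.
function checkPostgresPersistence(compose: ComposeWithPostgres): string[] {
  const pg = compose.services["postgres"];
  if (!pg) return ["missing service 'postgres'"];
  const errors: string[] = [];

  if (!pg.volumes?.includes("postgres-data:/var/lib/postgresql/data")) {
    errors.push("postgres-data must be mounted at /var/lib/postgresql/data");
  }
  // A named volume used by a service must also be declared under top-level `volumes`.
  if (compose.volumes === undefined || !("postgres-data" in compose.volumes)) {
    errors.push("postgres-data must be declared as a top-level named volume");
  }
  if (pg.environment?.POSTGRES_DB !== "candle_annotator") {
    errors.push("POSTGRES_DB must be candle_annotator");
  }
  return errors;
}
```

The official `postgres` image reads `POSTGRES_DB` only when it initializes an empty data directory. If a `postgres-data` volume already exists from the earlier `ml_db` setup, changing this variable will not create `candle_annotator`. In that case the database has to be created manually (for example with `CREATE DATABASE candle_annotator;`), or the volume has to be recreated.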

### Requirement: Health check endpoint

The API SHALL provide a health check endpoint for container orchestration.

#### Scenario: Health check endpoint responds

- **WHEN** a GET request is sent to `/api/health`
- **THEN** the system returns HTTP 200 with the JSON body `{ "status": "ok", "timestamp": <unix_timestamp> }`

#### Scenario: Database connection check

- **WHEN** a GET request is sent to `/api/health?check=db`
- **THEN** the system runs a query against PostgreSQL and returns 200 if it succeeds, or 503 if the database is unavailable
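
Here is a sketch of the health-check logic. It is split into a pure function that picks the status code and a small async wrapper that probes the database. Several details are assumptions:

- `timestamp` is in seconds.
- The 503 body shape (`status: "error"`, `db: "down"`) is invented for illustration.
- `ping` stands in for whatever query the route actually runs, for instance `SELECT 1` through the pool.

```typescript
interface HealthResponse {
  status: number;
  body: { status: "ok" | "error"; timestamp: number; db?: "up" | "down" };
}

// Pure core: decides the response from the request mode and the probe result.
function buildHealthResponse(checkDb: boolean, dbReachable: boolean, nowMs: number): HealthResponse {
  const timestamp = Math.floor(nowMs / 1000); // assumption: Unix time in seconds
  if (!checkDb) {
    return { status: 200, body: { status: "ok", timestamp } };
  }
  return dbReachable
    ? { status: 200, body: { status: "ok", timestamp, db: "up" } }
    : { status: 503, body: { status: "error", timestamp, db: "down" } };
}

// Thin async wrapper: only touches the database when `?check=db` is present.
async function healthCheck(
  url: URL,
  ping: () => Promise<unknown>,
  nowMs: number = Date.now(),
): Promise<HealthResponse> {
  const checkDb = url.searchParams.get("check") === "db";
  let dbReachable = false;
  if (checkDb) {
    try {
      await ping();
      dbReachable = true;
    } catch {
      dbReachable = false; // any query error is reported as "database unavailable"
    }
  }
  return buildHealthResponse(checkDb, dbReachable, nowMs);
}
```

In a Next.js route handler, the result could be returned with `Response.json(result.body, { status: result.status })`. A probe that hangs would make the endpoint hang too, so a real implementation may want a timeout around `ping`.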

@@ -0,0 +1,37 @@

## MODIFIED Requirements

### Requirement: PostgreSQL training metadata storage

The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: `run_id` (MLflow run ID), `model_type`, `experiment_name`, `pipeline_config_hash`, `dataset_version`, a metrics summary (JSON), `status`, and timestamps (`created_at`, `completed_at`).

#### Scenario: Store training run record

- **WHEN** a training run completes successfully
- **THEN** the system inserts a record with the run metadata into the PostgreSQL `training_runs` table

#### Scenario: Query training history

- **WHEN** the system queries training runs
- **THEN** it returns records from PostgreSQL ordered by `created_at` descending

#### Scenario: Database name updated

- **WHEN** the ML service connects to PostgreSQL
- **THEN** it connects to the `candle_annotator` database (not `ml_db`)
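
This commit does not show the ML service's code or its language. The sketch below writes the two queries these scenarios imply in TypeScript, as parameterized `{ text, values }` objects (the shape node-postgres accepts). The `metrics` column name and its `jsonb` type are assumptions. The requirement only says the metrics summary is JSON.

```typescript
// Parameterized query in the { text, values } shape node-postgres accepts.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Fields listed in the requirement; the TypeScript types are assumptions.
interface TrainingRun {
  runId: string; // MLflow run ID
  modelType: string;
  experimentName: string;
  pipelineConfigHash: string;
  datasetVersion: string;
  metrics: Record<string, number>;
  status: string;
  createdAt: Date;
  completedAt: Date | null;
}

// Store training run record.
function insertTrainingRun(run: TrainingRun): SqlQuery {
  return {
    text:
      "INSERT INTO training_runs (run_id, model_type, experiment_name, pipeline_config_hash, " +
      "dataset_version, metrics, status, created_at, completed_at) " +
      "VALUES ($1, $2, $3, $4, $5, $6::jsonb, $7, $8, $9)",
    values: [
      run.runId,
      run.modelType,
      run.experimentName,
      run.pipelineConfigHash,
      run.datasetVersion,
      JSON.stringify(run.metrics), // serialized explicitly for the jsonb cast
      run.status,
      run.createdAt,
      run.completedAt,
    ],
  };
}

// Query training history, newest first.
const LIST_TRAINING_RUNS: SqlQuery = {
  text: "SELECT * FROM training_runs ORDER BY created_at DESC",
  values: [],
};
```

The `candle_annotator` database name belongs in the connection string. Neither query has to change when the database is renamed.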

## ADDED Requirements

### Requirement: Direct annotation data access

The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.

#### Scenario: Query candle data for training

- **WHEN** the ML training pipeline needs OHLC data for a chart
- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`

#### Scenario: Query span annotations for labels

- **WHEN** the ML training pipeline needs labeled spans for training
- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`

#### Scenario: No CSV/JSON export required

- **WHEN** the ML training pipeline starts
- **THEN** it does not require pre-exported CSV or JSON files; all data is read from PostgreSQL

#### Scenario: Shared database connection

- **WHEN** the ML service reads candle/annotation data
- **THEN** it uses the same PostgreSQL connection settings (same database, same credentials) that it uses for `training_runs`
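
The two reads described above reduce to simple parameterized queries. This is a TypeScript sketch in node-postgres's `{ text, values }` query shape, not the ML service's actual code (whose language this commit does not show). The helper names and the selected candle columns are assumptions.

```typescript
interface PgQuery {
  text: string;
  values: unknown[];
}

// OHLC rows for one chart, in time order.
function candlesForChart(chartId: number): PgQuery {
  return {
    text: "SELECT time, open, high, low, close FROM candles WHERE chart_id = $1 ORDER BY time",
    values: [chartId],
  };
}

// Span labels for one chart, optionally restricted to one `source` (e.g. 'human').
function spanAnnotationsForChart(chartId: number, source?: string): PgQuery {
  const values: unknown[] = [chartId];
  let text = "SELECT * FROM span_annotations WHERE chart_id = $1";
  if (source !== undefined) {
    values.push(source);
    text += ` AND source = $${values.length}`; // placeholder number tracks the values array
  }
  return { text, values };
}
```

The database schema defines a unique index on `(chart_id, time)` for `candles`. That index can serve both the filter and the `ORDER BY time` in the first query, so fetching one chart's candles does not require sorting the whole table.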

@@ -0,0 +1,80 @@

## ADDED Requirements

### Requirement: PostgreSQL connection via Drizzle ORM

The Next.js application SHALL connect to PostgreSQL using Drizzle ORM with the `node-postgres` (`pg`) driver. The connection SHALL use a pool with a configurable maximum number of connections (default: 10). The connection string SHALL be read from the `DATABASE_URL` environment variable.

#### Scenario: Successful connection

- **WHEN** the application starts with a valid `DATABASE_URL` pointing to a running PostgreSQL instance
- **THEN** Drizzle ORM establishes a connection pool and the `db` export is ready for queries

#### Scenario: Missing DATABASE_URL

- **WHEN** the `DATABASE_URL` environment variable is not set
- **THEN** the application SHALL fail to start with an error message indicating the missing variable

#### Scenario: Database unreachable

- **WHEN** the PostgreSQL instance is not reachable at the configured URL
- **THEN** the application SHALL fail to start with a connection error
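
The pool settings can be derived from the environment in one small function. This sketch returns an object whose fields match node-postgres's `PoolConfig` (`connectionString`, `max`). The `DATABASE_POOL_MAX` variable name is hypothetical; the requirement says only that the maximum is configurable, with a default of 10.

```typescript
// Field names match node-postgres's PoolConfig.
interface PoolSettings {
  connectionString: string;
  max: number;
}

function poolSettingsFromEnv(env: Record<string, string | undefined>): PoolSettings {
  const connectionString = env.DATABASE_URL;
  if (!connectionString) {
    // "Missing DATABASE_URL" scenario: fail fast with a clear message.
    throw new Error("DATABASE_URL environment variable is not set");
  }
  // DATABASE_POOL_MAX is a hypothetical name; default pool size is 10.
  const raw = env.DATABASE_POOL_MAX;
  const max = raw === undefined ? 10 : Number(raw);
  if (!Number.isInteger(max) || max < 1) {
    throw new Error(`DATABASE_POOL_MAX must be a positive integer, got ${raw}`);
  }
  return { connectionString, max };
}
```

The result would be passed to `new Pool(...)` from `pg`, and the pool to `drizzle(pool)` from `drizzle-orm/node-postgres`. `pg.Pool` opens connections lazily. To satisfy the "Database unreachable" scenario, the application therefore needs an explicit probe during startup, such as `await pool.query('SELECT 1')`. Without it, the error surfaces only on the first request.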

### Requirement: PostgreSQL schema definitions

The Drizzle schema SHALL define all frontend tables using `pgTable` from `drizzle-orm/pg-core`. The following tables SHALL be defined: `charts`, `candles`, `annotation_types`, `annotations`, `span_label_types`, `span_annotations`.

#### Scenario: Charts table schema

- **WHEN** the schema is loaded
- **THEN** the `charts` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `created_at` (timestamp, not null, default now)

#### Scenario: Candles table schema

- **WHEN** the schema is loaded
- **THEN** the `candles` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `time` (timestamp, not null), `open` (double precision, not null), `high` (double precision, not null), `low` (double precision, not null), `close` (double precision, not null), with a unique index on `(chart_id, time)`

#### Scenario: Annotation types table schema

- **WHEN** the schema is loaded
- **THEN** the `annotation_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `category` (text, not null), `icon` (text, nullable), `is_active` (boolean, not null, default true), `created_at` (timestamp, not null, default now)

#### Scenario: Annotations table schema

- **WHEN** the schema is loaded
- **THEN** the `annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `timestamp` (timestamp, not null), `label_type` (text, not null), `geometry` (jsonb, nullable), `color` (text, default '#3b82f6'), `created_at` (timestamp, not null, default now)

#### Scenario: Span label types table schema

- **WHEN** the schema is loaded
- **THEN** the `span_label_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `hotkey` (text, nullable), `is_active` (boolean, not null, default true), `sort_order` (integer, not null, default 0), `created_at` (timestamp, not null, default now)

#### Scenario: Span annotations table schema

- **WHEN** the schema is loaded
- **THEN** the `span_annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `start_time` (timestamp, not null), `end_time` (timestamp, not null), `label` (text, not null), `confidence` (integer, nullable), `outcome` (text, nullable), `notes` (text, nullable), `sub_spans` (jsonb, nullable), `color` (text, not null, default '#2196F3'), `source` (text, not null, default 'human'), `model_prediction` (jsonb, nullable), `created_at` (timestamp, not null, default now)
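
The sketch below writes two of the six tables with `drizzle-orm/pg-core`, to show how the column specifications above map onto Drizzle builders. The camelCase property names and the index name `candles_chart_id_time_idx` are assumptions. The remaining tables follow the same pattern: `boolean(...)` for the `is_active` flags and `jsonb(...)` for `geometry`, `sub_spans` and `model_prediction`.

```typescript
import {
  doublePrecision,
  integer,
  pgTable,
  serial,
  text,
  timestamp,
  uniqueIndex,
} from "drizzle-orm/pg-core";

// charts: id (serial PK), name (unique, not null), created_at (default now).
export const charts = pgTable("charts", {
  id: serial("id").primaryKey(),
  name: text("name").notNull().unique(),
  createdAt: timestamp("created_at").notNull().defaultNow(),
});

// candles: OHLC rows per chart, one row per (chart_id, time).
export const candles = pgTable(
  "candles",
  {
    id: serial("id").primaryKey(),
    chartId: integer("chart_id").notNull().references(() => charts.id),
    time: timestamp("time").notNull(),
    open: doublePrecision("open").notNull(),
    high: doublePrecision("high").notNull(),
    low: doublePrecision("low").notNull(),
    close: doublePrecision("close").notNull(),
  },
  (table) => ({
    chartTimeIdx: uniqueIndex("candles_chart_id_time_idx").on(table.chartId, table.time),
  }),
);

// Row type inferred from the table definition.
export type Candle = typeof candles.$inferSelect;
```

The third argument to `pgTable` uses the object form. Recent Drizzle releases also accept, and prefer, returning an array of indexes. With the default `timestamp` mode, Drizzle returns these columns as JavaScript `Date` objects. That fits the commit's move to native timestamps in the API routes.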

### Requirement: PostgreSQL migrations via Drizzle Kit

The project SHALL use Drizzle Kit to generate and apply PostgreSQL migrations. The `drizzle.config.ts` SHALL target the `postgresql` dialect. Existing SQLite migrations SHALL be removed.

#### Scenario: Generate migrations

- **WHEN** `drizzle-kit generate` is executed
- **THEN** a new SQL migration file is created in the `drizzle/` directory with PostgreSQL-dialect DDL
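
A minimal `drizzle.config.ts` targeting the `postgresql` dialect could look like this. The schema path `./src/db/schema.ts` is an assumption. `out: "./drizzle"` matches the migrations directory named above, and `defineConfig` comes from `drizzle-kit`.

```typescript
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  dialect: "postgresql",
  schema: "./src/db/schema.ts", // assumption: file containing the pgTable definitions
  out: "./drizzle", // migration output directory used by drizzle-kit generate
  dbCredentials: {
    url: process.env.DATABASE_URL!, // same variable the application reads
  },
});
```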

#### Scenario: Apply migrations at startup

- **WHEN** the application starts (outside the Next.js build phase)
- **THEN** Drizzle runs pending migrations against the PostgreSQL database

#### Scenario: Skip migrations during build

- **WHEN** `NEXT_PHASE` is `phase-production-build` or `phase-development-build`
- **THEN** migration execution is skipped
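
The build-phase guard reduces to a check on `NEXT_PHASE` before the migrator runs. In this sketch the migrator is injected as a function, so the logic can be tested without a database. In the application it would be something like `() => migrate(db, { migrationsFolder: "./drizzle" })`, using `migrate` from `drizzle-orm/node-postgres/migrator`. The helper names are hypothetical.

```typescript
type PhaseEnv = Record<string, string | undefined>;

// The two phase values named in the requirement.
const BUILD_PHASES = new Set(["phase-production-build", "phase-development-build"]);

function shouldRunMigrations(env: PhaseEnv): boolean {
  return !BUILD_PHASES.has(env.NEXT_PHASE ?? "");
}

// Runs pending migrations unless Next.js is building; resolves to whether it ran them.
async function runStartupMigrations(
  env: PhaseEnv,
  migrate: () => Promise<void>,
): Promise<boolean> {
  if (!shouldRunMigrations(env)) {
    return false; // build phase: no database is expected to be reachable
  }
  await migrate();
  return true;
}
```

`phase-production-build` is the value of `PHASE_PRODUCTION_BUILD` in `next/constants`. The guard above simply mirrors both values listed in the requirement.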

### Requirement: npm dependency changes

The project SHALL remove `better-sqlite3` and `@types/better-sqlite3` from dependencies and add `pg` and `@types/pg`.

#### Scenario: Dependencies updated

- **WHEN** `package.json` is inspected
- **THEN** `better-sqlite3` and `@types/better-sqlite3` are absent, and `pg` and `@types/pg` are present in dependencies
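
The dependency scenario can be verified mechanically against a parsed `package.json`. This sketch uses a hypothetical helper name. Following the requirement, it expects `pg` and `@types/pg` under `dependencies`. It treats the SQLite packages as removed only if they appear in neither `dependencies` nor `devDependencies`.

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns one message per dependency that does not match the requirement.
function checkDatabaseDependencies(pkg: PackageJson): string[] {
  const errors: string[] = [];
  const declared = { ...pkg.dependencies, ...pkg.devDependencies };

  for (const name of ["better-sqlite3", "@types/better-sqlite3"]) {
    if (name in declared) errors.push(`${name} should be removed`);
  }
  const runtime = pkg.dependencies ?? {};
  for (const name of ["pg", "@types/pg"]) {
    if (!(name in runtime)) errors.push(`${name} should be listed in dependencies`);
  }
  return errors;
}
```

Type packages such as `@types/pg` are often kept in `devDependencies`. If the project does that, the presence check should look there too.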

### Requirement: Data migration from SQLite to PostgreSQL

The project SHALL include a one-time migration script at `scripts/migrate-sqlite-to-postgres.ts` that reads all data from the SQLite database and inserts it into PostgreSQL with appropriate type conversions.

#### Scenario: Migrate all tables

- **WHEN** the migration script is executed with both databases accessible
- **THEN** all rows from charts, candles, annotation_types, annotations, span_label_types, and span_annotations are transferred to PostgreSQL

#### Scenario: Type conversions applied

- **WHEN** data is migrated
- **THEN** SQLite integer timestamps are converted to PostgreSQL timestamps, integer booleans (0/1) are converted to PostgreSQL booleans, and text JSON fields are inserted as jsonb
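
The three conversions can be written as small helpers. Two points are assumptions to verify against the old SQLite schema:

- Whether timestamps were stored in seconds or milliseconds. Drizzle's SQLite `integer({ mode: "timestamp" })` stores seconds and `mode: "timestamp_ms"` stores milliseconds.
- Whether the SQLite column names match the PostgreSQL ones.

`convertSpanAnnotation` is a hypothetical helper that covers only the columns needing conversion; the rest pass through unchanged.

```typescript
type TimestampUnit = "seconds" | "milliseconds";

// Integer timestamp -> Date (sent to a PostgreSQL timestamp column).
function toDate(value: number, unit: TimestampUnit): Date {
  return new Date(unit === "seconds" ? value * 1000 : value);
}

// SQLite 0/1 integer -> boolean; anything else is treated as corrupt data.
function toBoolean(value: number): boolean {
  if (value !== 0 && value !== 1) throw new Error(`expected 0 or 1, got ${value}`);
  return value === 1;
}

// Validates JSON text but keeps it as a string, to be bound with a `$n::jsonb` cast.
function toJsonbParam(value: string | null): string | null {
  if (value === null) return null;
  JSON.parse(value); // throws on malformed JSON so bad rows fail loudly
  return value;
}

// Subset of the SQLite span_annotations row that needs conversion (names assumed).
interface SqliteSpanAnnotation {
  id: number;
  chart_id: number;
  start_time: number;
  end_time: number;
  label: string;
  sub_spans: string | null;
  model_prediction: string | null;
  created_at: number;
}

function convertSpanAnnotation(row: SqliteSpanAnnotation, unit: TimestampUnit) {
  return {
    ...row, // columns without a type change pass through as-is
    start_time: toDate(row.start_time, unit),
    end_time: toDate(row.end_time, unit),
    created_at: toDate(row.created_at, unit),
    sub_spans: toJsonbParam(row.sub_spans),
    model_prediction: toJsonbParam(row.model_prediction),
  };
}
```

Passing the JSON text rather than the parsed value matters for fields that may hold arrays, such as `sub_spans`. node-postgres serializes plain objects with `JSON.stringify`, but it converts JavaScript arrays to PostgreSQL array literals, which a `jsonb` parameter rejects. `toBoolean` would be applied the same way to the `is_active` columns of `annotation_types` and `span_label_types`.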

#### Scenario: Idempotent execution

- **WHEN** the migration script is run a second time on an already-migrated database
- **THEN** by default it skips rows that already exist, or, when run with a flag, it clears the target tables and re-inserts all rows; in either case no duplicate rows are created
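
One way to get both modes is to preserve the SQLite primary keys and let PostgreSQL reject duplicates. The sketch below builds the SQL for that approach. The mode names stand in for the script's flag, whose actual name is not given here, and the helper names are hypothetical. Keeping the original `id` values means `chart_id` references stay valid. Tables are copied parents first, in the order of `TABLES`.

```typescript
type MigrationMode = "skip-existing" | "replace";

// Fixed allowlist in parent-before-child order. Identifiers cannot be bound as
// query parameters, so table names are only ever taken from this list.
const TABLES = [
  "charts",
  "candles",
  "annotation_types",
  "annotations",
  "span_label_types",
  "span_annotations",
] as const;
type TableName = (typeof TABLES)[number];

// Statements to run before copying rows.
function preCopyStatements(mode: MigrationMode): string[] {
  if (mode === "skip-existing") return [];
  // A single TRUNCATE over all tables, so foreign keys between them do not block it.
  return [`TRUNCATE ${TABLES.join(", ")} RESTART IDENTITY`];
}

// Keeps the SQLite id and skips any row that would violate a unique constraint
// (primary key, charts.name, candles (chart_id, time), ...).
function insertStatement(table: TableName, columns: string[]): string {
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(", ");
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders}) ON CONFLICT DO NOTHING`;
}

// Moves the serial sequence past the highest copied id (next value = MAX(id) + 1).
function resetSequenceStatement(table: TableName): string {
  return `SELECT setval(pg_get_serial_sequence('${table}', 'id'), COALESCE((SELECT MAX(id) FROM ${table}), 0) + 1, false)`;
}
```

Rows are inserted with explicit ids, so each `serial` sequence must be advanced afterwards with the `setval` statement. Otherwise the application's next insert would collide with a migrated row. The single multi-table `TRUNCATE` is deliberate: PostgreSQL refuses to truncate a table referenced by a foreign key unless the referencing tables are truncated in the same command.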