chore: archive ml-db-consolidation change and sync specs
- Archived change to openspec/changes/archive/2026-02-17-ml-db-consolidation/
- Created new postgres-data-layer spec with PostgreSQL connection, schema definitions, Drizzle migrations, npm deps, and SQLite migration requirements
- Updated docker-deployment spec: Docker Compose now PostgreSQL-based (postgres dependency, ml-data volume, DATABASE_URL); env vars updated (DATABASE_URL added, DATABASE_PATH removed); database persistence updated to PostgreSQL volumes; health check updated to PostgreSQL
- Updated ml-training spec: added database name scenario (candle_annotator) and new direct annotation data access requirement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
parent 0e8dcc6707
commit 38df874255
10 changed files with 532 additions and 31 deletions
|
|
@ -0,0 +1,2 @@
|
|||
schema: spec-driven
|
||||
created: 2026-02-17
|
||||
|
|
@ -0,0 +1,110 @@
|
|||
## Context
|
||||
|
||||
The candle annotator runs two databases:
|
||||
|
||||
1. **SQLite** (`data/candles.db`) — serves the Next.js frontend via Drizzle ORM (`better-sqlite3` driver). Contains 6 tables: charts, candles, annotations, annotation_types, span_annotations, span_label_types.
|
||||
2. **PostgreSQL** (`postgres:5432/ml_db`) — serves the Python ML service via SQLAlchemy. Contains 1 table: training_runs.
|
||||
|
||||
The ML service cannot directly query annotation/candle data. Data flows through CSV/JSON file exports. PostgreSQL already runs in Docker for the ML service, so consolidating means adding frontend tables there — not introducing a new service.
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
- Single PostgreSQL instance for all application data
|
||||
- Drizzle ORM continues to manage frontend schema (just switches dialect)
|
||||
- ML service gains direct read access to candle/annotation tables
|
||||
- Simplified Docker setup (one fewer volume, one database to back up)
|
||||
- One-time data migration path from SQLite to PostgreSQL
|
||||
|
||||
**Non-Goals:**
|
||||
- Changing the ML service ORM (SQLAlchemy stays)
|
||||
- Merging Drizzle and SQLAlchemy migration systems (each manages its own tables)
|
||||
- Changing API route logic or query patterns beyond what's needed for the dialect switch
|
||||
- Multi-tenant or schema separation (all tables go in the `public` schema)
|
||||
- Migrating away from Drizzle ORM
|
||||
|
||||
## Decisions
|
||||
|
||||
### 1. Drizzle PostgreSQL driver: `drizzle-orm/node-postgres` with `pg`
|
||||
|
||||
**Choice**: Use `pg` (node-postgres) as the driver.
|
||||
|
||||
**Why**: `pg` is the most mature PostgreSQL driver for Node.js. Drizzle supports it natively via `drizzle-orm/node-postgres`. The `postgres` (postgres.js) driver is also an option but `pg` has broader ecosystem support and is easier to debug.
|
||||
|
||||
**Alternative considered**: `postgres` (postgres.js) — lighter, promise-native, but less battle-tested with Drizzle migrations.
|
||||
|
||||
### 2. Shared database, single `public` schema
|
||||
|
||||
**Choice**: All tables (frontend + ML) live in the same database (currently `ml_db`, renamed to `candle_annotator` in Decision 3) and the default `public` schema.
|
||||
|
||||
**Why**: The table sets don't overlap (frontend has charts/candles/annotations, ML has training_runs). Separate schemas add complexity with no benefit for 7 total tables. The ML service already connects to `ml_db`.
|
||||
|
||||
**Alternative considered**: Separate PostgreSQL schemas (`app` and `ml`) — cleaner isolation but adds schema-prefix complexity to queries and cross-schema references. Not worth it at this scale.
|
||||
|
||||
### 3. Rename database from `ml_db` to `candle_annotator`
|
||||
|
||||
**Choice**: Rename the PostgreSQL database to `candle_annotator` since it now serves the whole application, not just ML.
|
||||
|
||||
**Why**: `ml_db` is misleading when the database holds frontend data too. Renaming during consolidation is the natural time to do it.
|
||||
|
||||
**Alternative considered**: Keep `ml_db` — avoids a rename step but creates lasting confusion.
|
||||
|
||||
### 4. Fresh Drizzle migrations (drop SQLite migrations)
|
||||
|
||||
**Choice**: Delete all existing SQLite migrations in `drizzle/`, rewrite the schema file with `pgTable` equivalents, and run `drizzle-kit generate` to produce a fresh initial PostgreSQL migration.
|
||||
|
||||
**Why**: SQLite migrations are dialect-specific (e.g., `integer` for booleans, no native timestamps). Converting them one-by-one is fragile. A clean start from the PostgreSQL schema is simpler and produces idiomatic SQL.
|
||||
|
||||
**Alternative considered**: Manually converting each SQLite migration to PostgreSQL — error-prone and provides no benefit since there's no production data that needs incremental migration history.
|
||||
|
||||
### 5. Type mappings: SQLite → PostgreSQL
|
||||
|
||||
| SQLite type | PostgreSQL type | Notes |
|---|---|---|
| `integer` (PK, autoIncrement) | `serial` | Auto-incrementing integer |
| `integer` (timestamps) | `timestamp` | Use `defaultNow()` where applicable |
| `integer` (booleans like `is_active`) | `boolean` | True PostgreSQL booleans |
| `real` | `doublePrecision` | OHLC price data |
| `text` | `text` | No change |
| `text` (JSON strings) | `jsonb` | For `geometry`, `sub_spans`, `model_prediction` |
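The per-value conversions implied by the table can be sketched as small pure functions. This is an illustrative sketch, not the project's migration code: the row interfaces are hypothetical, and it assumes SQLite timestamps are stored as Unix epoch seconds (multiply by 1 instead of 1000 if they are milliseconds).

```typescript
// Hypothetical shape of an `annotation_types` row as read from SQLite.
// Assumption: created_at holds Unix epoch seconds.
interface SqliteAnnotationTypeRow {
  id: number;
  name: string;
  is_active: number; // 0 or 1
  created_at: number; // epoch seconds (assumed)
}

// The same row with PostgreSQL-friendly value types.
interface PgAnnotationTypeRow {
  id: number;
  name: string;
  is_active: boolean;
  created_at: Date;
}

// integer (epoch seconds) -> timestamp
function toTimestamp(epochSeconds: number): Date {
  return new Date(epochSeconds * 1000);
}

// integer 0/1 -> boolean
function toBoolean(flag: number): boolean {
  return flag !== 0;
}

// text (JSON string) -> parsed value for a jsonb column; NULL stays NULL
function toJsonb(text: string | null): unknown {
  return text === null ? null : JSON.parse(text);
}

function convertAnnotationType(row: SqliteAnnotationTypeRow): PgAnnotationTypeRow {
  return {
    id: row.id,
    name: row.name,
    is_active: toBoolean(row.is_active),
    created_at: toTimestamp(row.created_at),
  };
}
```

One caveat if the values are bound through raw `pg` parameters rather than Drizzle: node-postgres serializes JavaScript arrays as PostgreSQL arrays, so an array-valued JSON field would likely need to be passed as a JSON string instead of a parsed array.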
|
||||
|
||||
### 6. Connection management for Next.js
|
||||
|
||||
**Choice**: Use a connection pool via `pg.Pool` with `max: 10` connections. Connection string from `DATABASE_URL` env var.
|
||||
|
||||
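**Why**: SQLite was a single file accessed in-process, so no pooling was needed. PostgreSQL connections are network sessions that are comparatively expensive to open, so concurrent API requests should share a pool. 10 connections is a reasonable ceiling for the frontend workload.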
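A minimal sketch of how the pool options might be derived from the environment. The environment is passed in as a parameter to keep the function testable, and `DB_POOL_MAX` is a hypothetical variable name used only for illustration; the design itself fixes the default at 10.

```typescript
// Subset of the options accepted by `new Pool(...)` from `pg`.
interface PoolOptions {
  connectionString: string;
  max: number;
}

// `env` is injected instead of reading process.env directly.
// DB_POOL_MAX is a hypothetical override, not part of the original design.
function buildPoolOptions(env: Record<string, string | undefined>): PoolOptions {
  const connectionString = env.DATABASE_URL;
  if (!connectionString) {
    throw new Error("DATABASE_URL is not set");
  }
  const parsed = Number.parseInt(env.DB_POOL_MAX ?? "", 10);
  // Fall back to the documented default of 10 for missing or invalid values.
  const max = Number.isInteger(parsed) && parsed > 0 ? parsed : 10;
  return { connectionString, max };
}
```

The returned object could then be handed to `new Pool(...)` from `pg`, and that pool to Drizzle's `node-postgres` adapter.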
|
||||
|
||||
### 7. ML service direct access to frontend tables
|
||||
|
||||
**Choice**: The ML service reads frontend tables (candles, annotations, span_annotations) directly via SQLAlchemy using its existing connection. No new SQLAlchemy models needed — raw SQL queries or lightweight table reflections are sufficient for read-only access.
|
||||
|
||||
**Why**: The ML service only needs to read training data. Adding full SQLAlchemy models for tables owned by Drizzle creates a dual-ownership problem. Raw queries or `Table` reflections keep it simple.
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
**[Schema drift between Drizzle and SQLAlchemy]** → Both ORMs manage tables in the same database. Drizzle owns frontend tables, SQLAlchemy owns ML tables. Neither should modify the other's tables. This is enforced by convention, not tooling.
|
||||
|
||||
**[Connection pool exhaustion]** → Adding the frontend's database traffic to the same PostgreSQL instance increases the number of open connections. Mitigation: on the PostgreSQL 16 instance, the `pg.Pool` max of 10 plus SQLAlchemy's pool of 5 totals 15 connections, well within PostgreSQL's default `max_connections` of 100.
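The connection budget can be checked with simple arithmetic. The sketch below assumes PostgreSQL's default of 3 connections reserved for superusers (`superuser_reserved_connections`); the pool names are labels only.

```typescript
interface PoolBudget {
  name: string;
  maxConnections: number;
}

// Remaining connections after every pool is at its maximum.
function connectionHeadroom(pools: PoolBudget[], serverMax: number, reserved: number): number {
  const demand = pools.reduce((sum, p) => sum + p.maxConnections, 0);
  return serverMax - reserved - demand;
}

const headroom = connectionHeadroom(
  [
    { name: "frontend pg.Pool", maxConnections: 10 },
    { name: "ML SQLAlchemy pool", maxConnections: 5 },
  ],
  100, // default max_connections
  3, // assumed superuser_reserved_connections default
);
// 100 - 3 - (10 + 5) = 82
```

If the ML service keeps SQLAlchemy's default `max_overflow` of 10, its peak usage can temporarily reach 15 rather than 5, which still leaves ample headroom.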
|
||||
|
||||
**[Data loss during migration]** → SQLite data must be migrated before switching. Mitigation: Write a migration script that exports SQLite data and imports to PostgreSQL. Run before deploying the new code. Keep the SQLite file as backup.
|
||||
|
||||
**[Drizzle push/generate differences]** → PostgreSQL dialect may generate slightly different migration SQL than expected. Mitigation: Review generated migrations before applying. Use `drizzle-kit push` for development, `drizzle-kit generate` + `drizzle-kit migrate` for production.
|
||||
|
||||
**[Boolean conversion]** → SQLite uses `0/1` for booleans, PostgreSQL uses `true/false`. Mitigation: The migration script handles conversion. Drizzle's `boolean()` type handles this transparently at the ORM level going forward.
|
||||
|
||||
## Migration Plan
|
||||
|
||||
1. **Update schema and dependencies** — Rewrite Drizzle schema for PostgreSQL, swap npm packages
|
||||
2. **Generate fresh migrations** — `drizzle-kit generate` from the new PostgreSQL schema
|
||||
3. **Update docker-compose.yml** — Rename database, add frontend dependency on postgres, remove `candle-data` volume
|
||||
4. **Update environment variables** — `DATABASE_URL` for the frontend service
|
||||
5. **Write data migration script** — `scripts/migrate-sqlite-to-postgres.ts` that reads SQLite and inserts into PostgreSQL with type conversions
|
||||
6. **Update db/index.ts** — Switch from `better-sqlite3` to `pg` pool, update migration runner
|
||||
7. **Test locally** — Run migrations, migrate data, verify API routes work
|
||||
8. **Deploy** — Stop current services, run PostgreSQL migrations, run data migration, deploy new code
|
||||
9. **Rollback** — If issues arise, revert docker-compose and code, restore SQLite volume. The SQLite file is kept as backup for 1 week post-migration.
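For the data migration step, rows have to be inserted parent-first so that foreign keys resolve. A small sketch of deriving that order from the `chart_id` relationships described in this change; the function name and the dependency map layout are illustrative assumptions.

```typescript
// Foreign-key parents per table, as described in the schema for this change.
const parents: Record<string, string[]> = {
  charts: [],
  annotation_types: [],
  span_label_types: [],
  candles: ["charts"],
  annotations: ["charts"],
  span_annotations: ["charts"],
};

// Depth-first topological sort: a table is emitted only after all of its parents.
function insertionOrder(deps: Record<string, string[]>): string[] {
  const ordered: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (table: string): void => {
    if (state.get(table) === "done") return;
    if (state.get(table) === "visiting") throw new Error(`cycle at ${table}`);
    state.set(table, "visiting");
    for (const parent of deps[table] ?? []) visit(parent);
    state.set(table, "done");
    ordered.push(table);
  };
  for (const table of Object.keys(deps)) visit(table);
  return ordered;
}
```

With only six tables a hard-coded list would work equally well; the sort mainly guards against the order drifting if a new foreign key is added later.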
|
||||
|
||||
## Open Questions
|
||||
|
||||
- Should the ML service user (`ml_user`) have write access to frontend tables, or should we create a separate read-only role? (Recommendation: keep `ml_user` with full access for simplicity, revisit if the team grows.)
|
||||
- Do we need to preserve SQLite migration history in git for reference, or delete the `drizzle/` folder contents entirely? (Recommendation: delete and start fresh.)
|
||||
|
|
@ -0,0 +1,35 @@
|
|||
## Why
|
||||
|
||||
The project currently runs two separate databases: SQLite (via Drizzle ORM) for the Next.js frontend and PostgreSQL for the ML service. This creates unnecessary operational complexity: two storage engines, two backup strategies, and no way for the ML service to query annotation/candle data directly. Consolidating on PostgreSQL as the single database simplifies deployment, enables direct cross-service data access, and reduces the infrastructure footprint.
|
||||
|
||||
## What Changes
|
||||
|
||||
- **BREAKING**: Replace SQLite/better-sqlite3/Drizzle with PostgreSQL/Drizzle (pg driver) for the Next.js frontend
|
||||
- Remove the `candle-data` Docker volume (SQLite file storage) and `DATABASE_PATH` env var
|
||||
- Migrate all frontend tables (charts, candles, annotations, annotation_types, span_annotations, span_label_types) into the existing PostgreSQL instance
|
||||
- Update Drizzle schema and config to target PostgreSQL instead of SQLite
|
||||
- Regenerate Drizzle migrations for PostgreSQL dialect (column types change: `integer` → `serial`, `real` → `double precision`, timestamps as proper `timestamp` types, etc.)
|
||||
- Update the ML service to share the same PostgreSQL database and `public` schema so it can directly query candle/annotation data instead of relying on CSV/JSON exports
|
||||
- Update docker-compose.yml to remove SQLite volume dependency and point the frontend at PostgreSQL
|
||||
- Update environment variables: frontend gets `DATABASE_URL` pointing to PostgreSQL
|
||||
|
||||
## Capabilities
|
||||
|
||||
### New Capabilities
|
||||
- `postgres-data-layer`: Unified PostgreSQL data access layer for the Next.js frontend, replacing the SQLite/better-sqlite3 setup with Drizzle's PostgreSQL driver
|
||||
|
||||
### Modified Capabilities
|
||||
- `docker-deployment`: Container configuration changes — remove SQLite volume, add PostgreSQL dependency for the frontend service, update environment variables
|
||||
- `ml-training`: ML service can now query annotations and candle data directly from PostgreSQL instead of requiring CSV/JSON file exports
|
||||
|
||||
## Impact
|
||||
|
||||
- **Database schema**: All 6 frontend tables move to PostgreSQL with type adaptations (SQLite integers → PostgreSQL serial/integer/timestamp/boolean, JSON text → jsonb)
|
||||
- **ORM layer**: `src/lib/db/index.ts` switches from `better-sqlite3` to the `pg` (node-postgres) driver; schema types in `src/lib/db/schema.ts` change to PostgreSQL equivalents
|
||||
- **Dependencies**: Remove `better-sqlite3` and `@types/better-sqlite3`; add `pg` and `@types/pg` for Drizzle's `node-postgres` adapter
|
||||
- **Migrations**: Existing SQLite migrations become obsolete; new PostgreSQL migrations needed
|
||||
- **Docker**: `candle-annotator` service gains `depends_on: postgres`, loses `candle-data` volume mount
|
||||
- **Environment**: `.env` and `.env.example` updated with PostgreSQL connection string for frontend
|
||||
- **ML service**: `services/ml/app/db.py` gains access to frontend tables (candles, annotations) for direct querying
|
||||
- **Data migration**: Existing SQLite data needs a one-time migration script to PostgreSQL
|
||||
- **API routes**: Next.js API routes that import `db` from `src/lib/db` keep the same import, but they need an audit: queries that rely on SQLite-specific behavior (integer booleans, raw SQL fragments, synchronous better-sqlite3 calls) must be adjusted for PostgreSQL
|
||||
|
|
@ -0,0 +1,81 @@
|
|||
## MODIFIED Requirements
|
||||
|
||||
### Requirement: Docker Compose configuration
|
||||
The project SHALL include docker-compose.yml for simplified deployment orchestration.
|
||||
|
||||
#### Scenario: Service definition
|
||||
- **WHEN** docker-compose.yml is parsed
|
||||
- **THEN** defines service named 'candle-annotator' using Dockerfile from current directory
|
||||
|
||||
#### Scenario: Port mapping
|
||||
- **WHEN** docker-compose up runs
|
||||
- **THEN** maps host port 3000 to container port 3000
|
||||
|
||||
#### Scenario: Volume mounting for ML data
|
||||
- **WHEN** docker-compose up runs
|
||||
- **THEN** mounts named volume 'ml-data' to /app/ml-data in the candle-annotator container
|
||||
|
||||
#### Scenario: Frontend depends on PostgreSQL
|
||||
- **WHEN** docker-compose up runs
|
||||
- **THEN** the candle-annotator service starts only after the postgres service is healthy (`depends_on: postgres: condition: service_healthy`)
|
||||
|
||||
#### Scenario: Frontend DATABASE_URL
|
||||
- **WHEN** the candle-annotator service starts
|
||||
- **THEN** the `DATABASE_URL` environment variable is set to `postgresql://ml_user:ml_password@postgres:5432/candle_annotator`
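To make the parts of this connection string explicit, the sketch below splits it with the standard `URL` class and produces a redacted form that is safer to log. The function name and the returned field names are illustrative assumptions.

```typescript
// Break a PostgreSQL connection URL into its components.
function describeDatabaseUrl(raw: string) {
  const u = new URL(raw);
  return {
    user: u.username,
    host: u.hostname, // the Compose service name, e.g. "postgres"
    port: u.port,
    database: u.pathname.replace(/^\//, ""),
    // Same URL with the password masked, for log output.
    redacted: `${u.protocol}//${u.username}:***@${u.host}${u.pathname}`,
  };
}

const info = describeDatabaseUrl(
  "postgresql://ml_user:ml_password@postgres:5432/candle_annotator",
);
```

Here the host `postgres` resolves through Docker Compose's internal DNS, which is why the same URL would not work unchanged from the host machine.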
|
||||
|
||||
#### Scenario: Restart policy
|
||||
- **WHEN** container crashes or stops
|
||||
- **THEN** docker-compose automatically restarts container unless explicitly stopped (restart: unless-stopped)
|
||||
|
||||
#### Scenario: No SQLite volume
|
||||
- **WHEN** docker-compose.yml is parsed
|
||||
- **THEN** there is no `candle-data` volume defined or mounted
|
||||
|
||||
### Requirement: Environment variable configuration
|
||||
The project SHALL use environment variables for runtime configuration.
|
||||
|
||||
#### Scenario: .env.example file
|
||||
- **WHEN** repository is cloned
|
||||
- **THEN** includes .env.example file documenting all configurable environment variables with example values
|
||||
|
||||
#### Scenario: DATABASE_URL configuration
|
||||
- **WHEN** `DATABASE_URL` environment variable is set
|
||||
- **THEN** the Next.js application connects to the PostgreSQL database at the specified URL
|
||||
|
||||
#### Scenario: No DATABASE_PATH variable
|
||||
- **WHEN** environment variables are inspected
|
||||
- **THEN** there is no `DATABASE_PATH` variable (SQLite path is removed)
|
||||
|
||||
#### Scenario: PORT configuration
|
||||
- **WHEN** PORT environment variable is set
|
||||
- **THEN** Next.js server listens on specified port (default: 3000)
|
||||
|
||||
#### Scenario: NODE_ENV configuration
|
||||
- **WHEN** NODE_ENV environment variable is set to 'production'
|
||||
- **THEN** Next.js runs in production mode with optimizations enabled
|
||||
|
||||
### Requirement: Database persistence
|
||||
The deployment SHALL ensure PostgreSQL data persists across container restarts.
|
||||
|
||||
#### Scenario: PostgreSQL volume
|
||||
- **WHEN** docker-compose up runs
|
||||
- **THEN** the `postgres-data` named volume is mounted to `/var/lib/postgresql/data` in the postgres container
|
||||
|
||||
#### Scenario: Container restart preserves data
|
||||
- **WHEN** the postgres container is stopped and restarted
|
||||
- **THEN** all database tables and data remain intact
|
||||
|
||||
#### Scenario: PostgreSQL database name
|
||||
- **WHEN** the postgres service starts
|
||||
- **THEN** the `POSTGRES_DB` environment variable is set to `candle_annotator`
|
||||
|
||||
### Requirement: Health check endpoint
|
||||
The API SHALL provide a health check endpoint for container orchestration.
|
||||
|
||||
#### Scenario: Health check endpoint responds
|
||||
- **WHEN** GET request sent to `/api/health`
|
||||
- **THEN** system returns 200 status with JSON `{ status: 'ok', timestamp: <unix_timestamp> }`
|
||||
|
||||
#### Scenario: Database connection check
|
||||
- **WHEN** GET request sent to `/api/health?check=db`
|
||||
- **THEN** system attempts a PostgreSQL query and returns 200 if successful, 503 if database unavailable
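The two health check scenarios can be sketched as one function with the database probe injected. The probe stands in for a trivial query such as `SELECT 1`; the shape of the 503 body is an assumption, since the spec only fixes the status code for that case.

```typescript
interface HealthResult {
  status: number;
  body: { status: "ok" | "error"; timestamp: number };
}

// `probe` is a hypothetical stand-in for a trivial PostgreSQL query.
async function healthCheck(
  checkDb: boolean,
  probe: () => Promise<unknown>,
): Promise<HealthResult> {
  const timestamp = Math.floor(Date.now() / 1000); // Unix timestamp in seconds
  if (!checkDb) {
    return { status: 200, body: { status: "ok", timestamp } };
  }
  try {
    await probe();
    return { status: 200, body: { status: "ok", timestamp } };
  } catch {
    // Database unavailable: report 503 so the orchestrator can react.
    return { status: 503, body: { status: "error", timestamp } };
  }
}
```

Injecting the probe keeps the route handler thin: it would only translate `?check=db` into `checkDb` and pass a function that runs the query against the shared pool.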
|
||||
|
|
@ -0,0 +1,37 @@
|
|||
## MODIFIED Requirements
|
||||
|
||||
### Requirement: PostgreSQL training metadata storage
|
||||
The system SHALL store training run metadata in the PostgreSQL database. Each training run record SHALL include: run_id (MLflow run ID), model_type, experiment_name, pipeline_config_hash, dataset_version, metrics summary (JSON), status, and timestamps (created_at, completed_at).
|
||||
|
||||
#### Scenario: Store training run record
|
||||
- **WHEN** a training run completes successfully
|
||||
- **THEN** the system inserts a record into the PostgreSQL `training_runs` table with the run metadata
|
||||
|
||||
#### Scenario: Query training history
|
||||
- **WHEN** the system queries training runs
|
||||
- **THEN** it returns records from PostgreSQL ordered by created_at descending
|
||||
|
||||
#### Scenario: Database name updated
|
||||
- **WHEN** the ML service connects to PostgreSQL
|
||||
- **THEN** it connects to the `candle_annotator` database (not `ml_db`)
|
||||
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Direct annotation data access
|
||||
The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.
|
||||
|
||||
#### Scenario: Query candle data for training
|
||||
- **WHEN** the ML training pipeline needs OHLC data for a chart
|
||||
- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`
|
||||
|
||||
#### Scenario: Query span annotations for labels
|
||||
- **WHEN** the ML training pipeline needs labeled spans for training
|
||||
- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`
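The optional `source` filter amounts to a parameterized query with a conditional clause. The sketch below shows the SQL shape using PostgreSQL positional placeholders; it is written in TypeScript for consistency with the rest of this change, whereas the ML service would express the same query through SQLAlchemy. The `ORDER BY start_time` clause is an assumption, not something the scenario requires.

```typescript
interface Query {
  text: string;
  values: unknown[];
}

// Build the span lookup for one chart, optionally restricted by source.
function spanAnnotationsQuery(chartId: number, source?: string): Query {
  const values: unknown[] = [chartId];
  let text = "SELECT * FROM span_annotations WHERE chart_id = $1";
  if (source !== undefined) {
    values.push(source);
    text += ` AND source = $${values.length}`;
  }
  text += " ORDER BY start_time"; // assumed ordering
  return { text, values };
}
```

Passing the values separately from the SQL text keeps the filter safe from injection regardless of which driver executes it.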
|
||||
|
||||
#### Scenario: No CSV/JSON export required
|
||||
- **WHEN** the ML training pipeline starts
|
||||
- **THEN** it does not require pre-exported CSV or JSON files — all data is read from PostgreSQL
|
||||
|
||||
#### Scenario: Shared database connection
|
||||
- **WHEN** the ML service reads candle/annotation data
|
||||
- **THEN** it uses the same PostgreSQL connection (same database, same credentials) as for `training_runs`
|
||||
|
|
@ -0,0 +1,80 @@
|
|||
## ADDED Requirements
|
||||
|
||||
### Requirement: PostgreSQL connection via Drizzle ORM
|
||||
The Next.js application SHALL connect to PostgreSQL using Drizzle ORM with the `node-postgres` (`pg`) driver. The connection SHALL use a pool with a configurable maximum number of connections (default: 10). The connection string SHALL be read from the `DATABASE_URL` environment variable.
|
||||
|
||||
#### Scenario: Successful connection
|
||||
- **WHEN** the application starts with a valid `DATABASE_URL` pointing to a running PostgreSQL instance
|
||||
- **THEN** Drizzle ORM establishes a connection pool and the `db` export is ready for queries
|
||||
|
||||
#### Scenario: Missing DATABASE_URL
|
||||
- **WHEN** the `DATABASE_URL` environment variable is not set
|
||||
- **THEN** the application SHALL fail to start with an error message indicating the missing variable
|
||||
|
||||
#### Scenario: Database unreachable
|
||||
- **WHEN** the PostgreSQL instance is not reachable at the configured URL
|
||||
- **THEN** the application SHALL fail to start with a connection error
|
||||
|
||||
### Requirement: PostgreSQL schema definitions
|
||||
The Drizzle schema SHALL define all frontend tables using `pgTable` from `drizzle-orm/pg-core`. The following tables SHALL be defined: `charts`, `candles`, `annotation_types`, `annotations`, `span_label_types`, `span_annotations`.
|
||||
|
||||
#### Scenario: Charts table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `charts` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Candles table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `candles` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `time` (timestamp, not null), `open` (double precision, not null), `high` (double precision, not null), `low` (double precision, not null), `close` (double precision, not null), with a unique index on `(chart_id, time)`
|
||||
|
||||
#### Scenario: Annotation types table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `annotation_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `category` (text, not null), `icon` (text, nullable), `is_active` (boolean, not null, default true), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Annotations table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `timestamp` (timestamp, not null), `label_type` (text, not null), `geometry` (jsonb, nullable), `color` (text, default '#3b82f6'), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Span label types table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `span_label_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `hotkey` (text, nullable), `is_active` (boolean, not null, default true), `sort_order` (integer, not null, default 0), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Span annotations table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `span_annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `start_time` (timestamp, not null), `end_time` (timestamp, not null), `label` (text, not null), `confidence` (integer, nullable), `outcome` (text, nullable), `notes` (text, nullable), `sub_spans` (jsonb, nullable), `color` (text, not null, default '#2196F3'), `source` (text, not null, default 'human'), `model_prediction` (jsonb, nullable), `created_at` (timestamp, not null, default now)
|
||||
|
||||
### Requirement: PostgreSQL migrations via Drizzle Kit
|
||||
The project SHALL use Drizzle Kit to generate and apply PostgreSQL migrations. The `drizzle.config.ts` SHALL target the `postgresql` dialect. Existing SQLite migrations SHALL be removed.
|
||||
|
||||
#### Scenario: Generate migrations
|
||||
- **WHEN** `drizzle-kit generate` is executed
|
||||
- **THEN** a new SQL migration file is created in the `drizzle/` directory with PostgreSQL-dialect DDL
|
||||
|
||||
#### Scenario: Apply migrations at startup
|
||||
- **WHEN** the application starts (not during build phase)
|
||||
- **THEN** Drizzle runs pending migrations against the PostgreSQL database
|
||||
|
||||
#### Scenario: Skip migrations during build
|
||||
- **WHEN** `NEXT_PHASE` is `phase-production-build` or `phase-development-build`
|
||||
- **THEN** migration execution is skipped
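The build-phase guard reduces to a single predicate over `NEXT_PHASE`. A minimal sketch, using the two phase values named in the scenario above; the function name is illustrative and the environment is injected for testability.

```typescript
// Phase values taken from the scenario; migrations are skipped during these.
const BUILD_PHASES = new Set(["phase-production-build", "phase-development-build"]);

function shouldRunMigrations(env: Record<string, string | undefined>): boolean {
  return !BUILD_PHASES.has(env.NEXT_PHASE ?? "");
}
```

At startup the database module would call this with `process.env` and only invoke the Drizzle migrator when it returns `true`.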
|
||||
|
||||
### Requirement: npm dependency changes
|
||||
The project SHALL remove `better-sqlite3` and `@types/better-sqlite3` from dependencies and add `pg` and `@types/pg`.
|
||||
|
||||
#### Scenario: Dependencies updated
|
||||
- **WHEN** `package.json` is inspected
|
||||
- **THEN** `better-sqlite3` and `@types/better-sqlite3` are absent, and `pg` and `@types/pg` are present in dependencies
|
||||
|
||||
### Requirement: Data migration from SQLite to PostgreSQL
|
||||
The project SHALL include a one-time migration script at `scripts/migrate-sqlite-to-postgres.ts` that reads all data from the SQLite database and inserts it into PostgreSQL with appropriate type conversions.
|
||||
|
||||
#### Scenario: Migrate all tables
|
||||
- **WHEN** the migration script is executed with both databases accessible
|
||||
- **THEN** all rows from charts, candles, annotation_types, annotations, span_label_types, and span_annotations are transferred to PostgreSQL
|
||||
|
||||
#### Scenario: Type conversions applied
|
||||
- **WHEN** data is migrated
|
||||
- **THEN** SQLite integer timestamps are converted to PostgreSQL timestamps, integer booleans (0/1) are converted to PostgreSQL booleans, and text JSON fields are inserted as jsonb
|
||||
|
||||
#### Scenario: Idempotent execution
|
||||
- **WHEN** the migration script is run a second time on an already-migrated database
|
||||
- **THEN** the script either skips existing data or clears and re-inserts (with a flag), without creating duplicates
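One way to get the "skip existing data" behavior is `INSERT ... ON CONFLICT DO NOTHING` keyed on the primary key. The sketch below builds that statement, plus a sequence reset that the spec does not mention: when rows are inserted with explicit `id` values, the sequence behind a `serial` column is not advanced, so later inserts could collide unless it is bumped. Function names are hypothetical, and table and column names are expected to come from a fixed list, never from user input.

```typescript
// Parameterized insert that silently skips rows whose id already exists.
function idempotentInsert(table: string, columns: string[]): string {
  const cols = columns.join(", ");
  const params = columns.map((_, i) => `$${i + 1}`).join(", ");
  return `INSERT INTO ${table} (${cols}) VALUES (${params}) ON CONFLICT (id) DO NOTHING`;
}

// Move the serial sequence past the highest migrated id.
// With is_called = false, the next nextval() returns exactly MAX(id) + 1.
function resetSerial(table: string): string {
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', 'id'), ` +
    `COALESCE(MAX(id), 0) + 1, false) FROM ${table}`
  );
}
```

Running the reset once per table after all inserts keeps a second execution of the script harmless: no rows are duplicated and the sequence ends at the same value.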
|
||||
|
|
@ -0,0 +1,53 @@
|
|||
## 1. Dependencies and Configuration
|
||||
|
||||
- [x] 1.1 Remove `better-sqlite3` and `@types/better-sqlite3` from package.json
|
||||
- [x] 1.2 Add `pg` and `@types/pg` to package.json dependencies
|
||||
- [x] 1.3 Run `npm install` to update node_modules and lockfile
|
||||
- [x] 1.4 Update `drizzle.config.ts` to target `postgresql` dialect with `DATABASE_URL` env var
|
||||
- [x] 1.5 Update `.env.example` — replace `DATABASE_PATH` with `DATABASE_URL=postgresql://ml_user:ml_password@postgres:5432/candle_annotator`
|
||||
|
||||
## 2. Drizzle Schema Migration (SQLite → PostgreSQL)
|
||||
|
||||
- [x] 2.1 Rewrite `src/lib/db/schema.ts` — replace all `sqliteTable` with `pgTable`, apply type mappings (integer→serial, integer→timestamp, integer→boolean, real→doublePrecision, text JSON→jsonb)
|
||||
- [x] 2.2 Delete all existing SQLite migration files in `drizzle/` directory
|
||||
- [x] 2.3 Run `drizzle-kit generate` to produce fresh PostgreSQL migration SQL
|
||||
- [x] 2.4 Review generated migration SQL for correctness
|
||||
|
||||
## 3. Database Connection Layer
|
||||
|
||||
- [x] 3.1 Rewrite `src/lib/db/index.ts` — replace `better-sqlite3` driver with `pg.Pool` (max: 10), read `DATABASE_URL` from env, fail if missing
|
||||
- [x] 3.2 Update migration runner to use PostgreSQL-compatible execution (skip during build phase via `NEXT_PHASE` check)
|
||||
- [x] 3.3 Update all imports if any changed (verify `db` export still works for API routes)
|
||||
|
||||
## 4. API Route Adjustments
|
||||
|
||||
- [x] 4.1 Audit all Next.js API routes using `db` for SQLite-specific syntax (e.g., integer booleans, raw SQL fragments)
|
||||
- [x] 4.2 Fix any SQLite-specific query patterns to work with PostgreSQL (boolean handling, timestamp handling, jsonb operations)
|
||||
- [x] 4.3 Update health check endpoint (`/api/health`) to verify PostgreSQL connectivity
|
||||
|
||||
## 5. Docker and Deployment
|
||||
|
||||
- [x] 5.1 Update `docker-compose.yml` — change the `POSTGRES_DB` value from `ml_db` to `candle_annotator`, add `DATABASE_URL` env to candle-annotator service, add `depends_on: postgres` with health check condition
|
||||
- [x] 5.2 Remove `candle-data` volume from `docker-compose.yml` (SQLite volume)
|
||||
- [x] 5.3 Update `Dockerfile` if it references SQLite or `DATABASE_PATH`
|
||||
- [x] 5.4 Update ML service database connection — change database name from `ml_db` to `candle_annotator` in environment config
|
||||
|
||||
## 6. ML Service Direct Data Access
|
||||
|
||||
- [x] 6.1 Add SQLAlchemy table reflections or raw queries in the ML service for reading `candles`, `annotations`, `span_annotations`, `charts` tables
|
||||
- [x] 6.2 Update ML training pipeline to query candle/annotation data from PostgreSQL instead of CSV/JSON exports
|
||||
- [x] 6.3 Remove or deprecate any CSV/JSON export code paths that are no longer needed
|
||||
|
||||
## 7. Data Migration Script
|
||||
|
||||
- [x] 7.1 Create `scripts/migrate-sqlite-to-postgres.ts` — read all 6 tables from SQLite, apply type conversions (timestamps, booleans, JSON→jsonb), insert into PostgreSQL
|
||||
- [x] 7.2 Make the script idempotent (skip or clear+re-insert with flag)
|
||||
- [x] 7.3 Test migration script with existing SQLite data
|
||||
|
||||
## 8. Testing and Verification
|
||||
|
||||
- [x] 8.1 Run the full application locally with PostgreSQL — verify all API routes work
|
||||
- [x] 8.2 Verify ML service can query candle/annotation data from shared database
|
||||
- [x] 8.3 Run `docker compose up` and verify all services start correctly with new configuration
|
||||
- [x] 8.4 Update `DEPLOYMENT.md` with new deployment steps (PostgreSQL migration, data migration script, rollback procedure)
|
||||
- [x] 8.5 Update `README.md` and `CLAUDE_DESCRIPTION.md` with database architecture changes
|
||||
|
|
@ -32,27 +32,31 @@ The project SHALL include docker-compose.yml for simplified deployment orchestra
 
 #### Scenario: Service definition
 - **WHEN** docker-compose.yml is parsed
-- **THEN** defines single service named 'candle-annotator' using Dockerfile from current directory
+- **THEN** defines service named 'candle-annotator' using Dockerfile from current directory
 
 #### Scenario: Port mapping
 - **WHEN** docker-compose up runs
-- **THEN** maps host port 3000 to container port 3000 (configurable via PORT environment variable)
+- **THEN** maps host port 3000 to container port 3000
 
-#### Scenario: Volume mounting for database
+#### Scenario: Volume mounting for ML data
 - **WHEN** docker-compose up runs
-- **THEN** mounts named volume 'candle-data' to /app/data in container for SQLite database persistence
+- **THEN** mounts named volume 'ml-data' to /app/ml-data in the candle-annotator container
 
-#### Scenario: Environment variable configuration
-- **WHEN** docker-compose.yml is used
-- **THEN** supports loading environment variables from .env file for NODE_ENV, PORT, and other configs
+#### Scenario: Frontend depends on PostgreSQL
+- **WHEN** docker-compose up runs
+- **THEN** the candle-annotator service starts only after the postgres service is healthy (`depends_on: postgres: condition: service_healthy`)
 
+#### Scenario: Frontend DATABASE_URL
+- **WHEN** the candle-annotator service starts
+- **THEN** the `DATABASE_URL` environment variable is set to `postgresql://ml_user:ml_password@postgres:5432/candle_annotator`
+
 #### Scenario: Restart policy
 - **WHEN** container crashes or stops
 - **THEN** docker-compose automatically restarts container unless explicitly stopped (restart: unless-stopped)
 
-#### Scenario: Volume declaration
+#### Scenario: No SQLite volume
 - **WHEN** docker-compose.yml is parsed
-- **THEN** declares 'candle-data' as named volume in volumes section
+- **THEN** there is no `candle-data` volume defined or mounted
 
 ### Requirement: Environment variable configuration
 The project SHALL use environment variables for runtime configuration.
|
@ -61,6 +65,14 @@ The project SHALL use environment variables for runtime configuration.
 - **WHEN** repository is cloned
 - **THEN** includes .env.example file documenting all configurable environment variables with example values
 
+#### Scenario: DATABASE_URL configuration
+- **WHEN** `DATABASE_URL` environment variable is set
+- **THEN** the Next.js application connects to the PostgreSQL database at the specified URL
+
+#### Scenario: No DATABASE_PATH variable
+- **WHEN** environment variables are inspected
+- **THEN** there is no `DATABASE_PATH` variable (SQLite path is removed)
+
 #### Scenario: PORT configuration
 - **WHEN** PORT environment variable is set
 - **THEN** Next.js server listens on specified port (default: 3000)
|
@ -69,14 +81,6 @@ The project SHALL use environment variables for runtime configuration.
|
|||
- **WHEN** NODE_ENV environment variable is set to 'production'
|
||||
- **THEN** Next.js runs in production mode with optimizations enabled
|
||||
|
||||
#### Scenario: Database path configuration
|
||||
- **WHEN** DATABASE_PATH environment variable is set (optional)
|
||||
- **THEN** SQLite database file is created at specified path (default: ./data/candles.db)
|
||||
|
||||
#### Scenario: HOSTNAME configuration
|
||||
- **WHEN** HOSTNAME environment variable is set
|
||||
- **THEN** Next.js server binds to specified hostname (default: 0.0.0.0 for containers)
|
||||
|
||||
### Requirement: Health check endpoint
|
||||
The API SHALL provide a health check endpoint for container orchestration.
|
||||
|
||||
|
|
@ -86,7 +90,7 @@ The API SHALL provide a health check endpoint for container orchestration.
|
|||
|
||||
#### Scenario: Database connection check
|
||||
- **WHEN** GET request sent to `/api/health?check=db`
|
||||
- **THEN** system attempts simple database query and returns 200 if successful, 503 if database unavailable
|
||||
- **THEN** system attempts a PostgreSQL query and returns 200 if successful, 503 if database unavailable
|
||||
|
||||
#### Scenario: Health check in Dockerfile
|
||||
- **WHEN** Dockerfile defines HEALTHCHECK
|
||||
|
|
@ -138,23 +142,19 @@ The Docker image SHALL be optimized for production use with minimal size.
|
|||
- **THEN** final image size is under 200MB (excluding data volume)
|
||||
|
||||
### Requirement: Database persistence
|
||||
The deployment SHALL ensure SQLite database persists across container restarts.
|
||||
The deployment SHALL ensure PostgreSQL data persists across container restarts.
|
||||
|
||||
#### Scenario: Volume mounting
|
||||
- **WHEN** container runs with volume mount
|
||||
- **THEN** /app/data directory is mounted from host or Docker volume
|
||||
|
||||
#### Scenario: Database file location
|
||||
- **WHEN** application initializes database
|
||||
- **THEN** SQLite file is created at /app/data/candles.db (inside mounted volume)
|
||||
#### Scenario: PostgreSQL volume
|
||||
- **WHEN** docker-compose up runs
|
||||
- **THEN** the `postgres-data` named volume is mounted to `/var/lib/postgresql/data` in the postgres container
|
||||
|
||||
#### Scenario: Container restart preserves data
|
||||
- **WHEN** container is stopped and restarted
|
||||
- **THEN** existing database file is reused and all annotations and candles remain intact
|
||||
- **WHEN** the postgres container is stopped and restarted
|
||||
- **THEN** all database tables and data remain intact
|
||||
|
||||
#### Scenario: File permissions
|
||||
- **WHEN** container creates database file
|
||||
- **THEN** appuser has read/write permissions to /app/data directory
|
||||
#### Scenario: PostgreSQL database name
|
||||
- **WHEN** the postgres service starts
|
||||
- **THEN** the `POSTGRES_DB` environment variable is set to `candle_annotator`
|
||||
|
||||
### Requirement: Deployment documentation
|
||||
DEPLOYMENT.md SHALL include comprehensive Docker deployment instructions.
|
||||
|
|
|
|||
|
|
@ -84,6 +84,29 @@ The system SHALL store training run metadata in the PostgreSQL database. Each tr
|
|||
- **WHEN** the system queries training runs
|
||||
- **THEN** it returns records from PostgreSQL ordered by created_at descending
|
||||
|
||||
#### Scenario: Database name updated
|
||||
- **WHEN** the ML service connects to PostgreSQL
|
||||
- **THEN** it connects to the `candle_annotator` database (not `ml_db`)
|
||||
|
||||
### Requirement: Direct annotation data access
|
||||
The ML service SHALL read candle and annotation data directly from PostgreSQL instead of requiring CSV/JSON file exports. The ML service SHALL query the `candles`, `annotations`, `span_annotations`, and `charts` tables for training data.
|
||||
|
||||
#### Scenario: Query candle data for training
|
||||
- **WHEN** the ML training pipeline needs OHLC data for a chart
|
||||
- **THEN** it queries the `candles` table in PostgreSQL filtered by `chart_id`, ordered by `time`
|
||||
|
||||
#### Scenario: Query span annotations for labels
|
||||
- **WHEN** the ML training pipeline needs labeled spans for training
|
||||
- **THEN** it queries the `span_annotations` table in PostgreSQL filtered by `chart_id` and optionally by `source`
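The optional `source` filter in the scenario above could be expressed as a parameterized query. This is a minimal sketch of the query shape only: the ML service itself is Python with SQLAlchemy, and the function name `buildSpanAnnotationsQuery` and the `$1`/`$2` placeholder style are assumptions made for illustration.

```typescript
// Hypothetical helper: builds the SQL text and bound values for the
// span_annotations lookup. Table and column names come from the spec;
// everything else is illustrative.
interface ParameterizedQuery {
  text: string;
  values: Array<number | string>;
}

function buildSpanAnnotationsQuery(
  chartId: number,
  source?: string
): ParameterizedQuery {
  const values: Array<number | string> = [chartId];
  let text = "SELECT * FROM span_annotations WHERE chart_id = $1";

  // The source filter is optional, so it is appended only when provided.
  if (source !== undefined) {
    values.push(source);
    text += " AND source = $2";
  }

  return { text: text, values: values };
}
```

Binding `chart_id` and `source` as parameters rather than interpolating them keeps the query safe regardless of where the values originate.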
|
||||
|
||||
#### Scenario: No CSV/JSON export required
|
||||
- **WHEN** the ML training pipeline starts
|
||||
- **THEN** it does not require pre-exported CSV or JSON files — all data is read from PostgreSQL
|
||||
|
||||
#### Scenario: Shared database connection
|
||||
- **WHEN** the ML service reads candle/annotation data
|
||||
- **THEN** it uses the same PostgreSQL connection (same database, same credentials) as for `training_runs`
|
||||
|
||||
### Requirement: Pipeline config logging
|
||||
The system SHALL log the full pipeline YAML config as an MLflow artifact with each training run. This config SHALL be used during inference to replicate the exact preprocessing steps.
|
||||
|
||||
|
|
|
|||
80
openspec/specs/postgres-data-layer/spec.md
Normal file
|
|
@ -0,0 +1,80 @@
|
|||
## ADDED Requirements
|
||||
|
||||
### Requirement: PostgreSQL connection via Drizzle ORM
|
||||
The Next.js application SHALL connect to PostgreSQL using Drizzle ORM with the `node-postgres` (`pg`) driver. The connection SHALL use a pool with a configurable maximum number of connections (default: 10). The connection string SHALL be read from the `DATABASE_URL` environment variable.
|
||||
|
||||
#### Scenario: Successful connection
|
||||
- **WHEN** the application starts with a valid `DATABASE_URL` pointing to a running PostgreSQL instance
|
||||
- **THEN** Drizzle ORM establishes a connection pool and the `db` export is ready for queries
|
||||
|
||||
#### Scenario: Missing DATABASE_URL
|
||||
- **WHEN** the `DATABASE_URL` environment variable is not set
|
||||
- **THEN** the application SHALL fail to start with an error message indicating the missing variable
|
||||
|
||||
#### Scenario: Database unreachable
|
||||
- **WHEN** the PostgreSQL instance is not reachable at the configured URL
|
||||
- **THEN** the application SHALL fail to start with a connection error
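The fail-fast behaviour for a missing `DATABASE_URL` and the default pool size of 10 can be sketched as a small settings resolver. This is an illustration, not the project's implementation: the function name `resolvePoolSettings` and the `DB_POOL_MAX` variable are hypothetical, since the spec says only that the maximum is configurable.

```typescript
// Hypothetical settings shape. Only DATABASE_URL and the default of 10
// come from the spec; DB_POOL_MAX is an assumed variable name.
interface PoolSettings {
  connectionString: string;
  max: number;
}

type Env = { [key: string]: string | undefined };

function resolvePoolSettings(env: Env): PoolSettings {
  const connectionString = env["DATABASE_URL"];
  if (!connectionString) {
    // Fail fast so a misconfigured container stops at startup.
    throw new Error("DATABASE_URL environment variable is not set");
  }

  const rawMax = env["DB_POOL_MAX"];
  const parsedMax = rawMax === undefined ? 10 : parseInt(rawMax, 10);
  if (isNaN(parsedMax) || parsedMax < 1) {
    throw new Error("DB_POOL_MAX must be a positive integer, got: " + rawMax);
  }

  return { connectionString: connectionString, max: parsedMax };
}
```

The returned object uses the `connectionString` and `max` option names accepted by the `pg` `Pool` constructor, so it could be passed through unchanged. Taking the environment as an argument instead of reading `process.env` directly keeps the check easy to test.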
|
||||
|
||||
### Requirement: PostgreSQL schema definitions
|
||||
The Drizzle schema SHALL define all frontend tables using `pgTable` from `drizzle-orm/pg-core`. The following tables SHALL be defined: `charts`, `candles`, `annotation_types`, `annotations`, `span_label_types`, `span_annotations`.
|
||||
|
||||
#### Scenario: Charts table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `charts` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Candles table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `candles` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `time` (timestamp, not null), `open` (double precision, not null), `high` (double precision, not null), `low` (double precision, not null), `close` (double precision, not null), with a unique index on `(chart_id, time)`
|
||||
|
||||
#### Scenario: Annotation types table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `annotation_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `category` (text, not null), `icon` (text, nullable), `is_active` (boolean, not null, default true), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Annotations table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `timestamp` (timestamp, not null), `label_type` (text, not null), `geometry` (jsonb, nullable), `color` (text, default '#3b82f6'), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Span label types table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `span_label_types` table has columns: `id` (serial, primary key), `name` (text, unique, not null), `display_name` (text, not null), `color` (text, not null), `hotkey` (text, nullable), `is_active` (boolean, not null, default true), `sort_order` (integer, not null, default 0), `created_at` (timestamp, not null, default now)
|
||||
|
||||
#### Scenario: Span annotations table schema
|
||||
- **WHEN** the schema is loaded
|
||||
- **THEN** the `span_annotations` table has columns: `id` (serial, primary key), `chart_id` (integer, foreign key to charts.id, not null), `start_time` (timestamp, not null), `end_time` (timestamp, not null), `label` (text, not null), `confidence` (integer, nullable), `outcome` (text, nullable), `notes` (text, nullable), `sub_spans` (jsonb, nullable), `color` (text, not null, default '#2196F3'), `source` (text, not null, default 'human'), `model_prediction` (jsonb, nullable), `created_at` (timestamp, not null, default now)
|
||||
|
||||
### Requirement: PostgreSQL migrations via Drizzle Kit
|
||||
The project SHALL use Drizzle Kit to generate and apply PostgreSQL migrations. The `drizzle.config.ts` SHALL target the `postgresql` dialect. Existing SQLite migrations SHALL be removed.
|
||||
|
||||
#### Scenario: Generate migrations
|
||||
- **WHEN** `drizzle-kit generate` is executed
|
||||
- **THEN** a new SQL migration file is created in the `drizzle/` directory with PostgreSQL-dialect DDL
|
||||
|
||||
#### Scenario: Apply migrations at startup
|
||||
- **WHEN** the application starts (not during build phase)
|
||||
- **THEN** Drizzle runs pending migrations against the PostgreSQL database
|
||||
|
||||
#### Scenario: Skip migrations during build
|
||||
- **WHEN** `NEXT_PHASE` is `phase-production-build` or `phase-development-build`
|
||||
- **THEN** migration execution is skipped
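The build-phase guard described in the two scenarios above reduces to a single predicate on `NEXT_PHASE`. A minimal sketch, assuming the value is passed in by the caller; the name `shouldRunMigrations` is hypothetical and the phase strings are copied from the scenario as written.

```typescript
// Phase names exactly as listed in the "Skip migrations during build" scenario.
const BUILD_PHASES = ["phase-production-build", "phase-development-build"];

// Returns true when pending migrations should be applied, i.e. whenever
// the process is not in one of the build phases. An unset NEXT_PHASE is
// treated as a normal startup.
function shouldRunMigrations(nextPhase: string | undefined): boolean {
  if (nextPhase === undefined) {
    return true;
  }
  return BUILD_PHASES.indexOf(nextPhase) === -1;
}
```

At startup the caller would pass the current value of the `NEXT_PHASE` environment variable and run the migrator only when the result is `true`.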
|
||||
|
||||
### Requirement: npm dependency changes
|
||||
The project SHALL remove `better-sqlite3` and `@types/better-sqlite3` from dependencies and add `pg` and `@types/pg`.
|
||||
|
||||
#### Scenario: Dependencies updated
|
||||
- **WHEN** `package.json` is inspected
|
||||
- **THEN** `better-sqlite3` and `@types/better-sqlite3` are absent, and `pg` and `@types/pg` are present in dependencies
|
||||
|
||||
### Requirement: Data migration from SQLite to PostgreSQL
|
||||
The project SHALL include a one-time migration script at `scripts/migrate-sqlite-to-postgres.ts` that reads all data from the SQLite database and inserts it into PostgreSQL with appropriate type conversions.
|
||||
|
||||
#### Scenario: Migrate all tables
|
||||
- **WHEN** the migration script is executed with both databases accessible
|
||||
- **THEN** all rows from charts, candles, annotation_types, annotations, span_label_types, and span_annotations are transferred to PostgreSQL
|
||||
|
||||
#### Scenario: Type conversions applied
|
||||
- **WHEN** data is migrated
|
||||
- **THEN** SQLite integer timestamps are converted to PostgreSQL timestamps, integer booleans (0/1) are converted to PostgreSQL booleans, and text JSON fields are inserted as jsonb
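The three conversions named above can be sketched as pure functions. The helper names are hypothetical, and the spec does not say whether the SQLite integer timestamps are stored in seconds or milliseconds, so the unit is an explicit parameter here rather than a guess.

```typescript
// Assumption: the caller knows which unit the SQLite column uses.
type TimestampUnit = "seconds" | "milliseconds";

// SQLite integer timestamp -> Date, suitable for a PostgreSQL timestamp column.
function toTimestamp(value: number, unit: TimestampUnit): Date {
  const ms = unit === "seconds" ? value * 1000 : value;
  return new Date(ms);
}

// SQLite integer boolean (0/1) -> boolean. Anything else is rejected so
// corrupt rows surface during migration instead of being silently coerced.
function toBoolean(value: number): boolean {
  if (value !== 0 && value !== 1) {
    throw new Error("expected 0 or 1, got: " + value);
  }
  return value === 1;
}

// SQLite text JSON -> parsed value for a jsonb column. NULL stays NULL;
// malformed JSON throws from JSON.parse.
function toJsonb(text: string | null): unknown {
  return text === null ? null : JSON.parse(text);
}
```

Parsing the JSON text during migration, rather than copying the string as-is, means malformed values are caught before they reach PostgreSQL.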
|
||||
|
||||
#### Scenario: Idempotent execution
|
||||
- **WHEN** the migration script is run a second time on an already-migrated database
|
||||
- **THEN** the script either skips existing data or clears and re-inserts (with a flag), without creating duplicates
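One way to read the idempotency scenario is as a per-table decision based on whether the target already holds rows. A minimal sketch under that assumption; the `force` flag and the name `planTableMigration` are hypothetical, since the spec mentions only "a flag".

```typescript
type MigrationAction = "insert" | "skip" | "clear-and-reinsert";

// Decides what to do for one table given how many rows PostgreSQL already
// holds. None of the three outcomes can produce duplicate rows.
function planTableMigration(
  existingRowCount: number,
  force: boolean
): MigrationAction {
  if (existingRowCount === 0) {
    return "insert";
  }
  return force ? "clear-and-reinsert" : "skip";
}
```

A row-count check is coarse: it does not detect a partially migrated table, so a real script might compare primary keys instead.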
|
||||