## Context

The candle annotator runs two databases:

1. **SQLite** (`data/candles.db`) — serves the Next.js frontend via Drizzle ORM (`better-sqlite3` driver). Contains 6 tables: charts, candles, annotations, annotation_types, span_annotations, span_label_types.
2. **PostgreSQL** (`postgres:5432/ml_db`) — serves the Python ML service via SQLAlchemy. Contains 1 table: training_runs.

The ML service cannot directly query annotation/candle data. Data flows through CSV/JSON file exports. PostgreSQL already runs in Docker for the ML service, so consolidating means adding frontend tables there — not introducing a new service.

## Goals / Non-Goals

**Goals:**

- Single PostgreSQL instance for all application data
- Drizzle ORM continues to manage frontend schema (just switches dialect)
- ML service gains direct read access to candle/annotation tables
- Simplified Docker setup (one fewer volume, one database to back up)
- One-time data migration path from SQLite to PostgreSQL

**Non-Goals:**

- Changing the ML service ORM (SQLAlchemy stays)
- Merging Drizzle and SQLAlchemy migration systems (each manages its own tables)
- Changing API route logic or query patterns beyond what's needed for the dialect switch
- Multi-tenant or schema separation (all tables go in the `public` schema)
- Migrating away from Drizzle ORM

## Decisions

### 1. Drizzle PostgreSQL driver: `drizzle-orm/node-postgres` with `pg`

**Choice**: Use `pg` (node-postgres) as the driver.

**Why**: `pg` is the most mature PostgreSQL driver for Node.js. Drizzle supports it natively via `drizzle-orm/node-postgres`. The `postgres` (postgres.js) driver is also an option but `pg` has broader ecosystem support and is easier to debug.

**Alternative considered**: `postgres` (postgres.js) — lighter, promise-native, but less battle-tested with Drizzle migrations.

### 2. Shared database, single `public` schema

**Choice**: All tables (frontend + ML) live in the same database (the existing `ml_db`, renamed in Decision 3) and the default `public` schema.

**Why**: The table sets don't overlap (frontend has charts/candles/annotations, ML has training_runs). Separate schemas add complexity with no benefit for 7 total tables. The ML service already connects to `ml_db`.

**Alternative considered**: Separate PostgreSQL schemas (`app` and `ml`) — cleaner isolation but adds schema-prefix complexity to queries and cross-schema references. Not worth it at this scale.

### 3. Rename database from `ml_db` to `candle_annotator`

**Choice**: Rename the PostgreSQL database to `candle_annotator` since it now serves the whole application, not just ML.

**Why**: `ml_db` is misleading when the database holds frontend data too. Renaming during consolidation is the natural time to do it.

**Alternative considered**: Keep `ml_db` — avoids a rename step but creates lasting confusion.

### 4. Fresh Drizzle migrations (drop SQLite migrations)

**Choice**: Delete all existing SQLite migrations in `drizzle/`, rewrite the schema file with `pgTable` equivalents, and run `drizzle-kit generate` to produce a fresh initial PostgreSQL migration.

**Why**: SQLite migrations are dialect-specific (e.g., `integer` for booleans, no native timestamps). Converting them one-by-one is fragile. A clean start from the PostgreSQL schema is simpler and produces idiomatic SQL.

**Alternative considered**: Manually converting each SQLite migration to PostgreSQL — error-prone and provides no benefit since there's no production data that needs incremental migration history.

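
For `drizzle-kit generate` to target PostgreSQL, the Drizzle Kit config has to name the new dialect. The fragment below is a minimal sketch, assuming a recent `drizzle-kit` release that accepts the `dialect` field; the schema path is hypothetical and should be replaced with wherever the project keeps its schema file.

```typescript
// drizzle.config.ts (sketch; the schema path below is an assumption)
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  // Switch the generator from SQLite to PostgreSQL.
  dialect: "postgresql",
  // Hypothetical location of the rewritten pgTable schema.
  schema: "./src/db/schema.ts",
  // Fresh migrations are written here after the SQLite ones are deleted.
  out: "./drizzle",
  dbCredentials: {
    // Same DATABASE_URL the frontend uses at runtime.
    url: process.env.DATABASE_URL!,
  },
});
```
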
### 5. Type mappings: SQLite → PostgreSQL

| SQLite type | PostgreSQL type | Notes |
|---|---|---|
| `integer` (PK, autoIncrement) | `serial` | Auto-incrementing integer |
| `integer` (timestamps) | `timestamp` | Use `defaultNow()` where applicable |
| `integer` (booleans like `is_active`) | `boolean` | True PostgreSQL booleans |
| `real` | `double precision` (`doublePrecision` in Drizzle) | OHLC price data |
| `text` | `text` | No change |
| `text` (JSON strings) | `jsonb` | For `geometry`, `sub_spans`, `model_prediction` |

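
As an illustration of the mappings in the table above, a Drizzle schema fragment might look like the following. This is a sketch, not the project's actual schema: the table name `example_annotations` and the columns `label`, `price`, and `created_at` are hypothetical, and only `is_active` and `geometry` are named in this document.

```typescript
// Schema fragment (sketch): one hypothetical table exercising each mapping.
import {
  boolean,
  doublePrecision,
  jsonb,
  pgTable,
  serial,
  text,
  timestamp,
} from "drizzle-orm/pg-core";

export const exampleAnnotations = pgTable("example_annotations", {
  // integer PK with autoIncrement -> serial
  id: serial("id").primaryKey(),
  // text -> text (no change)
  label: text("label").notNull(),
  // real -> double precision
  price: doublePrecision("price").notNull(),
  // integer 0/1 -> boolean
  isActive: boolean("is_active").notNull().default(true),
  // text holding a JSON string -> jsonb
  geometry: jsonb("geometry"),
  // integer timestamp -> timestamp with a database-side default
  createdAt: timestamp("created_at").notNull().defaultNow(),
});
```
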
### 6. Connection management for Next.js

**Choice**: Use a connection pool via `pg.Pool` with `max: 10` connections. Connection string from `DATABASE_URL` env var.

**Why**: SQLite was a single in-process file, so no pooling was needed. PostgreSQL connections are comparatively expensive to open, so concurrent API requests should share a pool rather than connect per request. 10 connections is reasonable for the frontend workload.

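
One way to keep the pool settings explicit is to build them in a small pure function and fail fast when `DATABASE_URL` is missing. The sketch below is an assumption about how this could be structured, not the project's `db/index.ts`; `PoolSettings` and `buildPoolSettings` are hypothetical names, and the returned object only uses the `connectionString` and `max` options that `pg.Pool` accepts.

```typescript
// Subset of pg.Pool options used by this sketch.
interface PoolSettings {
  connectionString: string;
  max: number;
}

// Pool size chosen in this decision.
const DEFAULT_MAX_CONNECTIONS = 10;

// Build pool settings from an environment-like map.
// Throws early so a missing DATABASE_URL surfaces at startup, not on first query.
function buildPoolSettings(
  env: Record<string, string | undefined>,
  max: number = DEFAULT_MAX_CONNECTIONS,
): PoolSettings {
  const connectionString = env["DATABASE_URL"];
  if (!connectionString) {
    throw new Error("DATABASE_URL is not set");
  }
  if (!Number.isInteger(max) || max < 1) {
    throw new Error(`max must be a positive integer, got ${max}`);
  }
  return { connectionString, max };
}
```

In the connection layer, the result would be passed to `new Pool(...)` from `pg` (for example `new Pool(buildPoolSettings(process.env))`), and that pool handed to `drizzle(...)` from `drizzle-orm/node-postgres`.
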
### 7. ML service direct access to frontend tables

**Choice**: The ML service reads frontend tables (candles, annotations, span_annotations) directly via SQLAlchemy using its existing connection. No new SQLAlchemy models needed — raw SQL queries or lightweight table reflections are sufficient for read-only access.

**Why**: The ML service only needs to read training data. Adding full SQLAlchemy models for tables owned by Drizzle creates a dual-ownership problem. Raw queries or `Table` reflections keep it simple.

## Risks / Trade-offs

**[Schema drift between Drizzle and SQLAlchemy]** → Both ORMs manage tables in the same database. Drizzle owns frontend tables, SQLAlchemy owns ML tables. Neither should modify the other's tables. This is enforced by convention, not tooling.

**[Connection pool exhaustion]** → Adding the frontend's database traffic to the same PostgreSQL instance increases load. Mitigation: PostgreSQL 16 handles far more concurrent connections than SQLite. The `pg.Pool` max of 10 plus SQLAlchemy's pool of 5 is well within PostgreSQL's default `max_connections` of 100.

**[Data loss during migration]** → SQLite data must be migrated before switching. Mitigation: Write a migration script that exports SQLite data and imports to PostgreSQL. Run before deploying the new code. Keep the SQLite file as backup.

**[Drizzle push/generate differences]** → PostgreSQL dialect may generate slightly different migration SQL than expected. Mitigation: Review generated migrations before applying. Use `drizzle-kit push` for development, `drizzle-kit generate` + `drizzle-kit migrate` for production.

**[Boolean conversion]** → SQLite uses `0/1` for booleans, PostgreSQL uses `true/false`. Mitigation: The migration script handles conversion. Drizzle's `boolean()` type handles this transparently at the ORM level going forward.

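
The connection budget in the pool-exhaustion item above is simple arithmetic, and it can be worth re-checking whenever a pool size changes. The sketch below uses hypothetical helper names. The extra 10 connections in the second check assume SQLAlchemy's default `max_overflow` is left unchanged, which this document does not state.

```typescript
// One client-side pool and the most connections it may open.
interface PoolBudget {
  name: string;
  maxConnections: number;
}

// Sum the worst-case connection count across all pools.
function totalConnections(pools: PoolBudget[]): number {
  return pools.reduce((sum, pool) => sum + pool.maxConnections, 0);
}

// True when every pool can be fully saturated without exceeding the server limit.
function fitsWithin(pools: PoolBudget[], serverMax: number): boolean {
  return totalConnections(pools) <= serverMax;
}

// Figures taken from this document.
const steadyState: PoolBudget[] = [
  { name: "frontend pg.Pool", maxConnections: 10 },
  { name: "ML service SQLAlchemy pool", maxConnections: 5 },
];

// Assumption: SQLAlchemy may open up to 10 overflow connections beyond its pool.
const withOverflow: PoolBudget[] = [
  ...steadyState,
  { name: "SQLAlchemy overflow (assumed default)", maxConnections: 10 },
];

const POSTGRES_DEFAULT_MAX_CONNECTIONS = 100;

console.log(totalConnections(steadyState)); // 15
console.log(fitsWithin(withOverflow, POSTGRES_DEFAULT_MAX_CONNECTIONS)); // true (25 <= 100)
```
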
## Migration Plan

1. **Update schema and dependencies** — Rewrite Drizzle schema for PostgreSQL, swap npm packages
2. **Generate fresh migrations** — `drizzle-kit generate` from the new PostgreSQL schema
3. **Update docker-compose.yml** — Rename database, add frontend dependency on postgres, remove `candle-data` volume
4. **Update environment variables** — `DATABASE_URL` for the frontend service
5. **Write data migration script** — `scripts/migrate-sqlite-to-postgres.ts` that reads SQLite and inserts into PostgreSQL with type conversions
6. **Update db/index.ts** — Switch from `better-sqlite3` to `pg` pool, update migration runner
7. **Test locally** — Run migrations, migrate data, verify API routes work
8. **Deploy** — Stop current services, run PostgreSQL migrations, run data migration, deploy new code
9. **Rollback** — If issues arise, revert docker-compose and code, restore SQLite volume. The SQLite file is kept as backup for 1 week post-migration.

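
The type conversions in step 5 can be sketched as small pure functions that the migration script applies to each row before inserting it. This is an illustration under stated assumptions, not the contents of `scripts/migrate-sqlite-to-postgres.ts`: the row shape and the `created_at` column are hypothetical, and whether SQLite stores timestamps in seconds or milliseconds is not stated here, so the unit is an explicit parameter.

```typescript
// Unit of the integer timestamps stored in SQLite (an assumption to confirm per column).
type TimestampUnit = "seconds" | "milliseconds";

// SQLite 0/1 -> boolean; anything else is rejected rather than silently coerced.
function toBoolean(value: unknown): boolean {
  if (value === 0 || value === 1) {
    return value === 1;
  }
  throw new Error(`expected 0 or 1, got ${String(value)}`);
}

// SQLite integer timestamp -> Date suitable for a PostgreSQL timestamp column.
function toDate(value: number, unit: TimestampUnit): Date {
  return new Date(unit === "seconds" ? value * 1000 : value);
}

// SQLite JSON text -> parsed value suitable for a jsonb column; NULL stays NULL.
function toJson(value: string | null): unknown {
  return value === null ? null : JSON.parse(value);
}

// Hypothetical row shapes for illustration only.
interface ExampleSqliteRow {
  id: number;
  is_active: number;
  created_at: number;
  geometry: string | null;
}

interface ExamplePgRow {
  id: number;
  is_active: boolean;
  created_at: Date;
  geometry: unknown;
}

// Convert one SQLite row into the shape expected by the PostgreSQL schema.
function convertRow(row: ExampleSqliteRow, unit: TimestampUnit): ExamplePgRow {
  return {
    id: row.id,
    is_active: toBoolean(row.is_active),
    created_at: toDate(row.created_at, unit),
    geometry: toJson(row.geometry),
  };
}
```

If the script preserves the original `id` values, the sequences behind the `serial` columns will likely need to be advanced past the highest migrated id afterwards (for example with PostgreSQL's `setval`), otherwise the first new insert can collide with an existing row.
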
## Open Questions

- Should the ML service user (`ml_user`) have write access to frontend tables, or should we create a separate read-only role? (Recommendation: keep `ml_user` with full access for simplicity, revisit if the team grows.)
- Do we need to preserve SQLite migration history in git for reference, or delete the `drizzle/` folder contents entirely? (Recommendation: delete and start fresh.)