- Remove better-sqlite3, add pg driver
- Convert schema to PostgreSQL types (serial, timestamp, boolean, jsonb)
- Generate fresh PostgreSQL migrations
- Update database connection layer with pg.Pool
- Fix all API routes: remove JSON.parse/stringify, use native timestamps and booleans
- Update drizzle.config.ts and .env.example for PostgreSQL
Context
The candle annotator runs two databases:
- SQLite (`data/candles.db`): serves the Next.js frontend via Drizzle ORM (`better-sqlite3` driver). Contains 6 tables: charts, candles, annotations, annotation_types, span_annotations, span_label_types.
- PostgreSQL (`postgres:5432/ml_db`): serves the Python ML service via SQLAlchemy. Contains 1 table: training_runs.
The ML service cannot directly query annotation/candle data. Data flows through CSV/JSON file exports. PostgreSQL already runs in Docker for the ML service, so consolidating means adding frontend tables there — not introducing a new service.
Goals / Non-Goals
Goals:
- Single PostgreSQL instance for all application data
- Drizzle ORM continues to manage frontend schema (just switches dialect)
- ML service gains direct read access to candle/annotation tables
- Simplified Docker setup (one fewer volume, one database to back up)
- One-time data migration path from SQLite to PostgreSQL
Non-Goals:
- Changing the ML service ORM (SQLAlchemy stays)
- Merging Drizzle and SQLAlchemy migration systems (each manages its own tables)
- Changing API route logic or query patterns beyond what's needed for the dialect switch
- Multi-tenant or schema separation (all tables go in the `public` schema)
- Migrating away from Drizzle ORM
Decisions
1. Drizzle PostgreSQL driver: drizzle-orm/node-postgres with pg
Choice: Use pg (node-postgres) as the driver.
Why: pg is the most mature PostgreSQL driver for Node.js. Drizzle supports it natively via drizzle-orm/node-postgres. The postgres (postgres.js) driver is also an option but pg has broader ecosystem support and is easier to debug.
Alternative considered: postgres (postgres.js) — lighter, promise-native, but less battle-tested with Drizzle migrations.
2. Shared database, single public schema
Choice: All tables (frontend + ML) live in the same database (ml_db) and the default public schema.
Why: The table sets don't overlap (frontend has charts/candles/annotations, ML has training_runs). Separate schemas add complexity with no benefit for 7 total tables. The ML service already connects to ml_db.
Alternative considered: Separate PostgreSQL schemas (app and ml) — cleaner isolation but adds schema-prefix complexity to queries and cross-schema references. Not worth it at this scale.
3. Rename database from ml_db to candle_annotator
Choice: Rename the PostgreSQL database to candle_annotator since it now serves the whole application, not just ML.
Why: ml_db is misleading when the database holds frontend data too. Renaming during consolidation is the natural time to do it.
Alternative considered: Keep ml_db — avoids a rename step but creates lasting confusion.
4. Fresh Drizzle migrations (drop SQLite migrations)
Choice: Delete all existing SQLite migrations in drizzle/, rewrite the schema file with pgTable equivalents, and run drizzle-kit generate to produce a fresh initial PostgreSQL migration.
Why: SQLite migrations are dialect-specific (e.g., integer for booleans, no native timestamps). Converting them one-by-one is fragile. A clean start from the PostgreSQL schema is simpler and produces idiomatic SQL.
Alternative considered: Manually converting each SQLite migration to PostgreSQL — error-prone and provides no benefit since there's no production data that needs incremental migration history.
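For illustration, a `drizzle.config.ts` for the PostgreSQL dialect could look like the fragment below. This is a sketch, not the project's actual file: it assumes a recent drizzle-kit release that exposes `defineConfig` and the `dialect` option, and the schema path `./db/schema.ts` is a hypothetical location.

```typescript
import { defineConfig } from "drizzle-kit";

export default defineConfig({
  // Tells drizzle-kit to emit PostgreSQL SQL instead of SQLite SQL.
  dialect: "postgresql",
  // Assumed location of the rewritten pgTable schema file.
  schema: "./db/schema.ts",
  // Existing migrations folder, emptied of the old SQLite migrations.
  out: "./drizzle",
  dbCredentials: {
    // Same variable the frontend connection layer reads.
    url: process.env.DATABASE_URL!,
  },
});
```

With a config along these lines, `drizzle-kit generate` writes the fresh initial migration into `drizzle/`.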
5. Type mappings: SQLite → PostgreSQL
| SQLite type | PostgreSQL type | Notes |
|---|---|---|
| integer (PK, autoIncrement) | serial | Auto-incrementing integer |
| integer (timestamps) | timestamp | Use defaultNow() where applicable |
| integer (booleans like is_active) | boolean | True PostgreSQL booleans |
| real | doublePrecision | OHLC price data |
| text | text | No change |
| text (JSON strings) | jsonb | For geometry, sub_spans, model_prediction |
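The same mappings have to be applied to existing values when rows are copied out of SQLite. The sketch below shows one way the three non-trivial conversions might look. It assumes the SQLite timestamps are Unix epoch seconds (the source does not say; drop the `* 1000` if they are milliseconds), and the helper names and the `created_at` column are hypothetical.

```typescript
// Value-level conversions implied by the mapping table.
// Assumption: SQLite timestamps are stored as Unix epoch seconds.

type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

/** integer (epoch seconds) -> Date, for a timestamp column. */
function toTimestamp(epochSeconds: number | null): Date | null {
  return epochSeconds === null ? null : new Date(epochSeconds * 1000);
}

/** integer 0/1 -> boolean. */
function toBoolean(flag: number | null): boolean | null {
  return flag === null ? null : flag !== 0;
}

/** text holding a JSON string -> parsed value, for a jsonb column. */
function toJsonb(text: string | null): JsonValue | null {
  return text === null ? null : (JSON.parse(text) as JsonValue);
}

// Hypothetical SQLite row; created_at is an illustrative column name.
const sqliteRow = {
  id: 1,
  created_at: 1700000000,
  is_active: 1,
  geometry: '{"x1":10,"x2":20}',
};

// The row in the shape the PostgreSQL schema expects.
const pgRow = {
  id: sqliteRow.id,
  created_at: toTimestamp(sqliteRow.created_at),
  is_active: toBoolean(sqliteRow.is_active),
  geometry: toJsonb(sqliteRow.geometry),
};
```

real and text values need no conversion and can be copied as they are.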
6. Connection management for Next.js
Choice: Use a connection pool via pg.Pool with max: 10 connections. Connection string from DATABASE_URL env var.
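Why: SQLite is an in-process, single-file database, so no pooling was needed. PostgreSQL connections are network sessions that are costly to open, so concurrent API requests should share a pool rather than connect per request. A cap of 10 connections is reasonable for the frontend workload.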
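A minimal sketch of how the pool options might be derived from the environment follows. `connectionString`, `max`, and `idleTimeoutMillis` are existing `pg.Pool` options; the helper name `poolOptionsFromEnv`, the 30-second idle timeout, and the scheme check are illustrative assumptions, not part of the design.

```typescript
// Derive pg.Pool options from environment variables (sketch).

interface PoolOptions {
  connectionString: string;
  max: number;
  idleTimeoutMillis: number;
}

function poolOptionsFromEnv(env: Record<string, string | undefined>): PoolOptions {
  const connectionString = env.DATABASE_URL;
  if (!connectionString) {
    throw new Error("DATABASE_URL is not set");
  }
  // Fail fast on a leftover SQLite file path or a mistyped scheme.
  if (!/^postgres(ql)?:\/\//.test(connectionString)) {
    throw new Error("DATABASE_URL must start with postgres:// or postgresql://");
  }
  // max: 10 comes from this decision; the idle timeout is an assumed value.
  return { connectionString, max: 10, idleTimeoutMillis: 30000 };
}

// In db/index.ts the options would feed the driver, roughly:
//   import { Pool } from "pg";
//   import { drizzle } from "drizzle-orm/node-postgres";
//   const pool = new Pool(poolOptionsFromEnv(process.env));
//   export const db = drizzle(pool, { schema });
```

Keeping the option building separate from pool creation makes the configuration easy to check without a running database.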
7. ML service direct access to frontend tables
Choice: The ML service reads frontend tables (candles, annotations, span_annotations) directly via SQLAlchemy using its existing connection. No new SQLAlchemy models needed — raw SQL queries or lightweight table reflections are sufficient for read-only access.
Why: The ML service only needs to read training data. Adding full SQLAlchemy models for tables owned by Drizzle creates a dual-ownership problem. Raw queries or Table reflections keep it simple.
Risks / Trade-offs
[Schema drift between Drizzle and SQLAlchemy] → Both ORMs manage tables in the same database. Drizzle owns frontend tables, SQLAlchemy owns ML tables. Neither should modify the other's tables. This is enforced by convention, not tooling.
[Connection pool exhaustion] → Adding the frontend's database traffic to the same PostgreSQL instance increases load. Mitigation: PostgreSQL 16 is built for many concurrent clients. The pg.Pool max of 10 plus SQLAlchemy's pool of 5 totals 15 connections, well within PostgreSQL's default max_connections of 100.
[Data loss during migration] → SQLite data must be migrated before switching. Mitigation: Write a migration script that exports SQLite data and imports to PostgreSQL. Run before deploying the new code. Keep the SQLite file as backup.
[Drizzle push/generate differences] → PostgreSQL dialect may generate slightly different migration SQL than expected. Mitigation: Review generated migrations before applying. Use drizzle-kit push for development, drizzle-kit generate + drizzle-kit migrate for production.
[Boolean conversion] → SQLite uses 0/1 for booleans, PostgreSQL uses true/false. Mitigation: The migration script handles conversion. Drizzle's boolean() type handles this transparently at the ORM level going forward.
Migration Plan
- Update schema and dependencies: Rewrite Drizzle schema for PostgreSQL, swap npm packages
- Generate fresh migrations: `drizzle-kit generate` from the new PostgreSQL schema
- Update docker-compose.yml: Rename database, add frontend dependency on postgres, remove `candle-data` volume
- Update environment variables: `DATABASE_URL` for the frontend service
- Write data migration script: `scripts/migrate-sqlite-to-postgres.ts` that reads SQLite and inserts into PostgreSQL with type conversions
- Update db/index.ts: Switch from `better-sqlite3` to `pg` pool, update migration runner
- Test locally: Run migrations, migrate data, verify API routes work
- Deploy: Stop current services, run PostgreSQL migrations, run data migration, deploy new code
- Rollback: If issues arise, revert docker-compose and code, restore SQLite volume. The SQLite file is kept as backup for 1 week post-migration.
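One detail the data migration step may need to cover: if the script copies rows with their original ids, PostgreSQL does not advance the sequences behind the serial columns, so the first insert from the application could collide with a migrated id. The sketch below builds a reset statement per table. It assumes every frontend table has a serial primary key named `id`, which the source does not confirm, and `resetSequenceSql` is a hypothetical helper.

```typescript
// Assumption: each frontend table has a serial primary key column "id".
const FRONTEND_TABLES = [
  "charts",
  "candles",
  "annotations",
  "annotation_types",
  "span_annotations",
  "span_label_types",
] as const;

/**
 * Builds a statement that moves a table's id sequence to the highest
 * migrated id. For an empty table, the third setval argument is false,
 * so the next generated id is still 1.
 * Table names come from the fixed list above, never from user input.
 */
function resetSequenceSql(table: string): string {
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', 'id'), ` +
    `COALESCE(MAX(id), 1), MAX(id) IS NOT NULL) FROM ${table};`
  );
}

// One statement per table, to run after all rows are inserted.
const statements = FRONTEND_TABLES.map(resetSequenceSql);
```

The script would run these statements once, after the last insert and before the new code is deployed.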
Open Questions
- Should the ML service user (`ml_user`) have write access to frontend tables, or should we create a separate read-only role? (Recommendation: keep `ml_user` with full access for simplicity, revisit if the team grows.)
- Do we need to preserve SQLite migration history in git for reference, or delete the `drizzle/` folder contents entirely? (Recommendation: delete and start fresh.)