## Context

The candle annotator runs two databases:

1. **SQLite** (`data/candles.db`) serves the Next.js frontend via Drizzle ORM (`better-sqlite3` driver). It contains 6 tables: charts, candles, annotations, annotation_types, span_annotations, span_label_types.
2. **PostgreSQL** (`postgres:5432/ml_db`) serves the Python ML service via SQLAlchemy. It contains 1 table: training_runs.

The ML service cannot query annotation or candle data directly; data flows between the two through CSV/JSON file exports.

PostgreSQL already runs in Docker for the ML service, so consolidating means adding the frontend tables there, not introducing a new service.

## Goals / Non-Goals

**Goals:**

- Single PostgreSQL instance for all application data
- Drizzle ORM continues to manage the frontend schema (only the dialect changes)
- ML service gains direct read access to candle/annotation tables
- Simplified Docker setup (one fewer volume, one database to back up)
- One-time data migration path from SQLite to PostgreSQL

**Non-Goals:**

- Changing the ML service ORM (SQLAlchemy stays)
- Merging the Drizzle and SQLAlchemy migration systems (each manages its own tables)
- Changing API route logic or query patterns beyond what the dialect switch requires
- Multi-tenancy or schema separation (all tables go in the `public` schema)
- Migrating away from Drizzle ORM

## Decisions

### 1. Drizzle PostgreSQL driver: `drizzle-orm/node-postgres` with `pg`

**Choice**: Use `pg` (node-postgres) as the driver.

**Why**: `pg` is the most mature PostgreSQL driver for Node.js, and Drizzle supports it natively via `drizzle-orm/node-postgres`. The `postgres` (postgres.js) driver is also an option, but `pg` has broader ecosystem support and is easier to debug.

**Alternative considered**: `postgres` (postgres.js): lighter and promise-native, but less battle-tested with Drizzle migrations.

### 2. Shared database, single `public` schema

**Choice**: All tables (frontend + ML) live in the same database (`ml_db`, renamed to `candle_annotator` in Decision 3) and the default `public` schema.
**Why**: The table sets don't overlap (the frontend has charts/candles/annotations, ML has training_runs). Separate schemas would add complexity with no benefit for 7 tables in total, and the ML service already connects to `ml_db`.

**Alternative considered**: Separate PostgreSQL schemas (`app` and `ml`): cleaner isolation, but adds schema-prefix complexity to queries and cross-schema references. Not worth it at this scale.

### 3. Rename database from `ml_db` to `candle_annotator`

**Choice**: Rename the PostgreSQL database to `candle_annotator`, since it now serves the whole application, not just ML.

**Why**: `ml_db` is misleading once the database holds frontend data too, and the consolidation is the natural time to rename it.

**Alternative considered**: Keep `ml_db`. This avoids a rename step but creates lasting confusion.

### 4. Fresh Drizzle migrations (drop SQLite migrations)

**Choice**: Delete all existing SQLite migrations in `drizzle/`, rewrite the schema file with `pgTable` equivalents, and run `drizzle-kit generate` to produce a fresh initial PostgreSQL migration.

**Why**: SQLite migrations are dialect-specific (e.g., `integer` for booleans, no native timestamp type). Converting them one by one is fragile; starting clean from the PostgreSQL schema is simpler and produces idiomatic SQL.

**Alternative considered**: Manually converting each SQLite migration to PostgreSQL. This is error-prone and offers no benefit, since no existing PostgreSQL copy of these tables needs an incremental migration history (the SQLite data moves over through the one-time migration script instead).

### 5. Type mappings: SQLite → PostgreSQL

| SQLite (Drizzle `sqlite-core`) | PostgreSQL (Drizzle `pg-core`) | Notes |
|---|---|---|
| `integer` (PK, autoIncrement) | `serial` | Auto-incrementing integer |
| `integer` (timestamps) | `timestamp` | Use `defaultNow()` where applicable |
| `integer` (booleans like `is_active`) | `boolean` | True PostgreSQL booleans |
| `real` | `doublePrecision` | OHLC price data (SQL `double precision`) |
| `text` | `text` | No change |
| `text` (JSON strings) | `jsonb` | For `geometry`, `sub_spans`, `model_prediction` |

### 6. Connection management for Next.js

**Choice**: Use a connection pool via `pg.Pool` with `max: 10` connections, with the connection string read from the `DATABASE_URL` environment variable.

**Why**: SQLite is a single file accessed in-process, so no pooling was needed. PostgreSQL serves each query over a server connection, and a pool lets concurrent API requests share a bounded set of connections instead of opening one per request. 10 connections is reasonable for the frontend workload.

### 7. ML service direct access to frontend tables

**Choice**: The ML service reads the frontend tables (candles, annotations, span_annotations) directly via SQLAlchemy, using its existing connection. No new SQLAlchemy models are needed; raw SQL queries or lightweight table reflection suffice for read-only access.

**Why**: The ML service only needs to read training data. Adding full SQLAlchemy models for tables owned by Drizzle would create a dual-ownership problem. Raw queries or `Table` reflection keep it simple.

## Risks / Trade-offs

**[Schema drift between Drizzle and SQLAlchemy]** → Both ORMs manage tables in the same database: Drizzle owns the frontend tables and SQLAlchemy owns the ML tables. Neither should modify the other's tables. This is enforced by convention, not tooling.

**[Connection pool exhaustion]** → Routing the frontend's traffic to the same PostgreSQL instance increases its load. Mitigation: PostgreSQL 16 is a client/server database designed for many concurrent connections. The `pg.Pool` max of 10 plus SQLAlchemy's pool of 5 (which, at SQLAlchemy's defaults, can open up to 10 more overflow connections) stays well within PostgreSQL's default `max_connections` of 100.

**[Data loss during migration]** → SQLite data must be migrated before the switch. Mitigation: write a migration script that exports the SQLite data and imports it into PostgreSQL, run it before deploying the new code, and keep the SQLite file as a backup.

**[Drizzle push/generate differences]** → The PostgreSQL dialect may generate slightly different migration SQL than expected. Mitigation: review generated migrations before applying them. Use `drizzle-kit push` for development and `drizzle-kit generate` + `drizzle-kit migrate` for production. Because `push` diffs the schema against the live database, it may propose dropping tables Drizzle does not know about, such as `training_runs`; restrict it to the frontend tables with the `tablesFilter` option in the Drizzle config.
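The pool settings from Decision 6 and the connection budget from the pool-exhaustion risk can be sketched together. This is a minimal, dependency-free sketch rather than the project's `db/index.ts`. `frontendPoolOptions`, `getOrCreateSingleton`, and `connectionHeadroom` are hypothetical names. The options object is assumed to be passed to `new Pool(...)` from `pg`, whose config accepts `connectionString` and `max`. Caching the pool on `globalThis` is an assumed precaution against Next.js dev-mode reloads opening extra pools.

```typescript
// Hypothetical helpers, not existing project code. In db/index.ts the options
// object would be handed to `new Pool(...)` from `pg`.

interface FrontendPoolOptions {
  connectionString: string;
  max: number; // most connections this process may hold open at once
}

// Reads DATABASE_URL and applies the max of 10 chosen in Decision 6.
function frontendPoolOptions(env: Record<string, string | undefined>): FrontendPoolOptions {
  const connectionString = env["DATABASE_URL"];
  if (!connectionString) {
    throw new Error("DATABASE_URL is not set");
  }
  return { connectionString, max: 10 };
}

// Keeps one instance per process on globalThis. Assumption: the real pool is
// created through this so Next.js dev-mode module reloads reuse it instead of
// opening a new pool on every reload.
function getOrCreateSingleton<T>(key: string, create: () => T): T {
  const store = globalThis as unknown as Record<string, T | undefined>;
  let instance = store[key];
  if (instance === undefined) {
    instance = create();
    store[key] = instance;
  }
  return instance;
}

interface PoolLimit {
  client: string;
  maxConnections: number; // worst case, including any overflow connections
}

// Connections left for everything else (psql, backups, migration runs).
// PostgreSQL defaults: max_connections = 100, superuser_reserved_connections = 3.
function connectionHeadroom(pools: PoolLimit[], maxConnections = 100, reserved = 3): number {
  const worstCase = pools.reduce((sum, pool) => sum + pool.maxConnections, 0);
  return maxConnections - reserved - worstCase;
}
```

Assuming the ML service keeps SQLAlchemy's defaults (5 pooled plus up to 10 overflow, so 15 at worst), `connectionHeadroom([{ client: "frontend", maxConnections: 10 }, { client: "ml-service", maxConnections: 15 }])` returns 72. That leaves ample room on the shared instance.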
**[Boolean conversion]** → SQLite stores booleans as `0/1`, while PostgreSQL uses `true/false`. Mitigation: the migration script handles the conversion; going forward, Drizzle's `boolean()` type handles it transparently at the ORM level.

## Migration Plan

1. **Update schema and dependencies**: Rewrite the Drizzle schema for PostgreSQL and swap the npm packages.
2. **Generate fresh migrations**: Run `drizzle-kit generate` against the new PostgreSQL schema.
3. **Update docker-compose.yml**: Rename the database, make the frontend service depend on postgres, and remove the `candle-data` volume from the compose file (without deleting the Docker volume itself, which holds the SQLite backup needed for rollback). `POSTGRES_DB` only takes effect when the postgres image initializes an empty data directory, so an existing volume needs `ALTER DATABASE ml_db RENAME TO candle_annotator;`, run from another database while nothing is connected to `ml_db`.
4. **Update environment variables**: Set `DATABASE_URL` for the frontend service, and point the ML service's connection string at the renamed database.
5. **Write data migration script**: `scripts/migrate-sqlite-to-postgres.ts` reads SQLite and inserts into PostgreSQL with type conversions. Because rows keep their original ids, the script must also advance each `serial` sequence past the highest copied id.
6. **Update db/index.ts**: Switch from `better-sqlite3` to a `pg` pool and update the migration runner.
7. **Test locally**: Run migrations, migrate data, and verify that the API routes work.
8. **Deploy**: Stop the current services, run the PostgreSQL migrations, run the data migration, and deploy the new code.
9. **Rollback**: If issues arise, revert docker-compose and the code, and restore the SQLite volume. The SQLite file is kept as a backup for 1 week after the migration.

## Open Questions

- Should the ML service user (`ml_user`) have write access to frontend tables, or should we create a separate read-only role? (Recommendation: keep `ml_user` with full access for simplicity; revisit if the team grows.)
- Do we need to keep the SQLite migration files in the repository for reference, or should we delete the `drizzle/` folder contents entirely? (Recommendation: delete and start fresh; the old files stay recoverable from git history.)
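For step 5 of the migration plan, the per-column conversions from the Decision 5 table and the boolean risk could look like the following minimal sketch. The column-kind map is declared by hand and all helper names are hypothetical. It also assumes SQLite timestamps were stored as Unix epoch seconds (Drizzle's `integer({ mode: "timestamp" })` convention; `timestamp_ms` columns would need `"ms"`). JSON text is parsed into values, which suits Drizzle `jsonb()` columns because they serialize values themselves.

```typescript
// Hypothetical conversion helpers for scripts/migrate-sqlite-to-postgres.ts.
// Assumption: SQLite timestamps hold Unix epoch seconds unless "ms" is passed.

type ColumnKind = "boolean" | "timestamp" | "json" | "passthrough";
type SqliteValue = number | string | null;

// SQLite 0/1 → PostgreSQL boolean; anything else signals a bad row.
function toBoolean(value: SqliteValue): boolean | null {
  if (value === null) return null;
  if (value === 0 || value === 1) return value === 1;
  throw new Error(`expected 0/1 for boolean column, got ${String(value)}`);
}

// SQLite integer epoch → Date for a `timestamp` column.
function toTimestamp(value: SqliteValue, unit: "s" | "ms" = "s"): Date | null {
  if (value === null) return null;
  if (typeof value !== "number") throw new Error(`expected epoch number, got ${value}`);
  return new Date(unit === "s" ? value * 1000 : value);
}

// SQLite JSON text → parsed value for a `jsonb` column.
function toJson(value: SqliteValue): unknown {
  if (value === null) return null;
  if (typeof value !== "string") throw new Error(`expected JSON text, got ${String(value)}`);
  return JSON.parse(value); // throws on malformed JSON, surfacing bad rows early
}

// Converts one row; columns missing from `kinds` pass through unchanged.
function convertRow(
  row: Record<string, SqliteValue>,
  kinds: Record<string, ColumnKind>,
): Record<string, unknown> {
  const converted: Record<string, unknown> = {};
  for (const column of Object.keys(row)) {
    const value = row[column];
    const kind = kinds[column] ?? "passthrough";
    if (kind === "boolean") converted[column] = toBoolean(value);
    else if (kind === "timestamp") converted[column] = toTimestamp(value);
    else if (kind === "json") converted[column] = toJson(value);
    else converted[column] = value;
  }
  return converted;
}
```

For example, `{ is_active: 1, geometry: '{"x":1}' }` with kinds `{ is_active: "boolean", geometry: "json" }` becomes `{ is_active: true, geometry: { x: 1 } }`. Failing loudly on unexpected values means a corrupt row stops the one-time migration instead of being silently coerced.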
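Beyond per-column conversion, the migration script in step 5 has to insert parent tables before the tables that reference them. Because the copied rows keep their original ids, it also has to move each `serial` sequence past the highest id. Otherwise the first new row inserted by the app would collide with an existing id. The sketch below shows both parts. The foreign-key graph is a hypothetical guess from the table names, not the real schema, and the function names are illustrative. The `setval` statement follows the common PostgreSQL idiom for resynchronizing a sequence with its table.

```typescript
// Hypothetical FK graph guessed from the table names; derive the real one from
// the Drizzle schema before using this ordering.
const assumedDependencies: Record<string, string[]> = {
  charts: [],
  annotation_types: [],
  span_label_types: [],
  candles: ["charts"],
  annotations: ["charts", "annotation_types"],
  span_annotations: ["charts", "span_label_types"],
};

// Layered topological sort: each round copies every table whose referenced
// tables have all been copied already.
function copyOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  let pending = Object.keys(deps);
  while (pending.length > 0) {
    const ready = pending
      .filter((table) => deps[table].every((parent) => order.indexOf(parent) !== -1))
      .sort(); // deterministic order among independent tables
    if (ready.length === 0) {
      throw new Error(`cannot order tables (cycle or unknown parent): ${pending.join(", ")}`);
    }
    order.push(...ready);
    pending = pending.filter((table) => ready.indexOf(table) === -1);
  }
  return order;
}

// Moves the table's serial sequence so nextval() returns MAX(id) + 1, or 1 if
// the table is empty. Table names come from the fixed list above, not user input.
function sequenceResetSql(table: string, idColumn = "id"): string {
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', '${idColumn}'), ` +
    `COALESCE(MAX(${idColumn}), 1), MAX(${idColumn}) IS NOT NULL) FROM ${table};`
  );
}
```

Under the assumed graph, `copyOrder(assumedDependencies)` yields `annotation_types, charts, span_label_types, annotations, candles, span_annotations`. The script would run `sequenceResetSql(table)` for each table once all rows are in.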