fix: resolve numpy type conversion issues in ML service data access

- Convert numpy.int64 to Python int before passing values to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now pass
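A minimal sketch of the conversion described above. It assumes SQLAlchemy 1.4+ and NumPy. `to_db_param` is a hypothetical helper, and `get_candles` is a hypothetical stand-in because the real function's signature is not part of this commit. The reader reflects the `candles` table (the README below describes the ML service reading frontend tables through table reflection) and normalizes the bound value with `numpy.generic.item()`. This is a slight generalization of the plain `int()` cast the commit describes, since it also covers `numpy.float64` and `numpy.bool_`.

```python
from typing import Any

import numpy as np
from sqlalchemy import MetaData, Table, create_engine, select, text
from sqlalchemy.engine import Engine


def to_db_param(value: Any) -> Any:
    """Hypothetical helper: convert NumPy scalars to built-in Python types.

    psycopg2 has no adapter for NumPy scalar types, so binding e.g. a
    numpy.int64 raises "can't adapt type 'numpy.int64'". Values that are
    not NumPy scalars are returned unchanged.
    """
    if isinstance(value, np.generic):
        # .item() maps numpy.int64 -> int, numpy.float64 -> float, numpy.bool_ -> bool
        return value.item()
    return value


def get_candles(engine: Engine, chart_id: Any) -> list[dict[str, Any]]:
    """Hypothetical reader for one chart's candles, ordered by time.

    chart_id often comes straight from a pandas/NumPy column, so it is
    converted before it is bound into the query.
    """
    # Reflect the frontend-owned table instead of redefining it in Python
    candles = Table("candles", MetaData(), autoload_with=engine)
    stmt = (
        select(candles.c.time, candles.c.open, candles.c.high,
               candles.c.low, candles.c.close)
        .where(candles.c.chart_id == to_db_param(chart_id))
        .order_by(candles.c.time)
    )
    with engine.connect() as conn:
        return [dict(row._mapping) for row in conn.execute(stmt)]


if __name__ == "__main__":
    # Demo against in-memory SQLite; the real service would point at its
    # PostgreSQL DATABASE_URL (psycopg2 driver) instead.
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE candles (id INTEGER PRIMARY KEY, chart_id INTEGER, "
            "time INTEGER, open REAL, high REAL, low REAL, close REAL)"
        ))
        conn.execute(text(
            "INSERT INTO candles (chart_id, time, open, high, low, close) "
            "VALUES (1, 2, 1.0, 1.2, 0.9, 1.1), (1, 1, 1.0, 1.1, 0.9, 1.0)"
        ))
    chart_id = np.int64(1)  # e.g. a value read from a pandas column
    print(get_candles(engine, chart_id))
```

An alternative is to register a global adapter once with `psycopg2.extensions.register_adapter(numpy.int64, psycopg2.extensions.AsIs)`. Converting at the query boundary, as this commit does, keeps the fix local to the data-access functions and independent of the database driver.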
2026-02-17 14:10:21 +01:00
commit d1557a3846
parent 5377431c9d
6 changed files with 437 additions and 119 deletions
--- a/README.md
+++ b/README.md
@@ -33,7 +33,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
 - **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
  - **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
  - **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
-- **SQLite Storage**: All candle data and annotations stored locally in SQLite database
+- **PostgreSQL Storage**: All candle data and annotations stored in PostgreSQL database
+- **Shared Database**: Frontend and ML service use the same database, so the ML service reads candles and annotations directly
 - **Data Persistence**: Annotations and candles persist between sessions

 ### Chart Visualization
@@ -134,21 +135,23 @@ The integrated ML pipeline provides:
 │  - /api/predict (proxy)                                          │
 │  - /api/model/info (proxy)                                       │
 │  └───────────┬─────────────────────────────────────┬───────────  │
-│              │ SQLite (annotations)                │ HTTP        │
+│              │ PostgreSQL                          │ HTTP        │
 │              ▼                                     ▼             │
-│  ┌──────────────────┐                 ┌──────────────────────┐  │
-│  │  SQLite Database │                 │  ML Inference API    │  │
-│  │  (Drizzle ORM)   │                 │  (FastAPI, Python)   │  │
-│  └──────────────────┘                 └──────────┬───────────┘  │
-└────────────────────────────────────────────────────┼──────────────┘
-                                                     │
-                      ┌──────────────────────────────┴───────────────┐
-                      │                                              │
-          ┌───────────▼─────────┐                    ┌──────────────▼────────┐
-          │  MLflow Server      │                    │  PostgreSQL           │
-          │  (Experiments,      │                    │  (Training run        │
-          │   Model Registry)   │                    │   metadata)           │
-          └─────────────────────┘                    └───────────────────────┘
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │          PostgreSQL Database (Shared)                    │   │
+│  │  - Frontend: candles, annotations, span_annotations      │   │
+│  │  - ML tables (training_runs)                             │   │
+│  │  - Accessed by: Next.js (Drizzle ORM)                    │   │
+│  │                 ML Service (SQLAlchemy)                  │   │
+│  └──────────────────────┬──────────────────────────────────┘    │
+│                         │                                        │
+│              ┌──────────┴──────────┐                             │
+│              │                     │                             │
+│  ┌───────────▼─────────┐  ┌───────▼───────────────────┐         │
+│  │  ML Inference API   │  │    MLflow Server          │         │
+│  │  (FastAPI, Python)  │  │  (Experiments, Registry)  │         │
+│  └─────────────────────┘  └───────────────────────────┘         │
+└─────────────────────────────────────────────────────────────────┘
 ```

 ### ML Pipeline Workflow
@@ -189,8 +192,8 @@ The integrated ML pipeline provides:
 - **Charting**: lightweight-charts 4.x (TradingView)
 - **Icons**: lucide-react
 - **Backend**: Next.js API Routes
-- **Database**: SQLite with better-sqlite3
-- **ORM**: Drizzle ORM
+- **Database**: PostgreSQL 16 with the `pg` (node-postgres) driver
+- **ORM**: Drizzle ORM (PostgreSQL dialect)
 - **CSV Parsing**: papaparse

 ### ML Pipeline (Python)
@@ -200,7 +203,8 @@ The integrated ML pipeline provides:
 - **Data Processing**: pandas, numpy
 - **Experiment Tracking**: MLflow (model registry, artifact storage)
 - **Data Versioning**: DVC (Data Version Control)
-- **Database**: PostgreSQL 16 (training run metadata)
+- **Database**: PostgreSQL 16 (shared with the frontend: reads candles/annotations, writes training runs)
+- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
 - **Model Persistence**: joblib
 - **Validation**: Pydantic

@@ -222,8 +226,8 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr

 - Node.js 18.x or higher (for local development)
 - npm 9.x or higher (for local development)
+- PostgreSQL 16 or higher (for local development)
 - Docker & docker-compose (for containerized deployment)
-- Build tools for native modules (see DEPLOYMENT.md)

 ### Local Development Installation

@@ -238,12 +242,26 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
   npm install
   ```

-3. Start the development server:
+3. Set up the PostgreSQL database:
+   ```bash
+   createuser -P ml_user
+   # Enter password: ml_password
+   # -O makes ml_user the database owner, so it can create tables in the public schema (required on PostgreSQL 15+)
+   createdb -O ml_user candle_annotator
+   ```
+
+4. Create a `.env` file:
+   ```bash
+   cp .env.example .env
+   # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
+   ```
+
+5. Start the development server:
   ```bash
   npm run dev
   ```

-4. Open http://localhost:3000 in your browser
+6. Open http://localhost:3000 in your browser

 ### Usage

@@ -289,28 +307,68 @@ timestamp,label_type,price

 ## Database Schema

-### Candles Table
+### Candles Table (PostgreSQL)

 ```typescript
 {
-  id: integer (PK, auto-increment),
-  time: integer (Unix timestamp, unique),
-  open: real,
-  high: real,
-  low: real,
-  close: real
+  id: serial (PK, auto-increment),
+  chart_id: integer (FK to charts.id),
+  time: timestamp (not null, indexed with chart_id),
+  open: double precision,
+  high: double precision,
+  low: double precision,
+  close: double precision
 }
 ```

-### Annotations Table
+### Annotations Table (Point Annotations)

 ```typescript
 {
-  id: integer (PK, auto-increment),
-  timestamp: integer (Unix timestamp),
-  label_type: text ('break_up' | 'break_down' | 'line'),
-  geometry: text (JSON string for line coordinates, null for markers),
-  created_at: integer (Unix timestamp)
+  id: serial (PK, auto-increment),
+  chart_id: integer (FK to charts.id),
+  timestamp: timestamp (not null),
+  label_type: text ('line' | 'rectangle'),
+  geometry: jsonb (for line/rectangle coordinates, nullable),
+  color: text (default '#3b82f6'),
+  created_at: timestamp (default now())
+}
+```
+
+### Span Annotations Table (Pattern Labels)
+
+```typescript
+{
+  id: serial (PK, auto-increment),
+  chart_id: integer (FK to charts.id),
+  start_time: timestamp (not null),
+  end_time: timestamp (not null),
+  label: text (pattern name, e.g., 'Bullish Engulfing'),
+  confidence: integer (nullable),
+  outcome: text (nullable),
+  notes: text (nullable),
+  sub_spans: jsonb (nullable),
+  color: text (default '#2196F3'),
+  source: text (default 'human'),  // 'human' | 'model' | 'hybrid'
+  model_prediction: jsonb (nullable),
+  created_at: timestamp (default now())
+}
+```
+
+### Training Runs Table (ML Service)
+
+```typescript
+{
+  id: serial (PK, auto-increment),
+  run_id: text (unique, MLflow run ID),
+  model_type: text (e.g., 'RandomForest', 'XGBoost'),
+  experiment_name: text,
+  pipeline_config_hash: text,
+  dataset_version: text,
+  metrics_summary: jsonb,
+  status: text (e.g., 'running', 'completed', 'failed'),
+  created_at: timestamp (default now()),
+  completed_at: timestamp (nullable)
 }
 ```

@@ -411,9 +469,9 @@ candle_annotator/
 ### Key Technical Decisions

 1. **lightweight-charts v4**: Stable API with good candlestick and marker support
-2. **SQLite with better-sqlite3**: Synchronous access, perfect for single-user local apps
+2. **PostgreSQL**: A shared database lets the ML service query candle and annotation data directly, without CSV exports
 3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
-4. **Drizzle ORM**: Type-safe queries with minimal overhead
+4. **Drizzle ORM**: Type-safe queries with minimal overhead; the PostgreSQL dialect supports the `jsonb` and `timestamp` columns used in the schema
 5. **Next.js App Router**: Server-side API routes co-located with frontend code

 ### Known Limitations
@@ -428,9 +486,9 @@ candle_annotator/
 See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.

 Common issues:
-- **better-sqlite3 binding errors**: Run `npm rebuild better-sqlite3`
+- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
 - **Port 3000 in use**: Use `PORT=3001 npm run dev`
-- **Database corruption**: Delete `data/candles.db` and restart
+- **Migration errors**: Ensure PostgreSQL is accessible before starting the application

 ## License