fix: resolve numpy type conversion issues in ML service data access

- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now passing successfully
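
A minimal sketch of the conversion described above. The helper names `to_native_int` and `native_params` are hypothetical; the commit's actual code is not shown on this page. The sketch relies on `operator.index`, which calls `__index__` and therefore returns a built-in `int` for integer-like scalars such as `numpy.int64`, while leaving other values untouched.

```python
import operator
from typing import Any, Dict


def to_native_int(value: Any) -> Any:
    """Return a plain Python int for integer-like scalars; pass others through.

    operator.index() calls __index__, which numpy integer scalars implement,
    so a numpy.int64 value becomes a built-in int.
    """
    try:
        return operator.index(value)
    except TypeError:
        # Not integer-like (str, float, datetime, None, ...): leave unchanged.
        return value


def native_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Convert every integer-like bind parameter before it reaches the driver."""
    return {name: to_native_int(value) for name, value in params.items()}
```

Under these assumptions, a value pulled from a pandas or numpy structure could be wrapped as `native_params({"chart_id": value})` before being passed as bind parameters to a SQLAlchemy query.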
Marko Djordjevic 2026-02-17 14:10:21 +01:00
parent 5377431c9d
commit d1557a3846
6 changed files with 437 additions and 119 deletions

134
README.md

@@ -33,7 +33,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
- **SQLite Storage**: All candle data and annotations stored locally in SQLite database
- **PostgreSQL Storage**: All candle data and annotations stored in PostgreSQL database
- **Shared Database**: Frontend and ML service use the same database for seamless data access
- **Data Persistence**: Annotations and candles persist between sessions
### Chart Visualization
@@ -134,21 +135,23 @@ The integrated ML pipeline provides:
│ - /api/predict (proxy) │
│ - /api/model/info (proxy) │
│ └───────────┬─────────────────────────────────────┬─────────── │
│ │ SQLite (annotations) │ HTTP │
│ │ PostgreSQL │ HTTP │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ SQLite Database │ │ ML Inference API │ │
│ │ (Drizzle ORM) │ │ (FastAPI, Python) │ │
│ └──────────────────┘ └──────────┬───────────┘ │
└────────────────────────────────────────────────────┼──────────────┘
┌──────────────────────────────┴───────────────┐
│ │
┌───────────▼─────────┐ ┌──────────────▼────────┐
│ MLflow Server │ │ PostgreSQL │
│ (Experiments, │ │ (Training run │
│ Model Registry) │ │ metadata) │
└─────────────────────┘ └───────────────────────┘
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database (Shared) │ │
│ │ - Frontend tables (candles, annotations, span_annotations) │
│ │ - ML tables (training_runs) │ │
│ │ - Accessed by: Next.js (Drizzle ORM) │ │
│ │ ML Service (SQLAlchemy) │ │
│ └──────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ │ │
│ ┌───────────▼─────────┐ ┌───────▼───────────────────┐ │
│ │ ML Inference API │ │ MLflow Server │ │
│ │ (FastAPI, Python) │ │ (Experiments, Registry) │ │
│ └─────────────────────┘ └───────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### ML Pipeline Workflow
@@ -189,8 +192,8 @@ The integrated ML pipeline provides:
- **Charting**: lightweight-charts 4.x (TradingView)
- **Icons**: lucide-react
- **Backend**: Next.js API Routes
- **Database**: SQLite with better-sqlite3
- **ORM**: Drizzle ORM
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV Parsing**: papaparse
### ML Pipeline (Python)
@@ -200,7 +203,8 @@ The integrated ML pipeline provides:
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (training run metadata)
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic
@@ -222,8 +226,8 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)
- Build tools for native modules (see DEPLOYMENT.md)
### Local Development Installation
@@ -238,12 +242,26 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
npm install
```
3. Start the development server:
3. Set up the PostgreSQL database:
```bash
createdb candle_annotator
createuser -P ml_user
# Enter password: ml_password
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
```
4. Create `.env` file:
```bash
cp .env.example .env
# Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
```
5. Start the development server:
```bash
npm run dev
```
4. Open http://localhost:3000 in your browser
6. Open http://localhost:3000 in your browser
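
Before starting the server, it can help to confirm that `DATABASE_URL` from the steps above has the expected shape. The following is a small sketch using only the Python standard library; `describe_database_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# The URL below is the example value from the .env step.
info = describe_database_url(
    "postgresql://ml_user:ml_password@localhost:5432/candle_annotator"
)
print(info["host"], info["port"], info["database"])
```

This only checks the URL's structure; it does not open a connection or verify that the role and database exist.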
### Usage
@@ -289,28 +307,68 @@ timestamp,label_type,price
## Database Schema
### Candles Table
### Candles Table (PostgreSQL)
```typescript
{
id: integer (PK, auto-increment),
time: integer (Unix timestamp, unique),
open: real,
high: real,
low: real,
close: real
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
time: timestamp (not null, indexed with chart_id),
open: double precision,
high: double precision,
low: double precision,
close: double precision
}
```
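
Because `time` changes from an integer Unix timestamp to a `timestamp` column, imported values need converting to datetimes. The sketch below shows one way to accept both Unix seconds and date strings. `parse_candle_time` is a hypothetical helper, and it assumes that date strings are ISO 8601 and that naive strings are in UTC; the project's actual parsing rules are not shown in this diff.

```python
from datetime import datetime, timezone


def parse_candle_time(value) -> datetime:
    """Convert Unix seconds or an ISO 8601 string to a timezone-aware UTC datetime."""
    text = str(value).strip()
    try:
        seconds = float(text)
    except ValueError:
        # Not numeric, so treat it as a date string.
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:
            # Assumption: date strings without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Returning timezone-aware values keeps the conversion explicit regardless of whether the column is declared with or without a time zone.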
### Annotations Table
### Annotations Table (Point Annotations)
```typescript
{
id: integer (PK, auto-increment),
timestamp: integer (Unix timestamp),
label_type: text ('break_up' | 'break_down' | 'line'),
geometry: text (JSON string for line coordinates, null for markers),
created_at: integer (Unix timestamp)
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
timestamp: timestamp (not null),
label_type: text ('line' | 'rectangle'),
geometry: jsonb (for line/rectangle coordinates, nullable),
color: text (default '#3b82f6'),
created_at: timestamp (default now())
}
```
### Span Annotations Table (Pattern Labels)
```typescript
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
start_time: timestamp (not null),
end_time: timestamp (not null),
label: text (pattern name, e.g., 'Bullish Engulfing'),
confidence: integer (nullable),
outcome: text (nullable),
notes: text (nullable),
sub_spans: jsonb (nullable),
color: text (default '#2196F3'),
source: text (default 'human'), // 'human' | 'model' | 'hybrid'
model_prediction: jsonb (nullable),
created_at: timestamp (default now())
}
```
### Training Runs Table (ML Service)
```typescript
{
id: serial (PK, auto-increment),
run_id: text (unique, MLflow run ID),
model_type: text (e.g., 'RandomForest', 'XGBoost'),
experiment_name: text,
pipeline_config_hash: text,
dataset_version: text,
metrics_summary: jsonb,
status: text (e.g., 'running', 'completed', 'failed'),
created_at: timestamp (default now()),
completed_at: timestamp (nullable)
}
```
@@ -411,9 +469,9 @@ candle_annotator/
### Key Technical Decisions
1. **lightweight-charts v4**: Stable API with good candlestick and marker support
2. **SQLite with better-sqlite3**: Synchronous access, perfect for single-user local apps
2. **PostgreSQL**: Shared database enables ML service to directly query candle/annotation data without CSV exports
3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
4. **Drizzle ORM**: Type-safe queries with minimal overhead
4. **Drizzle ORM**: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
5. **Next.js App Router**: Server-side API routes co-located with frontend code
### Known Limitations
@@ -428,9 +486,9 @@ candle_annotator/
See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.
Common issues:
- **better-sqlite3 binding errors**: Run `npm rebuild better-sqlite3`
- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
- **Database corruption**: Delete `data/candles.db` and restart
- **Migration errors**: Ensure PostgreSQL is accessible before starting the application
## License