fix: resolve numpy type conversion issues in ML service data access

- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now pass
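A minimal sketch of the conversion this commit applies, assuming a value pulled out of a pandas DataFrame (for example `charts_df.iloc[0]['id']`) arrives as a `numpy.int64`. The helper name `to_native` is hypothetical; `DataAccess` itself simply calls `int(chart_id)`. NumPy scalars expose `.item()`, which returns the equivalent built-in Python value that database drivers know how to bind.

```python
import numpy as np


def to_native(value):
    """Return a built-in Python value for NumPy scalars; pass others through.

    Hypothetical helper: psycopg2 has no adapter for numpy.int64, so values
    taken from DataFrames must be converted before they are bound to a query.
    """
    if isinstance(value, np.generic):
        # .item() maps np.int64 -> int, np.float64 -> float, np.bool_ -> bool
        return value.item()
    return value


chart_id = np.int64(3)                     # what a DataFrame lookup can yield
print(type(chart_id).__name__)             # int64
print(type(to_native(chart_id)).__name__)  # int
```

A bare `int(...)` is enough for the integer IDs and confidence values touched in this commit; a generic helper only pays off if floats or booleans from DataFrames also reach queries. Another option is registering a global adapter with `psycopg2.extensions.register_adapter`, at the cost of making the conversion implicit.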
Marko Djordjevic 2026-02-17 14:10:21 +01:00
parent 5377431c9d
commit d1557a3846
6 changed files with 437 additions and 119 deletions


@ -4,9 +4,28 @@
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining manual annotation tools with an integrated Python ML pipeline. It's designed for traders and ML researchers to create labeled training data, train models, and deploy pattern recognition in an active learning loop.
**Current Version**: 3.0.0 (ML Pipeline Integration - Feature engineering, training, and inference service)
**Current Version**: 3.1.0 (Database Consolidation - PostgreSQL for all application data)
## Recent Changes (v3.0.0)
## Recent Changes (v3.1.0)
### Database Consolidation to PostgreSQL
- **Unified Database**: Migrated from SQLite to PostgreSQL for all application data (frontend + ML service)
- **Shared Data Access**: ML service can now directly query candle and annotation tables from PostgreSQL
- **No More CSV Exports**: ML service reads training data directly from database, eliminating export/import steps
- **Schema Updates**: All tables converted to PostgreSQL types (serial, timestamp, double precision, jsonb)
- **Type Safety**: ML service data access now converts `numpy.int64` values to Python `int` before querying, fixing psycopg2 "can't adapt type" errors
- **Migration Script**: One-time migration script (`scripts/migrate-sqlite-to-postgres.ts`) for upgrading from SQLite
- **Rollback Support**: Documented rollback procedure if migration fails
### Backend Infrastructure Updates
- **Drizzle ORM**: Updated to PostgreSQL dialect with pg driver
- **Database Connection**: Connection pooling (max: 10 connections) for Next.js API routes
- **ML Service ORM**: SQLAlchemy table reflections for read-only access to frontend tables
- **Environment Variables**: `DATABASE_URL` replaces `DATABASE_PATH`
- **Docker Compose**: Updated with shared PostgreSQL instance, removed `candle-data` volume
- **Health Checks**: Enhanced health endpoints verify PostgreSQL connectivity
## Previous Changes (v3.0.0)
### ML Pipeline Integration
- **Python ML Service**: FastAPI-based inference service for real-time pattern prediction
@ -72,8 +91,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
### Backend (Next.js)
- **Runtime**: Node.js 18.x
- **API**: Next.js API Routes
- **Database**: SQLite with better-sqlite3
- **ORM**: Drizzle ORM
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV**: papaparse
### ML Service (Python)
@ -83,7 +102,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC
- **Database**: PostgreSQL 16 (training run metadata)
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic
@ -103,8 +123,9 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
### Data Management
- **CSV Upload**: Import OHLC data (time, open, high, low, close)
- **Persistent Storage**: SQLite database stores all annotations
- **CSV Export**: Download labeled data for ML training
- **Persistent Storage**: PostgreSQL database stores all annotations and candles
- **Shared Database**: ML service directly queries candle/annotation data
- **CSV Export**: Download labeled data (optional, no longer required for ML training)
### UI Components
- **Toolbox**: Left sidebar with tool buttons, color picker, label management
@ -274,7 +295,7 @@ candle_annotator/
- **Single User**: No authentication, local data only
- **No Undo**: Annotations can only be deleted, not undone
- **SQLite Limits**: Not for concurrent multi-user access
- **PostgreSQL Required**: Application requires PostgreSQL server to be running
- **Memory**: Large CSV files (100k+ rows) slow performance
- **Lines**: Free-form drawing, no snap-to-candle
@ -322,14 +343,28 @@ Before marking features complete:
## Version History
### v2.0.0 (Current)
### v3.1.0 (Current)
- Database consolidation to PostgreSQL
- Shared database between frontend and ML service
- Direct database access for ML training (no CSV exports)
- Migration script for SQLite to PostgreSQL
- Type safety improvements in ML service
### v3.0.0
- ML pipeline integration with FastAPI inference service
- Feature engineering with TA-Lib
- Model training with MLflow tracking
- Prediction UI with disagreement detection
- Active learning loop
### v2.0.0
- Label management sidebar with search/filter
- Hacker theme with matrix colors and monospace font
- Docker deployment with compose
- Keyboard delete for labels
- Label selection on chart
### v1.0.0 (Previous)
### v1.0.0
- Basic annotation tools (Break Up, Break Down, Line, Delete)
- Candlestick chart visualization
- CSV import/export


@ -4,7 +4,7 @@
- Node.js 18.x or higher
- npm 9.x or higher
- Python and build tools (for native module compilation)
- PostgreSQL 16 or higher
## Local Development Setup
@ -14,32 +14,37 @@
npm install
```
Note: The `better-sqlite3` package requires native compilation. If you encounter build errors, ensure you have the necessary build tools:
**Linux:**
```bash
sudo apt-get install build-essential python3
```
**macOS:**
```bash
xcode-select --install
```
**Windows:**
```bash
npm install --global windows-build-tools
```
### 2. Database Setup
The SQLite database will be automatically created when you start the application. The database file is located at:
#### PostgreSQL Setup
```
./data/candles.db
The application uses PostgreSQL for all data storage. Set up the database:
```bash
# Create user (if needed)
createuser -P ml_user
# Enter password: ml_password
# Create the database owned by ml_user (since PostgreSQL 15, non-owners
# cannot create tables in the public schema by default)
createdb -O ml_user candle_annotator
```
To run migrations manually:
#### Environment Configuration
Create a `.env` file in the project root:
```env
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
NODE_ENV=development
PORT=3000
```
#### Run Migrations
Database migrations run automatically on application startup. To run manually:
```bash
npx drizzle-kit generate
@ -77,14 +82,30 @@ time,open,high,low,close
- Unix timestamp (seconds): `1700000000`
- Date string: `2024-01-15`
### 4. Migrating from SQLite (if applicable)
If you have existing data in an SQLite database from a previous version, use the migration script:
```bash
# Run the migration script
npm run migrate:sqlite-to-postgres
# Or with TypeScript directly
npx ts-node scripts/migrate-sqlite-to-postgres.ts
```
This script will:
- Read all data from the SQLite database (`data/candles.db`)
- Convert data types (timestamps, booleans, JSON→jsonb)
- Insert data into PostgreSQL
- Be safe to run more than once (idempotent)
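The migration script itself is TypeScript and its code is not part of this diff. The Python sketch below only illustrates the three conversions listed above, assuming the legacy SQLite layout stored timestamps as Unix seconds (as the old `time: integer (Unix timestamp)` schema did), booleans as 0/1 integers, and JSON as text. `convert_row` and the column groupings are hypothetical.

```python
import json
from datetime import datetime, timezone


def convert_row(row, timestamp_cols=(), bool_cols=(), json_cols=()):
    """Convert one SQLite row (a dict) into values suited to PostgreSQL types.

    Illustrative only: the caller decides which columns get which conversion.
    The input dict is not modified.
    """
    out = dict(row)
    for col in timestamp_cols:
        if out.get(col) is not None:
            # Unix seconds -> timezone-aware datetime for a timestamp column
            out[col] = datetime.fromtimestamp(out[col], tz=timezone.utc)
    for col in bool_cols:
        if out.get(col) is not None:
            # SQLite has no boolean type and stores 0/1 integers
            out[col] = bool(out[col])
    for col in json_cols:
        if out.get(col) is not None:
            # JSON text -> parsed object; parsing validates it before jsonb insert
            out[col] = json.loads(out[col])
    return out


# Example: a v3.0-era point annotation row (hypothetical values)
legacy = {"id": 1, "timestamp": 1700000000, "label_type": "line",
          "geometry": '{"points": [[0, 1], [2, 3]]}', "created_at": 1700000000}
migrated = convert_row(legacy, timestamp_cols=("timestamp", "created_at"),
                       json_cols=("geometry",))
```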
## Building for Production
```bash
npm run build
```
Note: Production builds with `better-sqlite3` require the native module to be compiled for the target platform.
## Running Production Build
```bash
@ -96,33 +117,37 @@ The production server will run on port 3000 by default.
## Troubleshooting
### better-sqlite3 Build Issues
### Database Connection Issues
If you encounter errors related to `better-sqlite3` not finding bindings:
If the application fails to connect to PostgreSQL:
1. Rebuild the module:
1. Verify PostgreSQL is running:
```bash
npm rebuild better-sqlite3
pg_isready -h localhost -p 5432
```
2. If that fails, reinstall:
2. Check the `DATABASE_URL` environment variable:
```bash
npm uninstall better-sqlite3
npm install better-sqlite3
echo $DATABASE_URL
```
3. For development, you can use `npm run dev` which handles the module better than production builds.
3. Verify credentials:
```bash
psql -U ml_user -d candle_annotator
```
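Where `pg_isready` is not installed (for example in a slim Python container), a rough stand-in for step 1 is checking that something accepts TCP connections on the PostgreSQL port. This is a sketch with hypothetical names and defaults; it only proves the port is open, not that the server accepts logins or that the credentials from step 3 are valid.

```python
import socket
import time


def wait_for_port(host="localhost", port=5432, timeout=10.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Weaker than pg_isready: it checks the socket, not the PostgreSQL protocol.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound to the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before the first query gives a clearer failure than a driver error when the database container is still starting.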
### Database Issues
If the database becomes corrupted or you want to start fresh:
If you want to reset the database:
1. Stop the application
2. Delete the database file:
2. Drop and recreate the database:
```bash
rm -f data/candles.db
dropdb candle_annotator
createdb -O ml_user candle_annotator
```
3. Restart the application (it will recreate the database)
3. Restart the application (migrations will run automatically)
### Port Already in Use
@ -134,7 +159,18 @@ PORT=3001 npm run dev
## Environment Variables
The application doesn't require any environment variables for local development. All configuration is hardcoded for simplicity.
Required environment variables:
- `DATABASE_URL` - PostgreSQL connection string (e.g., `postgresql://ml_user:ml_password@localhost:5432/candle_annotator`)
- `NODE_ENV` - Environment (`development` or `production`)
- `PORT` - Server port (default: 3000)
Optional variables for ML inference:
- `INFERENCE_API_URL` - ML service endpoint (default: `http://localhost:8001`)
- `INFERENCE_API_TIMEOUT` - Request timeout in ms (default: 30000)
- `INFERENCE_BATCH_TIMEOUT` - Batch processing timeout in ms (default: 120000)
- `NEXT_PUBLIC_PREDICTIONS_ENABLED` - Enable predictions UI (default: true)
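A minimal sketch of resolving these settings with the defaults listed above, written in Python to match the ML service. It is not the Next.js app's own configuration code; `load_settings` is hypothetical, and treating only the string `true` as "enabled" is an assumption. Splitting `DATABASE_URL` with `urllib.parse` makes a malformed URL fail at startup rather than at the first query.

```python
import os
from urllib.parse import urlsplit


def load_settings(env=None):
    """Resolve the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is required")
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected DATABASE_URL scheme: {parts.scheme!r}")
    return {
        # Connection details parsed from the URL (password deliberately omitted)
        "db_host": parts.hostname,
        "db_port": parts.port or 5432,
        "db_name": parts.path.lstrip("/"),
        "db_user": parts.username,
        "node_env": env.get("NODE_ENV"),
        "port": int(env.get("PORT", "3000")),
        # Optional ML inference settings with their documented defaults
        "inference_api_url": env.get("INFERENCE_API_URL", "http://localhost:8001"),
        "inference_api_timeout_ms": int(env.get("INFERENCE_API_TIMEOUT", "30000")),
        "inference_batch_timeout_ms": int(env.get("INFERENCE_BATCH_TIMEOUT", "120000")),
        "predictions_enabled": env.get("NEXT_PUBLIC_PREDICTIONS_ENABLED", "true").lower() == "true",
    }
```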
## File Structure
@ -202,15 +238,7 @@ uv sync
#### 3. Setup PostgreSQL
The ML service requires PostgreSQL for storing training run metadata:
```bash
# Create database
createdb ml_db
# Or using psql
psql -c "CREATE DATABASE ml_db;"
```
The ML service shares the frontend's PostgreSQL database (`candle_annotator`). If you already created it in the main setup steps, no further database setup is needed: the ML service connects to that same database rather than a separate one.
#### 4. Initialize DVC
@ -330,15 +358,14 @@ docker compose up --build
This will:
1. Build the Next.js app and ML service Docker images
2. Start PostgreSQL for ML service metadata
2. Start PostgreSQL (shared by frontend and ML service)
3. Start MLflow tracking server
4. Start the ML inference service (FastAPI)
5. Start the Next.js web application
6. Create named volumes for persistent storage:
- `candle-data` - SQLite database for annotations
- `ml-data` - OHLCV data, features, labeled datasets
- `mlflow-data` - MLflow experiments and model artifacts
- `postgres-data` - PostgreSQL data
- `postgres-data` - PostgreSQL data (all application tables)
7. Enable automatic restart unless stopped
Services will be available at:
@ -391,7 +418,7 @@ Edit `.env` to customize:
```
NODE_ENV=production
PORT=3000
DATABASE_PATH=/app/data/candles.db
DATABASE_URL=postgresql://ml_user:ml_password@postgres:5432/candle_annotator
```
Pass environment variables to docker-compose:
@ -402,19 +429,73 @@ docker-compose --env-file .env up -d
### Data Persistence
The application stores the SQLite database in a Docker named volume `candle-data`. This ensures data persists across container restarts:
The application stores all data in PostgreSQL using the Docker named volume `postgres-data`. This ensures data persists across container restarts:
```bash
# View volumes
docker volume ls | grep candle
docker volume ls | grep postgres
# Backup database
docker cp candle-annotator:/app/data/candles.db ./backup.db
docker exec candle_annotator-postgres-1 pg_dump -U ml_user candle_annotator > backup.sql
# Restore database
docker cp ./backup.db candle-annotator:/app/data/candles.db
cat backup.sql | docker exec -i candle_annotator-postgres-1 psql -U ml_user -d candle_annotator
```
### Data Migration from SQLite
If you're upgrading from a SQLite-based version, you need to migrate your data:
1. **Before upgrading**, backup your SQLite database:
```bash
docker cp candle_annotator-candle-annotator-1:/app/data/candles.db ./backup-sqlite.db
```
2. **Stop the old containers**:
```bash
docker compose down
```
3. **Pull the new version** and start services:
```bash
git pull origin master
docker compose up -d
```
4. **Run the migration script** from your host machine:
```bash
# Copy SQLite database to a location accessible to the script
cp backup-sqlite.db data/candles.db
# Run migration (requires ts-node and dependencies)
npm install
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator \
npx ts-node scripts/migrate-sqlite-to-postgres.ts
```
**Rollback Procedure** (if migration fails):
1. Stop new containers:
```bash
docker compose down
```
2. Restore SQLite-based docker-compose.yml from git history:
```bash
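# HEAD~1 assumes the PostgreSQL migration is the latest commit; otherwise
# use the hash of the last SQLite-based commit
git checkout HEAD~1 docker-compose.yml Dockerfile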
```
3. Restore SQLite database:
```bash
mkdir -p data
cp backup-sqlite.db data/candles.db
```
4. Start old version:
```bash
docker compose up -d
```
### Accessing the Application
Once running, access the application at:
@ -478,12 +559,18 @@ docker-compose down # Stop any existing containers
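# -p sets the compose project name, not a port; remap the host port in
# docker-compose.yml instead (e.g. ports: "8080:3000"), then:
docker-compose up -d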
```
**Database permission errors:**
**Database connection errors:**
```bash
# Ensure volume has correct permissions
docker-compose down
docker volume rm candle-data
docker-compose up --build
# Check PostgreSQL logs
docker compose logs postgres
# Verify database exists
docker exec -it candle_annotator-postgres-1 psql -U ml_user -d candle_annotator -c "\dt"
# Recreate database if needed (this deletes all stored data)
docker compose down
docker volume rm candle_annotator_postgres-data
docker compose up --build
```
**Rebuild without cache:**
@ -505,14 +592,24 @@ If you encounter this issue:
1. Rebuild the ml-service: `docker compose build ml-service`
2. Restart services: `docker compose up -d`
**Migration errors during build:**
**Migration errors during startup:**
If you see Drizzle migration errors like "table `annotations` already exists" during the Docker build process, this means the database file from a previous build is being included. This was fixed in commit by skipping migrations during the build phase. The migrations now only run at runtime.
If you see Drizzle migration errors during container startup, check:
If you encounter this issue:
1. Ensure `data/` is in `.dockerignore`
2. Rebuild: `docker compose build --no-cache candle-annotator`
3. Restart: `docker compose up -d`
1. Ensure PostgreSQL is fully started and healthy:
```bash
docker compose ps postgres
```
2. Check migration logs:
```bash
docker compose logs candle-annotator | grep -i migration
```
3. If needed, run migrations manually:
```bash
docker exec -it candle_annotator-candle-annotator-1 npm run db:migrate
```
### Update Procedure
@ -569,7 +666,7 @@ For production deployments, consider:
- This application is designed for **single-user local use** only
- There is no authentication or user management
- The SQLite database is stored locally and not intended for concurrent access
- For production multi-user deployments, consider migrating to PostgreSQL or similar
- PostgreSQL is used for all application data (frontend and ML service)
- The shared database enables the ML service to directly query candle and annotation data
- Docker deployment provides lightweight containerization ideal for standalone instances
- The multi-stage Dockerfile keeps image size minimal (~100MB)

README.md

@ -33,7 +33,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
- **SQLite Storage**: All candle data and annotations stored locally in SQLite database
- **PostgreSQL Storage**: All candle data and annotations stored in PostgreSQL database
- **Shared Database**: Frontend and ML service use the same database for seamless data access
- **Data Persistence**: Annotations and candles persist between sessions
### Chart Visualization
@ -134,21 +135,23 @@ The integrated ML pipeline provides:
│ - /api/predict (proxy) │
│ - /api/model/info (proxy) │
│ └───────────┬─────────────────────────────────────┬─────────── │
│ │ SQLite (annotations) │ HTTP │
│ │ PostgreSQL │ HTTP │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ SQLite Database │ │ ML Inference API │ │
│ │ (Drizzle ORM) │ │ (FastAPI, Python) │ │
│ └──────────────────┘ └──────────┬───────────┘ │
└────────────────────────────────────────────────────┼──────────────┘
┌──────────────────────────────┴───────────────┐
│ │
┌───────────▼─────────┐ ┌──────────────▼────────┐
│ MLflow Server │ │ PostgreSQL │
│ (Experiments, │ │ (Training run │
│ Model Registry) │ │ metadata) │
└─────────────────────┘ └───────────────────────┘
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database (Shared) │ │
│ │ - Frontend tables (candles, annotations, span_annotations) │
│ │ - ML tables (training_runs) │ │
│ │ - Accessed by: Next.js (Drizzle ORM) │ │
│ │ ML Service (SQLAlchemy) │ │
│ └──────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ │ │
│ ┌───────────▼─────────┐ ┌───────▼───────────────────┐ │
│ │ ML Inference API │ │ MLflow Server │ │
│ │ (FastAPI, Python) │ │ (Experiments, Registry) │ │
│ └─────────────────────┘ └───────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
### ML Pipeline Workflow
@ -189,8 +192,8 @@ The integrated ML pipeline provides:
- **Charting**: lightweight-charts 4.x (TradingView)
- **Icons**: lucide-react
- **Backend**: Next.js API Routes
- **Database**: SQLite with better-sqlite3
- **ORM**: Drizzle ORM
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV Parsing**: papaparse
### ML Pipeline (Python)
@ -200,7 +203,8 @@ The integrated ML pipeline provides:
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (training run metadata)
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic
@ -222,8 +226,8 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)
- Build tools for native modules (see DEPLOYMENT.md)
### Local Development Installation
@ -238,12 +242,26 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
npm install
```
3. Start the development server:
3. Setup PostgreSQL database:
```bash
createuser -P ml_user
# Enter password: ml_password
# Owner rights are needed to create tables in the public schema on PostgreSQL 15+
createdb -O ml_user candle_annotator
```
4. Create `.env` file:
```bash
cp .env.example .env
# Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
```
5. Start the development server:
```bash
npm run dev
```
4. Open http://localhost:3000 in your browser
6. Open http://localhost:3000 in your browser
### Usage
@ -289,28 +307,68 @@ timestamp,label_type,price
## Database Schema
### Candles Table
### Candles Table (PostgreSQL)
```typescript
{
id: integer (PK, auto-increment),
time: integer (Unix timestamp, unique),
open: real,
high: real,
low: real,
close: real
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
time: timestamp (not null, indexed with chart_id),
open: double precision,
high: double precision,
low: double precision,
close: double precision
}
```
### Annotations Table
### Annotations Table (Point Annotations)
```typescript
{
id: integer (PK, auto-increment),
timestamp: integer (Unix timestamp),
label_type: text ('break_up' | 'break_down' | 'line'),
geometry: text (JSON string for line coordinates, null for markers),
created_at: integer (Unix timestamp)
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
timestamp: timestamp (not null),
label_type: text ('line' | 'rectangle'),
geometry: jsonb (for line/rectangle coordinates, nullable),
color: text (default '#3b82f6'),
created_at: timestamp (default now())
}
```
### Span Annotations Table (Pattern Labels)
```typescript
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
start_time: timestamp (not null),
end_time: timestamp (not null),
label: text (pattern name, e.g., 'Bullish Engulfing'),
confidence: integer (nullable),
outcome: text (nullable),
notes: text (nullable),
sub_spans: jsonb (nullable),
color: text (default '#2196F3'),
source: text (default 'human'), // 'human' | 'model' | 'hybrid'
model_prediction: jsonb (nullable),
created_at: timestamp (default now())
}
```
### Training Runs Table (ML Service)
```typescript
{
id: serial (PK, auto-increment),
run_id: text (unique, MLflow run ID),
model_type: text (e.g., 'RandomForest', 'XGBoost'),
experiment_name: text,
pipeline_config_hash: text,
dataset_version: text,
metrics_summary: jsonb,
status: text (e.g., 'running', 'completed', 'failed'),
created_at: timestamp (default now()),
completed_at: timestamp (nullable)
}
```
@ -411,9 +469,9 @@ candle_annotator/
### Key Technical Decisions
1. **lightweight-charts v4**: Stable API with good candlestick and marker support
2. **SQLite with better-sqlite3**: Synchronous access, perfect for single-user local apps
2. **PostgreSQL**: Shared database enables ML service to directly query candle/annotation data without CSV exports
3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
4. **Drizzle ORM**: Type-safe queries with minimal overhead
4. **Drizzle ORM**: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
5. **Next.js App Router**: Server-side API routes co-located with frontend code
### Known Limitations
@ -428,9 +486,9 @@ candle_annotator/
See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.
Common issues:
- **better-sqlite3 binding errors**: Run `npm rebuild better-sqlite3`
- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
- **Database corruption**: Delete `data/candles.db` and restart
- **Migration errors**: Ensure PostgreSQL is accessible before starting the application
## License


@ -47,7 +47,7 @@
## 8. Testing and Verification
- [x] 8.1 Run the full application locally with PostgreSQL — verify all API routes work
- [ ] 8.2 Verify ML service can query candle/annotation data from shared database
- [ ] 8.3 Run `docker compose up` and verify all services start correctly with new configuration
- [ ] 8.4 Update `DEPLOYMENT.md` with new deployment steps (PostgreSQL migration, data migration script, rollback procedure)
- [ ] 8.5 Update `README.md` and `CLAUDE_DESCRIPTION.md` with database architecture changes
- [x] 8.2 Verify ML service can query candle/annotation data from shared database
- [x] 8.3 Run `docker compose up` and verify all services start correctly with new configuration
- [x] 8.4 Update `DEPLOYMENT.md` with new deployment steps (PostgreSQL migration, data migration script, rollback procedure)
- [x] 8.5 Update `README.md` and `CLAUDE_DESCRIPTION.md` with database architecture changes


@ -78,6 +78,9 @@ class DataAccess:
            DataFrame with columns: id, chart_id, time, open, high, low, close
        """
        with get_db() as db:
            # Convert numpy types to native Python types for SQLAlchemy
            chart_id = int(chart_id)
            # Build query
            stmt = select(self.candles).where(self.candles.c.chart_id == chart_id)
@ -118,6 +121,11 @@ class DataAccess:
            DataFrame with span annotations
        """
        with get_db() as db:
            # Convert numpy types to native Python types for SQLAlchemy
            chart_id = int(chart_id)
            if min_confidence is not None:
                min_confidence = int(min_confidence)
            # Build query
            stmt = select(self.span_annotations).where(
                self.span_annotations.c.chart_id == chart_id
@ -164,6 +172,9 @@ class DataAccess:
            DataFrame with point annotations
        """
        with get_db() as db:
            # Convert numpy types to native Python types for SQLAlchemy
            chart_id = int(chart_id)
            # Build query
            stmt = select(self.annotations).where(
                self.annotations.c.chart_id == chart_id
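For readers new to table reflection, here is a self-contained sketch of the pattern these hunks rely on: reflect a table that another component created, then build a `select()` with a native `int` bound parameter. It assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite engine only so it runs without a PostgreSQL server; the simplified `candles` layout and the module-level `get_candles` function are stand-ins, not the service's actual code.

```python
from sqlalchemy import MetaData, Table, create_engine, select, text

engine = create_engine("sqlite://")  # in-memory stand-in for PostgreSQL

# Simulate a table owned by another component (in the real setup, Drizzle)
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE candles (id INTEGER PRIMARY KEY, chart_id INTEGER, "
        "time INTEGER, open REAL, high REAL, low REAL, close REAL)"
    ))
    conn.execute(text(
        "INSERT INTO candles (chart_id, time, open, high, low, close) VALUES "
        "(1, 160, 1.1, 1.3, 1.0, 1.2), (1, 100, 1.0, 1.2, 0.9, 1.1), "
        "(2, 100, 5.0, 5.5, 4.8, 5.2)"
    ))

# Reflection: columns are read from the database, not declared in Python
candles = Table("candles", MetaData(), autoload_with=engine)


def get_candles(chart_id):
    """Return one chart's candles ordered by time, as dict-like rows."""
    chart_id = int(chart_id)  # coerce numpy scalars before binding
    stmt = (
        select(candles)
        .where(candles.c.chart_id == chart_id)
        .order_by(candles.c.time)
    )
    with engine.connect() as conn:
        return conn.execute(stmt).mappings().all()
```

Because the ML service only reads these tables, reflection keeps the Drizzle schema as the single source of truth instead of duplicating column definitions in Python.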


@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Test script to verify ML service can access candle and annotation data from PostgreSQL.
"""
import sys
import traceback
from pathlib import Path

# Make the `app` package importable. It lives next to this script, so the
# script's own directory (not the `app/` directory itself) goes on sys.path.
sys.path.insert(0, str(Path(__file__).resolve().parent))

from app.data_access import DataAccess
from app.db import init_db


def test_db_access():
    """Test database access for ML service."""
    print("=" * 60)
    print("Testing ML Service Database Access")
    print("=" * 60)

    try:
        # Initialize database
        print("\n1. Initializing database connection...")
        init_db()
        print(" ✓ Database connection initialized")

        # Create data access instance
        print("\n2. Creating data access instance...")
        data_access = DataAccess()
        print(" ✓ Data access instance created (tables reflected)")

        # Get all charts
        print("\n3. Querying all charts...")
        charts_df = data_access.get_all_charts()
        if charts_df.empty:
            print(" ⚠ No charts found in database")
            return False
        print(f" ✓ Found {len(charts_df)} chart(s):")
        for _, chart in charts_df.iterrows():
            print(f" - {chart['name']} (id={chart['id']})")

        # Test with first chart. chart_id can come out of pandas as
        # numpy.int64, the case the int() conversions in DataAccess handle.
        first_chart = charts_df.iloc[0]
        chart_name = first_chart['name']
        chart_id = first_chart['id']
        print(f"\n4. Testing data access with chart: {chart_name}")

        # Get candles
        print(f"\n a) Querying candles for chart_id={chart_id}...")
        candles_df = data_access.get_candles(chart_id)
        if candles_df.empty:
            print(" ⚠ No candles found")
        else:
            print(f" ✓ Found {len(candles_df)} candles")
            print(f" - Time range: {candles_df['time'].min()} to {candles_df['time'].max()}")
            print(f" - Columns: {list(candles_df.columns)}")

        # Get span annotations
        print(f"\n b) Querying span annotations for chart_id={chart_id}...")
        annotations_df = data_access.get_span_annotations(chart_id)
        if annotations_df.empty:
            print(" ⚠ No span annotations found")
        else:
            print(f" ✓ Found {len(annotations_df)} span annotation(s)")
            labels = annotations_df['label'].unique()
            print(f" - Unique labels: {list(labels)}")
            sources = annotations_df['source'].unique()
            print(f" - Sources: {list(sources)}")

        # Get point annotations
        print(f"\n c) Querying point annotations for chart_id={chart_id}...")
        point_annotations_df = data_access.get_point_annotations(chart_id)
        if point_annotations_df.empty:
            print(" ⚠ No point annotations found")
        else:
            print(f" ✓ Found {len(point_annotations_df)} point annotation(s)")
            label_types = point_annotations_df['label_type'].unique()
            print(f" - Label types: {list(label_types)}")

        # Test get_training_data convenience method
        print("\n5. Testing get_training_data() convenience method...")
        try:
            candles_df, annotations_df = data_access.get_training_data(
                chart_name=chart_name,
                annotation_source="human"
            )
            print(" ✓ Successfully retrieved training data")
            print(f" - Candles: {len(candles_df)} rows")
            print(f" - Annotations: {len(annotations_df)} rows")
        except Exception as e:
            print(f" ✗ Error: {e}")
            return False

        print("\n" + "=" * 60)
        print("✓ All database access tests passed!")
        print("=" * 60)
        return True

    except Exception as e:
        print(f"\n✗ Error during testing: {e}")
        traceback.print_exc()
        return False


if __name__ == "__main__":
    success = test_db_access()
    sys.exit(0 if success else 1)