fix: resolve numpy type conversion issues in ML service data access
- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now pass
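The conversion named in this commit message can be sketched as a small helper. This is an illustration under stated assumptions, not the code from the commit, which calls `int(chart_id)` inline. `to_native_int` and `FakeInt64` are hypothetical names, and `FakeInt64` stands in for `numpy.int64` so the sketch runs without numpy installed. It uses `operator.index`, which accepts any object implementing `__index__` (numpy integer scalars do) and raises `TypeError` for floats instead of truncating them.

```python
import operator


def to_native_int(value):
    """Return value as a built-in int, or None if value is None.

    Hypothetical helper: accepts any integer-like object that implements
    __index__ and raises TypeError for floats and strings instead of
    silently truncating or parsing them.
    """
    if value is None:
        return None
    return int(operator.index(value))


class FakeInt64:
    """Stand-in for numpy.int64 so the sketch runs without numpy."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


# Optional parameters such as min_confidence pass through unchanged when None.
chart_id = to_native_int(FakeInt64(3))
min_confidence = to_native_int(None)
print(type(chart_id).__name__, chart_id, min_confidence)
```

Plain `int(value)` as used in the commit is shorter. The stricter variant above only matters if a float could reach the query by mistake.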
parent 5377431c9d
commit d1557a3846
6 changed files with 437 additions and 119 deletions
|
|
@@ -4,9 +4,28 @@
|
|||
|
||||
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining manual annotation tools with an integrated Python ML pipeline. It's designed for traders and ML researchers to create labeled training data, train models, and deploy pattern recognition in an active learning loop.
|
||||
|
||||
**Current Version**: 3.0.0 (ML Pipeline Integration - Feature engineering, training, and inference service)
|
||||
**Current Version**: 3.1.0 (Database Consolidation - PostgreSQL for all application data)
|
||||
|
||||
## Recent Changes (v3.0.0)
|
||||
## Recent Changes (v3.1.0)
|
||||
|
||||
### Database Consolidation to PostgreSQL
|
||||
- **Unified Database**: Migrated from SQLite to PostgreSQL for all application data (frontend + ML service)
|
||||
- **Shared Data Access**: ML service can now directly query candle and annotation tables from PostgreSQL
|
||||
- **No More CSV Exports**: ML service reads training data directly from database, eliminating export/import steps
|
||||
- **Schema Updates**: All tables converted to PostgreSQL types (serial, timestamp, double precision, jsonb)
|
||||
- **Type Safety**: Fixed numpy type conversion issues in ML service data access layer
|
||||
- **Migration Script**: One-time migration script (`scripts/migrate-sqlite-to-postgres.ts`) for upgrading from SQLite
|
||||
- **Rollback Support**: Documented rollback procedure if migration fails
|
||||
|
||||
### Backend Infrastructure Updates
|
||||
- **Drizzle ORM**: Updated to PostgreSQL dialect with pg driver
|
||||
- **Database Connection**: Connection pooling (max: 10 connections) for Next.js API routes
|
||||
- **ML Service ORM**: SQLAlchemy table reflections for read-only access to frontend tables
|
||||
- **Environment Variables**: `DATABASE_URL` replaces `DATABASE_PATH`
|
||||
- **Docker Compose**: Updated with shared PostgreSQL instance, removed `candle-data` volume
|
||||
- **Health Checks**: Enhanced health endpoints verify PostgreSQL connectivity
|
||||
|
||||
## Previous Changes (v3.0.0)
|
||||
|
||||
### ML Pipeline Integration
|
||||
- **Python ML Service**: FastAPI-based inference service for real-time pattern prediction
|
||||
|
|
@@ -72,8 +91,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
|
|||
### Backend (Next.js)
|
||||
- **Runtime**: Node.js 18.x
|
||||
- **API**: Next.js API Routes
|
||||
- **Database**: SQLite with better-sqlite3
|
||||
- **ORM**: Drizzle ORM
|
||||
- **Database**: PostgreSQL 16 with pg driver
|
||||
- **ORM**: Drizzle ORM (PostgreSQL dialect)
|
||||
- **CSV**: papaparse
|
||||
|
||||
### ML Service (Python)
|
||||
|
|
@@ -83,7 +102,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
|
|||
- **Data Processing**: pandas, numpy
|
||||
- **Experiment Tracking**: MLflow (model registry, artifact storage)
|
||||
- **Data Versioning**: DVC
|
||||
- **Database**: PostgreSQL 16 (training run metadata)
|
||||
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
|
||||
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
|
||||
- **Model Persistence**: joblib
|
||||
- **Validation**: Pydantic
|
||||
|
||||
|
|
@@ -103,8 +123,9 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
|
|||
|
||||
### Data Management
|
||||
- **CSV Upload**: Import OHLC data (time, open, high, low, close)
|
||||
- **Persistent Storage**: SQLite database stores all annotations
|
||||
- **CSV Export**: Download labeled data for ML training
|
||||
- **Persistent Storage**: PostgreSQL database stores all annotations and candles
|
||||
- **Shared Database**: ML service directly queries candle/annotation data
|
||||
- **CSV Export**: Download labeled data (optional, no longer required for ML training)
|
||||
|
||||
### UI Components
|
||||
- **Toolbox**: Left sidebar with tool buttons, color picker, label management
|
||||
|
|
@@ -274,7 +295,7 @@ candle_annotator/
|
|||
|
||||
- **Single User**: No authentication, local data only
|
||||
- **No Undo**: Annotations can only be deleted, not undone
|
||||
- **SQLite Limits**: Not for concurrent multi-user access
|
||||
- **PostgreSQL Required**: Application requires PostgreSQL server to be running
|
||||
- **Memory**: Large CSV files (100k+ rows) slow performance
|
||||
- **Lines**: Free-form drawing, no snap-to-candle
|
||||
|
||||
|
|
@@ -322,14 +343,28 @@ Before marking features complete:
|
|||
|
||||
## Version History
|
||||
|
||||
### v2.0.0 (Current)
|
||||
### v3.1.0 (Current)
|
||||
- Database consolidation to PostgreSQL
|
||||
- Shared database between frontend and ML service
|
||||
- Direct database access for ML training (no CSV exports)
|
||||
- Migration script for SQLite to PostgreSQL
|
||||
- Type safety improvements in ML service
|
||||
|
||||
### v3.0.0
|
||||
- ML pipeline integration with FastAPI inference service
|
||||
- Feature engineering with TA-Lib
|
||||
- Model training with MLflow tracking
|
||||
- Prediction UI with disagreement detection
|
||||
- Active learning loop
|
||||
|
||||
### v2.0.0
|
||||
- Label management sidebar with search/filter
|
||||
- Hacker theme with matrix colors and monospace font
|
||||
- Docker deployment with compose
|
||||
- Keyboard delete for labels
|
||||
- Label selection on chart
|
||||
|
||||
### v1.0.0 (Previous)
|
||||
### v1.0.0
|
||||
- Basic annotation tools (Break Up, Break Down, Line, Delete)
|
||||
- Candlestick chart visualization
|
||||
- CSV import/export
|
||||
|
|
|
|||
231
DEPLOYMENT.md
|
|
@@ -4,7 +4,7 @@
|
|||
|
||||
- Node.js 18.x or higher
|
||||
- npm 9.x or higher
|
||||
- Python and build tools (for native module compilation)
|
||||
- PostgreSQL 16 or higher
|
||||
|
||||
## Local Development Setup
|
||||
|
||||
|
|
@@ -14,32 +14,37 @@
|
|||
npm install
|
||||
```
|
||||
|
||||
Note: The `better-sqlite3` package requires native compilation. If you encounter build errors, ensure you have the necessary build tools:
|
||||
|
||||
**Linux:**
|
||||
```bash
|
||||
sudo apt-get install build-essential python3
|
||||
```
|
||||
|
||||
**macOS:**
|
||||
```bash
|
||||
xcode-select --install
|
||||
```
|
||||
|
||||
**Windows:**
|
||||
```bash
|
||||
npm install --global windows-build-tools
|
||||
```
|
||||
|
||||
### 2. Database Setup
|
||||
|
||||
The SQLite database will be automatically created when you start the application. The database file is located at:
|
||||
#### PostgreSQL Setup
|
||||
|
||||
```
|
||||
./data/candles.db
|
||||
The application uses PostgreSQL for all data storage. Set up the database:
|
||||
|
||||
```bash
|
||||
# Create database
|
||||
createdb candle_annotator
|
||||
|
||||
# Create user (if needed)
|
||||
createuser -P ml_user
|
||||
# Enter password: ml_password
|
||||
|
||||
# Grant privileges
|
||||
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
|
||||
```
|
||||
|
||||
To run migrations manually:
|
||||
#### Environment Configuration
|
||||
|
||||
Create a `.env` file in the project root:
|
||||
|
||||
```env
|
||||
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
|
||||
NODE_ENV=development
|
||||
PORT=3000
|
||||
```
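As a quick sanity check of the connection string format shown above, the URL can be split into its parts with the Python standard library. This is a minimal sketch and not part of the application: `describe_database_url` is a hypothetical helper, and falling back to port 5432 when none is given is an assumption based on PostgreSQL's default.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a PostgreSQL connection URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumed default when the URL omits it
        "database": parts.path.lstrip("/"),
        "has_password": parts.password is not None,  # never echo the password
    }


info = describe_database_url(
    "postgresql://ml_user:ml_password@localhost:5432/candle_annotator"
)
print(info)
```

A wrong host or database name in `.env` then shows up in the printed parts before the application attempts to connect.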
|
||||
|
||||
#### Run Migrations
|
||||
|
||||
Database migrations run automatically on application startup. To run manually:
|
||||
|
||||
```bash
|
||||
npx drizzle-kit generate
|
||||
|
|
@@ -77,14 +82,30 @@ time,open,high,low,close
|
|||
- Unix timestamp (seconds): `1700000000`
|
||||
- Date string: `2024-01-15`
|
||||
|
||||
### 4. Migrating from SQLite (if applicable)
|
||||
|
||||
If you have existing data in an SQLite database from a previous version, use the migration script:
|
||||
|
||||
```bash
|
||||
# Run the migration script
|
||||
npm run migrate:sqlite-to-postgres
|
||||
|
||||
# Or with TypeScript directly
|
||||
npx ts-node scripts/migrate-sqlite-to-postgres.ts
|
||||
```
|
||||
|
||||
This script will:
|
||||
- Read all data from the SQLite database (`data/candles.db`)
|
||||
- Convert data types (timestamps, booleans, JSON→jsonb)
|
||||
- Insert data into PostgreSQL
|
||||
- Skip if run multiple times (idempotent)
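Two of the type conversions listed above can be illustrated in a few lines. The sketch below is in Python for illustration only; the actual script is TypeScript and its logic may differ. `convert_row` and the column names `timestamp` and `geometry` are assumptions taken from the annotations schema, and treating stored Unix seconds as UTC is also an assumption.

```python
import json
from datetime import datetime, timezone


def convert_row(row):
    """Convert one SQLite-style row (a dict) to PostgreSQL-friendly values.

    Hypothetical helper; column names are assumed, not taken from the script.
    """
    converted = dict(row)
    # Unix seconds (integer in SQLite) -> datetime for a timestamp column.
    converted["timestamp"] = datetime.fromtimestamp(
        row["timestamp"], tz=timezone.utc
    )
    # JSON stored as text -> Python object, ready for a jsonb column.
    geometry = row.get("geometry")
    converted["geometry"] = json.loads(geometry) if geometry else None
    return converted


row = {"id": 1, "timestamp": 1700000000, "geometry": '{"x1": 1, "y1": 2}'}
print(convert_row(row))
```

The idempotency mentioned in the last bullet is a separate concern (for example, skipping rows that already exist) and is not shown here.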
|
||||
|
||||
## Building for Production
|
||||
|
||||
```bash
|
||||
npm run build
|
||||
```
|
||||
|
||||
Note: Production builds with `better-sqlite3` require the native module to be compiled for the target platform.
|
||||
|
||||
## Running Production Build
|
||||
|
||||
```bash
|
||||
|
|
@@ -96,33 +117,37 @@ The production server will run on port 3000 by default.
|
|||
|
||||
## Troubleshooting
|
||||
|
||||
### better-sqlite3 Build Issues
|
||||
### Database Connection Issues
|
||||
|
||||
If you encounter errors related to `better-sqlite3` not finding bindings:
|
||||
If the application fails to connect to PostgreSQL:
|
||||
|
||||
1. Rebuild the module:
|
||||
1. Verify PostgreSQL is running:
|
||||
```bash
|
||||
npm rebuild better-sqlite3
|
||||
pg_isready -h localhost -p 5432
|
||||
```
|
||||
|
||||
2. If that fails, reinstall:
|
||||
2. Check DATABASE_URL environment variable:
|
||||
```bash
|
||||
npm uninstall better-sqlite3
|
||||
npm install better-sqlite3
|
||||
echo $DATABASE_URL
|
||||
```
|
||||
|
||||
3. For development, you can use `npm run dev` which handles the module better than production builds.
|
||||
3. Verify credentials:
|
||||
```bash
|
||||
psql -U ml_user -d candle_annotator
|
||||
```
|
||||
|
||||
### Database Issues
|
||||
|
||||
If the database becomes corrupted or you want to start fresh:
|
||||
If you want to reset the database:
|
||||
|
||||
1. Stop the application
|
||||
2. Delete the database file:
|
||||
2. Drop and recreate the database:
|
||||
```bash
|
||||
rm -f data/candles.db
|
||||
dropdb candle_annotator
|
||||
createdb candle_annotator
|
||||
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
|
||||
```
|
||||
3. Restart the application (it will recreate the database)
|
||||
3. Restart the application (migrations will run automatically)
|
||||
|
||||
### Port Already in Use
|
||||
|
||||
|
|
@@ -134,7 +159,18 @@ PORT=3001 npm run dev
|
|||
|
||||
## Environment Variables
|
||||
|
||||
The application doesn't require any environment variables for local development. All configuration is hardcoded for simplicity.
|
||||
Required environment variables:
|
||||
|
||||
- `DATABASE_URL` - PostgreSQL connection string (e.g., `postgresql://ml_user:ml_password@localhost:5432/candle_annotator`)
|
||||
- `NODE_ENV` - Environment (`development` or `production`)
|
||||
- `PORT` - Server port (default: 3000)
|
||||
|
||||
Optional variables for ML inference:
|
||||
|
||||
- `INFERENCE_API_URL` - ML service endpoint (default: `http://localhost:8001`)
|
||||
- `INFERENCE_API_TIMEOUT` - Request timeout in ms (default: 30000)
|
||||
- `INFERENCE_BATCH_TIMEOUT` - Batch processing timeout in ms (default: 120000)
|
||||
- `NEXT_PUBLIC_PREDICTIONS_ENABLED` - Enable predictions UI (default: true)
|
||||
|
||||
## File Structure
|
||||
|
||||
|
|
@@ -202,15 +238,7 @@ uv sync
|
|||
|
||||
#### 3. Setup PostgreSQL
|
||||
|
||||
The ML service requires PostgreSQL for storing training run metadata:
|
||||
|
||||
```bash
|
||||
# Create database
|
||||
createdb ml_db
|
||||
|
||||
# Or using psql
|
||||
psql -c "CREATE DATABASE ml_db;"
|
||||
```
|
||||
The ML service shares the same PostgreSQL database as the frontend (`candle_annotator`). If you've already set up the database in the main setup steps, you're all set. The ML service will use the same connection.
|
||||
|
||||
#### 4. Initialize DVC
|
||||
|
||||
|
|
@@ -330,15 +358,14 @@ docker compose up --build
|
|||
|
||||
This will:
|
||||
1. Build the Next.js app and ML service Docker images
|
||||
2. Start PostgreSQL for ML service metadata
|
||||
2. Start PostgreSQL (shared by frontend and ML service)
|
||||
3. Start MLflow tracking server
|
||||
4. Start the ML inference service (FastAPI)
|
||||
5. Start the Next.js web application
|
||||
6. Create named volumes for persistent storage:
|
||||
- `candle-data` - SQLite database for annotations
|
||||
- `ml-data` - OHLCV data, features, labeled datasets
|
||||
- `mlflow-data` - MLflow experiments and model artifacts
|
||||
- `postgres-data` - PostgreSQL data
|
||||
- `postgres-data` - PostgreSQL data (all application tables)
|
||||
7. Enable automatic restart unless stopped
|
||||
|
||||
Services will be available at:
|
||||
|
|
@@ -391,7 +418,7 @@ Edit `.env` to customize:
|
|||
```
|
||||
NODE_ENV=production
|
||||
PORT=3000
|
||||
DATABASE_PATH=/app/data/candles.db
|
||||
DATABASE_URL=postgresql://ml_user:ml_password@postgres:5432/candle_annotator
|
||||
```
|
||||
|
||||
Pass environment variables to docker-compose:
|
||||
|
|
@@ -402,19 +429,73 @@ docker-compose --env-file .env up -d
|
|||
|
||||
### Data Persistence
|
||||
|
||||
The application stores the SQLite database in a Docker named volume `candle-data`. This ensures data persists across container restarts:
|
||||
The application stores all data in PostgreSQL using the Docker named volume `postgres-data`. This ensures data persists across container restarts:
|
||||
|
||||
```bash
|
||||
# View volumes
|
||||
docker volume ls | grep candle
|
||||
docker volume ls | grep postgres
|
||||
|
||||
# Backup database
|
||||
docker cp candle-annotator:/app/data/candles.db ./backup.db
|
||||
docker exec candle_annotator-postgres-1 pg_dump -U ml_user candle_annotator > backup.sql
|
||||
|
||||
# Restore database
|
||||
docker cp ./backup.db candle-annotator:/app/data/candles.db
|
||||
cat backup.sql | docker exec -i candle_annotator-postgres-1 psql -U ml_user -d candle_annotator
|
||||
```
|
||||
|
||||
### Data Migration from SQLite
|
||||
|
||||
If you're upgrading from a SQLite-based version, you need to migrate your data:
|
||||
|
||||
1. **Before upgrading**, backup your SQLite database:
|
||||
```bash
|
||||
docker cp candle_annotator-candle-annotator-1:/app/data/candles.db ./backup-sqlite.db
|
||||
```
|
||||
|
||||
2. **Stop the old containers**:
|
||||
```bash
|
||||
docker compose down
|
||||
```
|
||||
|
||||
3. **Pull the new version** and start services:
|
||||
```bash
|
||||
git pull origin master
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
4. **Run the migration script** from your host machine:
|
||||
```bash
|
||||
# Copy SQLite database to a location accessible to the script
|
||||
cp backup-sqlite.db data/candles.db
|
||||
|
||||
# Run migration (requires ts-node and dependencies)
|
||||
npm install
|
||||
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator \
|
||||
npx ts-node scripts/migrate-sqlite-to-postgres.ts
|
||||
```
|
||||
|
||||
**Rollback Procedure** (if migration fails):
|
||||
|
||||
1. Stop new containers:
|
||||
```bash
|
||||
docker compose down
|
||||
```
|
||||
|
||||
2. Restore SQLite-based docker-compose.yml from git history:
|
||||
```bash
|
||||
git checkout HEAD~1 docker-compose.yml Dockerfile
|
||||
```
|
||||
|
||||
3. Restore SQLite database:
|
||||
```bash
|
||||
mkdir -p data
|
||||
cp backup-sqlite.db data/candles.db
|
||||
```
|
||||
|
||||
4. Start old version:
|
||||
```bash
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### Accessing the Application
|
||||
|
||||
Once running, access the application at:
|
||||
|
|
@@ -478,12 +559,18 @@ docker-compose down # Stop any existing containers
|
|||
docker-compose up -d -p 8080:3000/tcp
|
||||
```
|
||||
|
||||
**Database permission errors:**
|
||||
**Database connection errors:**
|
||||
```bash
|
||||
# Ensure volume has correct permissions
|
||||
docker-compose down
|
||||
docker volume rm candle-data
|
||||
docker-compose up --build
|
||||
# Check PostgreSQL logs
|
||||
docker compose logs postgres
|
||||
|
||||
# Verify database exists
|
||||
docker exec -it candle_annotator-postgres-1 psql -U ml_user -d candle_annotator -c "\dt"
|
||||
|
||||
# Recreate database if needed
|
||||
docker compose down
|
||||
docker volume rm candle_annotator_postgres-data
|
||||
docker compose up --build
|
||||
```
|
||||
|
||||
**Rebuild without cache:**
|
||||
|
|
@@ -505,14 +592,24 @@ If you encounter this issue:
|
|||
1. Rebuild the ml-service: `docker compose build ml-service`
|
||||
2. Restart services: `docker compose up -d`
|
||||
|
||||
**Migration errors during build:**
|
||||
**Migration errors during startup:**
|
||||
|
||||
If you see Drizzle migration errors like "table `annotations` already exists" during the Docker build process, this means the database file from a previous build is being included. This was fixed in commit by skipping migrations during the build phase. The migrations now only run at runtime.
|
||||
If you see Drizzle migration errors during container startup, check:
|
||||
|
||||
If you encounter this issue:
|
||||
1. Ensure `data/` is in `.dockerignore`
|
||||
2. Rebuild: `docker compose build --no-cache candle-annotator`
|
||||
3. Restart: `docker compose up -d`
|
||||
1. Ensure PostgreSQL is fully started and healthy:
|
||||
```bash
|
||||
docker compose ps postgres
|
||||
```
|
||||
|
||||
2. Check migration logs:
|
||||
```bash
|
||||
docker compose logs candle-annotator | grep -i migration
|
||||
```
|
||||
|
||||
3. If needed, run migrations manually:
|
||||
```bash
|
||||
docker exec -it candle_annotator-candle-annotator-1 npm run db:migrate
|
||||
```
|
||||
|
||||
### Update Procedure
|
||||
|
||||
|
|
@@ -569,7 +666,7 @@ For production deployments, consider:
|
|||
|
||||
- This application is designed for **single-user local use** only
|
||||
- There is no authentication or user management
|
||||
- The SQLite database is stored locally and not intended for concurrent access
|
||||
- For production multi-user deployments, consider migrating to PostgreSQL or similar
|
||||
- PostgreSQL is used for all application data (frontend and ML service)
|
||||
- The shared database enables the ML service to directly query candle and annotation data
|
||||
- Docker deployment provides lightweight containerization ideal for standalone instances
|
||||
- The multi-stage Dockerfile keeps image size minimal (~100MB)
|
||||
|
|
|
|||
134
README.md
|
|
@@ -33,7 +33,8 @@ Candle Annotator is a complete machine learning platform for candlestick pattern
|
|||
- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
|
||||
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
|
||||
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
|
||||
- **SQLite Storage**: All candle data and annotations stored locally in SQLite database
|
||||
- **PostgreSQL Storage**: All candle data and annotations stored in PostgreSQL database
|
||||
- **Shared Database**: Frontend and ML service use the same database for seamless data access
|
||||
- **Data Persistence**: Annotations and candles persist between sessions
|
||||
|
||||
### Chart Visualization
|
||||
|
|
@@ -134,21 +135,23 @@ The integrated ML pipeline provides:
|
|||
│ - /api/predict (proxy) │
|
||||
│ - /api/model/info (proxy) │
|
||||
│ └───────────┬─────────────────────────────────────┬─────────── │
|
||||
│ │ SQLite (annotations) │ HTTP │
|
||||
│ │ PostgreSQL │ HTTP │
|
||||
│ ▼ ▼ │
|
||||
│ ┌──────────────────┐ ┌──────────────────────┐ │
|
||||
│ │ SQLite Database │ │ ML Inference API │ │
|
||||
│ │ (Drizzle ORM) │ │ (FastAPI, Python) │ │
|
||||
│ └──────────────────┘ └──────────┬───────────┘ │
|
||||
└────────────────────────────────────────────────────┼──────────────┘
|
||||
│
|
||||
┌──────────────────────────────┴───────────────┐
|
||||
│ │
|
||||
┌───────────▼─────────┐ ┌──────────────▼────────┐
|
||||
│ MLflow Server │ │ PostgreSQL │
|
||||
│ (Experiments, │ │ (Training run │
|
||||
│ Model Registry) │ │ metadata) │
|
||||
└─────────────────────┘ └───────────────────────┘
|
||||
│ ┌──────────────────────────────────────────────────────────┐ │
|
||||
│ │ PostgreSQL Database (Shared) │ │
|
||||
│ │ - Frontend tables (candles, annotations, span_annotations) │
|
||||
│ │ - ML tables (training_runs) │ │
|
||||
│ │ - Accessed by: Next.js (Drizzle ORM) │ │
|
||||
│ │ ML Service (SQLAlchemy) │ │
|
||||
│ └──────────────────────┬──────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────┴──────────┐ │
|
||||
│ │ │ │
|
||||
│ ┌───────────▼─────────┐ ┌───────▼───────────────────┐ │
|
||||
│ │ ML Inference API │ │ MLflow Server │ │
|
||||
│ │ (FastAPI, Python) │ │ (Experiments, Registry) │ │
|
||||
│ └─────────────────────┘ └───────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### ML Pipeline Workflow
|
||||
|
|
@@ -189,8 +192,8 @@ The integrated ML pipeline provides:
|
|||
- **Charting**: lightweight-charts 4.x (TradingView)
|
||||
- **Icons**: lucide-react
|
||||
- **Backend**: Next.js API Routes
|
||||
- **Database**: SQLite with better-sqlite3
|
||||
- **ORM**: Drizzle ORM
|
||||
- **Database**: PostgreSQL 16 with pg driver
|
||||
- **ORM**: Drizzle ORM (PostgreSQL dialect)
|
||||
- **CSV Parsing**: papaparse
|
||||
|
||||
### ML Pipeline (Python)
|
||||
|
|
@@ -200,7 +203,8 @@ The integrated ML pipeline provides:
|
|||
- **Data Processing**: pandas, numpy
|
||||
- **Experiment Tracking**: MLflow (model registry, artifact storage)
|
||||
- **Data Versioning**: DVC (Data Version Control)
|
||||
- **Database**: PostgreSQL 16 (training run metadata)
|
||||
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
|
||||
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
|
||||
- **Model Persistence**: joblib
|
||||
- **Validation**: Pydantic
|
||||
|
||||
|
|
@@ -222,8 +226,8 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
|
|||
|
||||
- Node.js 18.x or higher (for local development)
|
||||
- npm 9.x or higher (for local development)
|
||||
- PostgreSQL 16 or higher (for local development)
|
||||
- Docker & docker-compose (for containerized deployment)
|
||||
- Build tools for native modules (see DEPLOYMENT.md)
|
||||
|
||||
### Local Development Installation
|
||||
|
||||
|
|
@@ -238,12 +242,26 @@ See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instr
|
|||
npm install
|
||||
```
|
||||
|
||||
3. Start the development server:
|
||||
3. Setup PostgreSQL database:
|
||||
```bash
|
||||
createdb candle_annotator
|
||||
createuser -P ml_user
|
||||
# Enter password: ml_password
|
||||
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
|
||||
```
|
||||
|
||||
4. Create `.env` file:
|
||||
```bash
|
||||
cp .env.example .env
|
||||
# Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
|
||||
```
|
||||
|
||||
5. Start the development server:
|
||||
```bash
|
||||
npm run dev
|
||||
```
|
||||
|
||||
4. Open http://localhost:3000 in your browser
|
||||
6. Open http://localhost:3000 in your browser
|
||||
|
||||
### Usage
|
||||
|
||||
|
|
@@ -289,28 +307,68 @@ timestamp,label_type,price
|
|||
|
||||
## Database Schema
|
||||
|
||||
### Candles Table
|
||||
### Candles Table (PostgreSQL)
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: integer (PK, auto-increment),
|
||||
time: integer (Unix timestamp, unique),
|
||||
open: real,
|
||||
high: real,
|
||||
low: real,
|
||||
close: real
|
||||
id: serial (PK, auto-increment),
|
||||
chart_id: integer (FK to charts.id),
|
||||
time: timestamp (not null, indexed with chart_id),
|
||||
open: double precision,
|
||||
high: double precision,
|
||||
low: double precision,
|
||||
close: double precision
|
||||
}
|
||||
```
|
||||
|
||||
### Annotations Table
|
||||
### Annotations Table (Point Annotations)
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: integer (PK, auto-increment),
|
||||
timestamp: integer (Unix timestamp),
|
||||
label_type: text ('break_up' | 'break_down' | 'line'),
|
||||
geometry: text (JSON string for line coordinates, null for markers),
|
||||
created_at: integer (Unix timestamp)
|
||||
id: serial (PK, auto-increment),
|
||||
chart_id: integer (FK to charts.id),
|
||||
timestamp: timestamp (not null),
|
||||
label_type: text ('line' | 'rectangle'),
|
||||
geometry: jsonb (for line/rectangle coordinates, nullable),
|
||||
color: text (default '#3b82f6'),
|
||||
created_at: timestamp (default now())
|
||||
}
|
||||
```
|
||||
|
||||
### Span Annotations Table (Pattern Labels)
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: serial (PK, auto-increment),
|
||||
chart_id: integer (FK to charts.id),
|
||||
start_time: timestamp (not null),
|
||||
end_time: timestamp (not null),
|
||||
label: text (pattern name, e.g., 'Bullish Engulfing'),
|
||||
confidence: integer (nullable),
|
||||
outcome: text (nullable),
|
||||
notes: text (nullable),
|
||||
sub_spans: jsonb (nullable),
|
||||
color: text (default '#2196F3'),
|
||||
source: text (default 'human'), # 'human' | 'model' | 'hybrid'
|
||||
model_prediction: jsonb (nullable),
|
||||
created_at: timestamp (default now())
|
||||
}
|
||||
```
|
||||
|
||||
### Training Runs Table (ML Service)
|
||||
|
||||
```typescript
|
||||
{
|
||||
id: serial (PK, auto-increment),
|
||||
run_id: text (unique, MLflow run ID),
|
||||
model_type: text (e.g., 'RandomForest', 'XGBoost'),
|
||||
experiment_name: text,
|
||||
pipeline_config_hash: text,
|
||||
dataset_version: text,
|
||||
metrics_summary: jsonb,
|
||||
status: text (e.g., 'running', 'completed', 'failed'),
|
||||
created_at: timestamp (default now()),
|
||||
completed_at: timestamp (nullable)
|
||||
}
|
||||
```
|
||||
|
||||
|
|
@@ -411,9 +469,9 @@ candle_annotator/
|
|||
### Key Technical Decisions
|
||||
|
||||
1. **lightweight-charts v4**: Stable API with good candlestick and marker support
|
||||
2. **SQLite with better-sqlite3**: Synchronous access, perfect for single-user local apps
|
||||
2. **PostgreSQL**: Shared database enables ML service to directly query candle/annotation data without CSV exports
|
||||
3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
|
||||
4. **Drizzle ORM**: Type-safe queries with minimal overhead
|
||||
4. **Drizzle ORM**: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
|
||||
5. **Next.js App Router**: Server-side API routes co-located with frontend code
|
||||
|
||||
### Known Limitations
|
||||
|
|
@@ -428,9 +486,9 @@ candle_annotator/
|
|||
See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.
|
||||
|
||||
Common issues:
|
||||
- **better-sqlite3 binding errors**: Run `npm rebuild better-sqlite3`
|
||||
- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
|
||||
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
|
||||
- **Database corruption**: Delete `data/candles.db` and restart
|
||||
- **Migration errors**: Ensure PostgreSQL is accessible before starting the application
|
||||
|
||||
## License
|
||||
|
||||
|
|
|
|||
|
|
@@ -47,7 +47,7 @@
|
|||
## 8. Testing and Verification
|
||||
|
||||
- [x] 8.1 Run the full application locally with PostgreSQL — verify all API routes work
|
||||
- [ ] 8.2 Verify ML service can query candle/annotation data from shared database
|
||||
- [ ] 8.3 Run `docker compose up` and verify all services start correctly with new configuration
|
||||
- [ ] 8.4 Update `DEPLOYMENT.md` with new deployment steps (PostgreSQL migration, data migration script, rollback procedure)
|
||||
- [ ] 8.5 Update `README.md` and `CLAUDE_DESCRIPTION.md` with database architecture changes
|
||||
- [x] 8.2 Verify ML service can query candle/annotation data from shared database
|
||||
- [x] 8.3 Run `docker compose up` and verify all services start correctly with new configuration
|
||||
- [x] 8.4 Update `DEPLOYMENT.md` with new deployment steps (PostgreSQL migration, data migration script, rollback procedure)
|
||||
- [x] 8.5 Update `README.md` and `CLAUDE_DESCRIPTION.md` with database architecture changes
|
||||
|
|
|
|||
|
|
@@ -78,6 +78,9 @@ class DataAccess:
|
|||
DataFrame with columns: id, chart_id, time, open, high, low, close
|
||||
"""
|
||||
with get_db() as db:
|
||||
# Convert numpy types to native Python types for SQLAlchemy
|
||||
chart_id = int(chart_id)
|
||||
|
||||
# Build query
|
||||
stmt = select(self.candles).where(self.candles.c.chart_id == chart_id)
|
||||
|
||||
|
|
@@ -118,6 +121,11 @@ class DataAccess:
|
|||
DataFrame with span annotations
|
||||
"""
|
||||
with get_db() as db:
|
||||
# Convert numpy types to native Python types for SQLAlchemy
|
||||
chart_id = int(chart_id)
|
||||
if min_confidence is not None:
|
||||
min_confidence = int(min_confidence)
|
||||
|
||||
# Build query
|
||||
stmt = select(self.span_annotations).where(
|
||||
self.span_annotations.c.chart_id == chart_id
|
||||
|
|
@@ -164,6 +172,9 @@ class DataAccess:
|
|||
DataFrame with point annotations
|
||||
"""
|
||||
with get_db() as db:
|
||||
# Convert numpy types to native Python types for SQLAlchemy
|
||||
chart_id = int(chart_id)
|
||||
|
||||
# Build query
|
||||
stmt = select(self.annotations).where(
|
||||
self.annotations.c.chart_id == chart_id
|
||||
|
|
|
|||
117
services/ml/test_db_access.py
Normal file
|
|
@@ -0,0 +1,117 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Test script to verify ML service can access candle and annotation data from PostgreSQL.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
# Add app directory to path
|
||||
sys.path.insert(0, str(Path(__file__).parent / "app"))
|
||||
|
||||
from app.data_access import DataAccess
|
||||
from app.db import init_db
|
||||
|
||||
|
||||
def test_db_access():
|
||||
"""Test database access for ML service."""
|
||||
print("=" * 60)
|
||||
print("Testing ML Service Database Access")
|
||||
print("=" * 60)
|
||||
|
||||
try:
|
||||
# Initialize database
|
||||
print("\n1. Initializing database connection...")
|
||||
init_db()
|
||||
print(" ✓ Database connection initialized")
|
||||
|
||||
# Create data access instance
|
||||
print("\n2. Creating data access instance...")
|
||||
data_access = DataAccess()
|
||||
print(" ✓ Data access instance created (tables reflected)")
|
||||
|
||||
# Get all charts
|
||||
print("\n3. Querying all charts...")
|
||||
charts_df = data_access.get_all_charts()
|
||||
|
||||
if charts_df.empty:
|
||||
print(" ⚠ No charts found in database")
|
||||
return False
|
||||
|
||||
print(f" ✓ Found {len(charts_df)} chart(s):")
|
||||
for _, chart in charts_df.iterrows():
|
||||
print(f" - {chart['name']} (id={chart['id']})")
|
||||
|
||||
# Test with first chart
|
||||
first_chart = charts_df.iloc[0]
|
||||
chart_name = first_chart['name']
|
||||
chart_id = first_chart['id']
|
||||
|
||||
print(f"\n4. Testing data access with chart: {chart_name}")
|
||||
|
||||
# Get candles
|
||||
print(f"\n a) Querying candles for chart_id={chart_id}...")
|
||||
candles_df = data_access.get_candles(chart_id)
|
||||
|
||||
if candles_df.empty:
|
||||
print(" ⚠ No candles found")
|
||||
else:
|
||||
print(f" ✓ Found {len(candles_df)} candles")
|
||||
print(f" - Time range: {candles_df['time'].min()} to {candles_df['time'].max()}")
|
||||
print(f" - Columns: {list(candles_df.columns)}")
|
||||
|
||||
# Get span annotations
|
||||
print(f"\n b) Querying span annotations for chart_id={chart_id}...")
|
||||
annotations_df = data_access.get_span_annotations(chart_id)
|
||||
|
||||
if annotations_df.empty:
|
||||
print(" ⚠ No span annotations found")
|
||||
else:
|
||||
print(f" ✓ Found {len(annotations_df)} span annotation(s)")
|
||||
labels = annotations_df['label'].unique()
|
||||
print(f" - Unique labels: {list(labels)}")
|
||||
sources = annotations_df['source'].unique()
|
||||
print(f" - Sources: {list(sources)}")
|
||||
|
||||
# Get point annotations
|
||||
print(f"\n c) Querying point annotations for chart_id={chart_id}...")
|
||||
point_annotations_df = data_access.get_point_annotations(chart_id)
|
||||
|
||||
if point_annotations_df.empty:
|
||||
print(" ⚠ No point annotations found")
|
||||
else:
|
||||
print(f" ✓ Found {len(point_annotations_df)} point annotation(s)")
|
||||
label_types = point_annotations_df['label_type'].unique()
|
||||
print(f" - Label types: {list(label_types)}")
|
||||
|
||||
# Test get_training_data convenience method
|
||||
print(f"\n5. Testing get_training_data() convenience method...")
|
||||
try:
|
||||
candles_df, annotations_df = data_access.get_training_data(
|
||||
chart_name=chart_name,
|
||||
annotation_source="human"
|
||||
)
|
||||
print(f" ✓ Successfully retrieved training data")
|
||||
print(f" - Candles: {len(candles_df)} rows")
|
||||
print(f" - Annotations: {len(annotations_df)} rows")
|
||||
except Exception as e:
|
||||
print(f" ✗ Error: {e}")
|
||||
return False
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
print("✓ All database access tests passed!")
|
||||
print("=" * 60)
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"\n✗ Error during testing: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return False
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
success = test_db_access()
|
||||
sys.exit(0 if success else 1)
|
||||