fix: resolve numpy type conversion issues in ML service data access

- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now passing successfully
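
A minimal sketch of the conversion described above, not the service's actual code. The helper name `to_python_int` and the `FakeInt64` stand-in are hypothetical. The sketch relies on `operator.index`, which calls `__index__` (implemented by numpy integer scalars such as `numpy.int64`) and returns a built-in `int`. The stand-in class is used so the example runs without numpy installed.

```python
import operator


def to_python_int(value):
    """Return a built-in int for an integer-like value (hypothetical helper).

    operator.index() calls value.__index__(), which numpy integer scalars
    implement. Floats and strings raise TypeError instead of being
    silently truncated.
    """
    return operator.index(value)


class FakeInt64:
    """Stand-in for numpy.int64 so the sketch runs without numpy."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


# A value like this would then be safe to pass as a query parameter.
candle_id = to_python_int(FakeInt64(42))
print(type(candle_id).__name__, candle_id)
```

Calling `int(value)` would also work for numpy integers, but it silently truncates floats; `operator.index` rejects them, which may or may not be the behavior wanted here.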
Marko Djordjevic 2026-02-17 14:10:21 +01:00
parent 5377431c9d
commit d1557a3846
6 changed files with 437 additions and 119 deletions


@ -4,7 +4,7 @@
- Node.js 18.x or higher
- npm 9.x or higher
- Python and build tools (for native module compilation)
- PostgreSQL 16 or higher
## Local Development Setup
@ -14,32 +14,37 @@
npm install
```
Note: The `better-sqlite3` package requires native compilation. If you encounter build errors, ensure you have the necessary build tools:
**Linux:**
```bash
sudo apt-get install build-essential python3
```
**macOS:**
```bash
xcode-select --install
```
**Windows:**
```bash
npm install --global windows-build-tools
```
### 2. Database Setup
The SQLite database will be automatically created when you start the application. The database file is located at:
#### PostgreSQL Setup
```
./data/candles.db
The application uses PostgreSQL for all data storage. Set up the database:
```bash
# Create database
createdb candle_annotator
# Create user (if needed)
createuser -P ml_user
# Enter password: ml_password
# Grant privileges
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
```
To run migrations manually:
#### Environment Configuration
Create a `.env` file in the project root:
```env
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
NODE_ENV=development
PORT=3000
```
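To make the structure of `DATABASE_URL` explicit, here is a small sketch that splits the connection string into its parts using only the Python standard library. The function name `parse_database_url` is hypothetical, and the fallback to port 5432 is an assumption based on PostgreSQL's default port.

```python
from urllib.parse import urlsplit


def parse_database_url(url):
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Assumption: fall back to PostgreSQL's default port when omitted.
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }


settings = parse_database_url(
    "postgresql://ml_user:ml_password@localhost:5432/candle_annotator"
)
print(settings)
```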
#### Run Migrations
Database migrations run automatically on application startup. To run them manually:
```bash
npx drizzle-kit generate
@ -77,14 +82,30 @@ time,open,high,low,close
- Unix timestamp (seconds): `1700000000`
- Date string: `2024-01-15`
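
As an illustration of the two formats listed above, the following sketch normalizes either one to a timezone-aware `datetime`. It is not the application's parser: the function name `parse_candle_time` is hypothetical, and treating date strings as midnight UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_candle_time(value):
    """Parse a Unix timestamp in seconds or a YYYY-MM-DD date string."""
    text = str(value).strip()
    if text.isdigit():
        # All digits: interpret as Unix seconds.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Assumption: a bare date means midnight UTC on that day.
    return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)


print(parse_candle_time("1700000000").isoformat())
print(parse_candle_time("2024-01-15").isoformat())
```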
### 4. Migrating from SQLite (if applicable)
If you have existing data in an SQLite database from a previous version, use the migration script:
```bash
# Run the migration script
npm run migrate:sqlite-to-postgres
# Or with TypeScript directly
npx ts-node scripts/migrate-sqlite-to-postgres.ts
```
This script will:
- Read all data from the SQLite database (`data/candles.db`)
- Convert data types (timestamps, booleans, JSON→jsonb)
- Insert data into PostgreSQL
- Skip rows that were already migrated, so it is safe to run more than once (idempotent)
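
The steps above can be sketched roughly as follows. This is an illustration in Python, not the contents of `scripts/migrate-sqlite-to-postgres.ts`: the column names, the in-memory `dict` standing in for a PostgreSQL table, and the assumption that SQLite stored timestamps as Unix seconds are all hypothetical.

```python
import json
from datetime import datetime, timezone


def convert_row(row):
    """Map one SQLite row (as a dict) to PostgreSQL-friendly Python values."""
    return {
        "id": row["id"],
        # Assumption: SQLite stored the timestamp as Unix seconds.
        "created_at": datetime.fromtimestamp(row["created_at"], tz=timezone.utc),
        # SQLite has no boolean type; 0/1 integers become real booleans.
        "is_active": bool(row["is_active"]),
        # JSON stored as text is parsed so a driver can send it as jsonb.
        "metadata": json.loads(row["metadata"]) if row["metadata"] else None,
    }


def migrate(rows, target):
    """Copy rows into target (a dict keyed by id), skipping existing ids."""
    inserted = 0
    for row in rows:
        if row["id"] in target:
            continue  # already migrated on an earlier run
        target[row["id"]] = convert_row(row)
        inserted += 1
    return inserted


sqlite_rows = [
    {"id": 1, "created_at": 1700000000, "is_active": 1, "metadata": '{"label": "bull"}'},
]
postgres_table = {}
print(migrate(sqlite_rows, postgres_table))  # first run
print(migrate(sqlite_rows, postgres_table))  # repeat run over the same data
```

Against a real database, the same skip-if-present behavior is commonly achieved with `INSERT ... ON CONFLICT DO NOTHING` on the primary key.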
## Building for Production
```bash
npm run build
```
Note: Production builds with `better-sqlite3` require the native module to be compiled for the target platform.
## Running Production Build
```bash
@ -96,33 +117,37 @@ The production server will run on port 3000 by default.
## Troubleshooting
### better-sqlite3 Build Issues
### Database Connection Issues
If you encounter errors related to `better-sqlite3` not finding bindings:
If the application fails to connect to PostgreSQL:
1. Rebuild the module:
1. Verify PostgreSQL is running:
```bash
npm rebuild better-sqlite3
pg_isready -h localhost -p 5432
```
2. If that fails, reinstall:
2. Check the `DATABASE_URL` environment variable:
```bash
npm uninstall better-sqlite3
npm install better-sqlite3
echo $DATABASE_URL
```
3. For development, you can use `npm run dev` which handles the module better than production builds.
3. Verify credentials:
```bash
psql -U ml_user -d candle_annotator
```
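
If `pg_isready` is not installed, a plain TCP check can at least confirm that something is listening on the PostgreSQL port. The sketch below is a rough substitute, not an equivalent: it does not verify that the server accepts connections or credentials. The function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    print(is_port_open("localhost", 5432))
```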
### Database Issues
If the database becomes corrupted or you want to start fresh:
If you want to reset the database:
1. Stop the application
2. Delete the database file:
2. Drop and recreate the database:
```bash
rm -f data/candles.db
dropdb candle_annotator
createdb candle_annotator
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
```
3. Restart the application (it will recreate the database)
3. Restart the application (migrations will run automatically)
### Port Already in Use
@ -134,7 +159,18 @@ PORT=3001 npm run dev
## Environment Variables
The application doesn't require any environment variables for local development. All configuration is hardcoded for simplicity.
Required environment variables:
- `DATABASE_URL` - PostgreSQL connection string (e.g., `postgresql://ml_user:ml_password@localhost:5432/candle_annotator`)
- `NODE_ENV` - Environment (`development` or `production`)
- `PORT` - Server port (default: 3000)
Optional variables for ML inference:
- `INFERENCE_API_URL` - ML service endpoint (default: `http://localhost:8001`)
- `INFERENCE_API_TIMEOUT` - Request timeout in ms (default: 30000)
- `INFERENCE_BATCH_TIMEOUT` - Batch processing timeout in ms (default: 120000)
- `NEXT_PUBLIC_PREDICTIONS_ENABLED` - Enable predictions UI (default: true)
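
The documented defaults for the optional variables can be expressed as a small loader. This is a sketch for illustration only: the function name `load_inference_config` and the returned key names are hypothetical, and parsing the boolean as the literal string `true` is an assumption about how the application reads it.

```python
import os


def load_inference_config(env=None):
    """Read the optional ML inference settings, applying documented defaults."""
    env = os.environ if env is None else env
    return {
        "api_url": env.get("INFERENCE_API_URL", "http://localhost:8001"),
        # Timeouts are documented in milliseconds.
        "api_timeout_ms": int(env.get("INFERENCE_API_TIMEOUT", "30000")),
        "batch_timeout_ms": int(env.get("INFERENCE_BATCH_TIMEOUT", "120000")),
        # Assumption: only the string "true" (any case) enables the UI.
        "predictions_enabled": env.get(
            "NEXT_PUBLIC_PREDICTIONS_ENABLED", "true"
        ).lower() == "true",
    }


print(load_inference_config({}))
```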
## File Structure
@ -202,15 +238,7 @@ uv sync
#### 3. Setup PostgreSQL
The ML service requires PostgreSQL for storing training run metadata:
```bash
# Create database
createdb ml_db
# Or using psql
psql -c "CREATE DATABASE ml_db;"
```
The ML service uses the same PostgreSQL database as the frontend (`candle_annotator`). If you already created the database in the main setup steps, no further setup is needed; the ML service connects to that same database.
#### 4. Initialize DVC
@ -330,15 +358,14 @@ docker compose up --build
This will:
1. Build the Next.js app and ML service Docker images
2. Start PostgreSQL for ML service metadata
2. Start PostgreSQL (shared by frontend and ML service)
3. Start MLflow tracking server
4. Start the ML inference service (FastAPI)
5. Start the Next.js web application
6. Create named volumes for persistent storage:
- `candle-data` - SQLite database for annotations
- `ml-data` - OHLCV data, features, labeled datasets
- `mlflow-data` - MLflow experiments and model artifacts
- `postgres-data` - PostgreSQL data
- `postgres-data` - PostgreSQL data (all application tables)
7. Enable automatic restart unless stopped
Services will be available at:
@ -391,7 +418,7 @@ Edit `.env` to customize:
```
NODE_ENV=production
PORT=3000
DATABASE_PATH=/app/data/candles.db
DATABASE_URL=postgresql://ml_user:ml_password@postgres:5432/candle_annotator
```
Pass environment variables to docker-compose:
@ -402,19 +429,73 @@ docker-compose --env-file .env up -d
### Data Persistence
The application stores the SQLite database in a Docker named volume `candle-data`. This ensures data persists across container restarts:
The application stores all data in PostgreSQL using the Docker named volume `postgres-data`. This ensures data persists across container restarts:
```bash
# View volumes
docker volume ls | grep candle
docker volume ls | grep postgres
# Backup database
docker cp candle-annotator:/app/data/candles.db ./backup.db
docker exec candle_annotator-postgres-1 pg_dump -U ml_user candle_annotator > backup.sql
# Restore database
docker cp ./backup.db candle-annotator:/app/data/candles.db
cat backup.sql | docker exec -i candle_annotator-postgres-1 psql -U ml_user -d candle_annotator
```
### Data Migration from SQLite
If you're upgrading from a SQLite-based version, you need to migrate your data:
1. **Before upgrading**, backup your SQLite database:
```bash
docker cp candle_annotator-candle-annotator-1:/app/data/candles.db ./backup-sqlite.db
```
2. **Stop the old containers**:
```bash
docker compose down
```
3. **Pull the new version** and start services:
```bash
git pull origin master
docker compose up -d
```
4. **Run the migration script** from your host machine:
```bash
# Copy SQLite database to a location accessible to the script
cp backup-sqlite.db data/candles.db
# Run migration (requires ts-node and dependencies)
npm install
DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator \
npx ts-node scripts/migrate-sqlite-to-postgres.ts
```
**Rollback Procedure** (if migration fails):
1. Stop new containers:
```bash
docker compose down
```
2. Restore the SQLite-based `docker-compose.yml` and `Dockerfile` from git history:
```bash
git checkout HEAD~1 docker-compose.yml Dockerfile
```
3. Restore SQLite database:
```bash
mkdir -p data
cp backup-sqlite.db data/candles.db
```
4. Start old version:
```bash
docker compose up -d
```
### Accessing the Application
Once running, access the application at:
@ -478,12 +559,18 @@ docker-compose down # Stop any existing containers
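# `up` has no port-publishing flag (`-p` sets the project name); `run` accepts `-p/--publish`
docker-compose run -d -p 8080:3000/tcp candle-annotator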
```
**Database permission errors:**
**Database connection errors:**
```bash
# Ensure volume has correct permissions
docker-compose down
docker volume rm candle-data
docker-compose up --build
# Check PostgreSQL logs
docker compose logs postgres
# Verify database exists
docker exec -it candle_annotator-postgres-1 psql -U ml_user -d candle_annotator -c "\dt"
# Recreate database if needed
docker compose down
docker volume rm candle_annotator_postgres-data
docker compose up --build
```
**Rebuild without cache:**
@ -505,14 +592,24 @@ If you encounter this issue:
1. Rebuild the ml-service: `docker compose build ml-service`
2. Restart services: `docker compose up -d`
**Migration errors during build:**
**Migration errors during startup:**
If you see Drizzle migration errors like "table `annotations` already exists" during the Docker build process, this means the database file from a previous build is being included. This was fixed in commit by skipping migrations during the build phase. The migrations now only run at runtime.
If you see Drizzle migration errors during container startup, check:
If you encounter this issue:
1. Ensure `data/` is in `.dockerignore`
2. Rebuild: `docker compose build --no-cache candle-annotator`
3. Restart: `docker compose up -d`
1. Ensure PostgreSQL is fully started and healthy:
```bash
docker compose ps postgres
```
2. Check migration logs:
```bash
docker compose logs candle-annotator | grep -i migration
```
3. If needed, run migrations manually:
```bash
docker exec -it candle_annotator-candle-annotator-1 npm run db:migrate
```
### Update Procedure
@ -569,7 +666,7 @@ For production deployments, consider:
- This application is designed for **single-user local use** only
- There is no authentication or user management
- The SQLite database is stored locally and not intended for concurrent access
- For production multi-user deployments, consider migrating to PostgreSQL or similar
- PostgreSQL is used for all application data (frontend and ML service)
- The shared database enables the ML service to directly query candle and annotation data
- Docker deployment provides lightweight containerization ideal for standalone instances
- The multi-stage Dockerfile keeps image size minimal (~100MB)