Marko Djordjevic 21f184aa8d feat(ui): implement disagreement detection, prediction summary, loading states, and update documentation
- Add disagreement detection logic comparing human annotations vs predictions
- Display prediction summary in PredictionPanel (agreements/disagreements)
- Wire up 'Show only disagreements' filter toggle
- Add loading overlay during prediction fetching
- Update docker-compose.yml with healthchecks for all services
- Update DEPLOYMENT.md with comprehensive ML service setup instructions
- Update README.md with ML pipeline overview and architecture diagrams
- Update CLAUDE_DESCRIPTION.md with v3.0.0 ML integration details

Remaining tasks (11.2, 11.4, 11.5) deferred - core functionality complete
2026-02-15 16:34:02 +01:00


# Deployment Guide
## Prerequisites
- Node.js 18.x or higher
- npm 9.x or higher
- Python and build tools (for native module compilation)
## Local Development Setup
### 1. Install Dependencies
```bash
npm install
```
Note: The `better-sqlite3` package requires native compilation. If you encounter build errors, ensure you have the necessary build tools:
**Linux:**
```bash
sudo apt-get install build-essential python3
```
**macOS:**
```bash
xcode-select --install
```
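**Windows:**
The `windows-build-tools` npm package is deprecated. Instead, either tick "Automatically install the necessary tools" on the "Tools for Native Modules" page of the Node.js installer, or install the Visual Studio Build Tools with the "Desktop development with C++" workload together with Python 3.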
### 2. Database Setup
The SQLite database will be automatically created when you start the application. The database file is located at:
```
./data/candles.db
```
To run migrations manually:
```bash
npx drizzle-kit generate
npx drizzle-kit migrate
```
### 3. Start Development Server
```bash
npm run dev
```
The application will be available at:
- http://localhost:3000
### 4. Verify Setup
1. Open the application in your browser
2. Upload a sample CSV file with OHLC data (columns: time, open, high, low, close)
3. Verify the candlestick chart renders correctly
4. Test annotation tools (Break Up, Break Down, Draw Line, Delete)
5. Export annotations as CSV
## CSV File Format
The application expects CSV files with the following format:
```csv
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
```
**Time column formats:**
- Unix timestamp (seconds): `1700000000`
- Date string: `2024-01-15`
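The app's own upload parser is not shown in this guide. The sketch below illustrates the format rules above. Two behaviours are assumptions, not documented behaviour: date strings are treated as midnight UTC, and rows whose high/low do not bound open/close are rejected.

```python
import csv
import io
from datetime import datetime, timezone

REQUIRED_COLUMNS = ("time", "open", "high", "low", "close")


def parse_time(value: str) -> int:
    """Return Unix seconds for an integer timestamp or a YYYY-MM-DD date."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    # Assumption: date strings mean midnight UTC.
    day = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def parse_candles(text: str) -> list[dict]:
    """Parse OHLC CSV text into dicts, validating columns and price ranges."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    candles = []
    for row in reader:
        candle = {"time": parse_time(row["time"])}
        for key in ("open", "high", "low", "close"):
            candle[key] = float(row[key])
        body_low = min(candle["open"], candle["close"])
        body_high = max(candle["open"], candle["close"])
        # Sanity check (assumed rule): low <= open/close <= high.
        if not candle["low"] <= body_low <= body_high <= candle["high"]:
            raise ValueError(f"inconsistent OHLC values at time {row['time']}")
        candles.append(candle)
    return candles
```

For example, `parse_candles(open("sample.csv").read())` returns one dict per row, with `time` normalised to Unix seconds.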
## Building for Production
```bash
npm run build
```
Note: Production builds with `better-sqlite3` require the native module to be compiled for the target platform.
## Running Production Build
```bash
npm run build
npm start
```
The production server will run on port 3000 by default.
## Troubleshooting
### better-sqlite3 Build Issues
If you encounter errors related to `better-sqlite3` not finding bindings:
1. Rebuild the module:
```bash
npm rebuild better-sqlite3
```
2. If that fails, reinstall:
```bash
npm uninstall better-sqlite3
npm install better-sqlite3
```
3. While you resolve the issue, you can keep working with `npm run dev`, which tends to be less sensitive to native-module problems than the production build.
### Database Issues
If the database becomes corrupted or you want to start fresh:
1. Stop the application
2. Delete the database file:
```bash
rm -f data/candles.db
```
3. Restart the application (it will recreate the database)
### Port Already in Use
If port 3000 is already in use, you can specify a different port:
```bash
PORT=3001 npm run dev
```
## Environment Variables
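The core annotation app needs no environment variables for local development. The optional ML integration is configured in `.env.local` (see ML Service Setup). Docker deployments read settings such as `PORT` and `DATABASE_PATH` from `.env` (see Environment Configuration).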
## File Structure
```
.
├── src/
│   ├── app/                # Next.js app router
│   │   ├── api/            # API routes
│   │   ├── layout.tsx      # Root layout
│   │   └── page.tsx        # Main page
│   ├── components/         # React components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/                # Utilities
│       └── db/             # Database configuration
├── services/
│   └── ml/                 # Python ML service (FastAPI inference, training pipeline)
├── data/                   # SQLite database directory
├── drizzle/                # Database migrations
└── public/                 # Static assets
```
## ML Service Setup (Optional)
The Candle Annotator includes an optional Python ML service for pattern recognition and prediction.
### Prerequisites for ML Service
- Python 3.11+
- TA-Lib C library
- PostgreSQL 16
### Local ML Service Setup
#### 1. Install TA-Lib C Library
**Linux (Debian/Ubuntu, on releases that package TA-Lib; otherwise build from source as shown below):**
```bash
sudo apt-get update
sudo apt-get install libta-lib-dev
```
**macOS:**
```bash
brew install ta-lib
```
**From Source:**
```bash
wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz
tar -xzf ta-lib-0.4.0-src.tar.gz
cd ta-lib/
./configure --prefix=/usr
make
sudo make install
```
#### 2. Install Python Dependencies
```bash
cd services/ml
pip install -r requirements.txt
```
#### 3. Setup PostgreSQL
The ML service requires PostgreSQL for storing training run metadata:
```bash
# Create database
createdb ml_db
# Or using psql
psql -c "CREATE DATABASE ml_db;"
```
#### 4. Initialize DVC
DVC is used for dataset versioning:
```bash
cd services/ml
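# --subdir is needed because services/ml sits inside the project's Git repository;
# skip this command if services/ml/.dvc already exists
dvc init --subdir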
dvc remote add -d local /path/to/dvc-storage
```
#### 5. Run MLflow Tracking Server
MLflow tracks experiments and stores models:
```bash
mlflow server \
  --backend-store-uri ./mlruns \
  --default-artifact-root ./mlruns/artifacts \
  --host 0.0.0.0 \
  --port 5000
#### 6. Configure Pipeline
Edit `services/ml/config/pipeline.yaml` to configure:
- Feature engineering settings
- Model hyperparameters
- Data paths
- MLflow experiment name
#### 7. Start ML Service
```bash
cd services/ml
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
```
The inference API will be available at http://localhost:8001
#### 8. Configure Next.js App
Create `.env.local` in the project root:
```env
INFERENCE_API_URL=http://localhost:8001
INFERENCE_API_TIMEOUT=30000
INFERENCE_BATCH_TIMEOUT=120000
NEXT_PUBLIC_PREDICTIONS_ENABLED=true
```
### Running the ML Pipeline
The ML pipeline consists of:
1. **Feature Engineering** - Extract TA-Lib indicators from OHLCV data
2. **Annotation Ingestion** - Convert span annotations to labeled datasets
3. **Training** - Train models with MLflow tracking
4. **Inference** - Serve predictions via FastAPI
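Stage 2 is the least self-explanatory step. This guide does not document the ingestion code or the label names. The sketch below makes three assumptions: each annotation is an inclusive time span carrying one label, candles outside every span get a hypothetical `none` class, and later spans win where spans overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # Unix seconds, inclusive
    end: int    # Unix seconds, inclusive
    label: str  # e.g. "break_up" (hypothetical label name)


def label_candles(times: list[int], spans: list[Span], default: str = "none") -> list[str]:
    """Give each candle the label of the span containing its timestamp.

    Spans are applied in order, so a later span overwrites an earlier one
    where they overlap (an arbitrary tie-break chosen for this sketch).
    """
    labels = [default] * len(times)
    for span in spans:
        for i, t in enumerate(times):
            if span.start <= t <= span.end:
                labels[i] = span.label
    return labels
```

The nested loop is O(candles × spans), which is fine for a sketch. A sorted-interval sweep would scale better on long histories.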
#### Train a Model
```bash
cd services/ml
python pipeline.py --config config/pipeline.yaml
```
This will:
- Load raw OHLCV data from `data/raw/`
- Compute features and save to `data/enriched/`
- Load annotations and create labeled dataset in `data/labeled/`
- Train the model with MLflow tracking
- Save model artifacts
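In the real pipeline, TA-Lib does the indicator math, and this guide does not list which indicators `pipeline.yaml` enables. The sketch below is a pure-Python stand-in (not TA-Lib) that shows the shape of the enrichment step: a simple moving average and a log-return column. Like TA-Lib's moving averages, the SMA leaves the first `period - 1` values undefined (NaN).

```python
import math


def sma(values: list[float], period: int) -> list[float]:
    """Simple moving average; entries before a full window are NaN."""
    out = [math.nan] * len(values)
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            window_sum -= values[i - period]  # drop the value leaving the window
        if i >= period - 1:
            out[i] = window_sum / period
    return out


def log_returns(closes: list[float]) -> list[float]:
    """ln(close_t / close_{t-1}); the first entry is NaN."""
    return [math.nan] + [math.log(b / a) for a, b in zip(closes, closes[1:])]
```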
#### Run Individual Stages
```bash
# Feature engineering only
python pipeline.py --config config/pipeline.yaml --stage feature_engineering
# Training only (requires labeled data)
python pipeline.py --config config/pipeline.yaml --stage training
```
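This guide does not document how `pipeline.py` parses its flags. The sketch below shows the `--config`/`--stage` behaviour above, assuming stages run in the order listed earlier. Only `feature_engineering` and `training` appear in this guide, so `annotation_ingestion` is a hypothetical name. Inference is left out on the further assumption that the FastAPI app serves it rather than running it as a batch stage.

```python
import argparse

# Only "feature_engineering" and "training" are confirmed by this guide;
# "annotation_ingestion" is a hypothetical stage name.
STAGES = ["feature_engineering", "annotation_ingestion", "training"]


def stages_to_run(argv: list[str]) -> tuple[str, list[str]]:
    """Return the config path and the stages to execute, in order."""
    parser = argparse.ArgumentParser(description="ML pipeline (sketch)")
    parser.add_argument("--config", required=True, help="path to pipeline.yaml")
    parser.add_argument("--stage", choices=STAGES, help="run a single stage only")
    args = parser.parse_args(argv)
    selected = [args.stage] if args.stage else list(STAGES)
    return args.config, selected
```

Running a single stage skips its predecessors, which is why `--stage training` needs labeled data that already exists.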
#### View Experiments
Open MLflow UI at http://localhost:5000
#### Test Inference API
```bash
# Check health
curl http://localhost:8001/health
# Get model info
curl http://localhost:8001/model/info
# Predict (requires candles JSON)
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"candles": [...]}'
```
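The Next.js client code is not shown here. The sketch below is a Python stand-in that calls the same `/predict` endpoint and reads the variables from `.env.local`. It assumes three things: candles use the CSV field names (`time`, `open`, `high`, `low`, `close`), `INFERENCE_API_TIMEOUT` is in milliseconds, and the response is JSON. The FastAPI service defines the actual request and response schema.

```python
import json
import os
import urllib.request
from typing import Any


def build_predict_request(candles: list[dict]) -> urllib.request.Request:
    """Build a POST /predict request; the candle field names are assumed."""
    base_url = os.environ.get("INFERENCE_API_URL", "http://localhost:8001")
    body = json.dumps({"candles": candles}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(candles: list[dict]) -> Any:
    """Send the request and return the parsed JSON response."""
    # Assumption: INFERENCE_API_TIMEOUT is milliseconds (30000 -> 30 s).
    timeout_s = int(os.environ.get("INFERENCE_API_TIMEOUT", "30000")) / 1000
    with urllib.request.urlopen(build_predict_request(candles), timeout=timeout_s) as resp:
        return json.load(resp)
```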
## Docker Deployment
### Prerequisites
- Docker (20.10+)
- docker-compose (2.0+)
### Build and Run with Docker Compose
The easiest way to deploy is with docker-compose:
```bash
docker-compose up --build
```
This will:
1. Build the Next.js app and ML service Docker images
2. Start PostgreSQL for ML service metadata
3. Start MLflow tracking server
4. Start the ML inference service (FastAPI)
5. Start the Next.js web application
6. Create named volumes for persistent storage:
- `candle-data` - SQLite database for annotations
- `ml-data` - OHLCV data, features, labeled datasets
- `mlflow-data` - MLflow experiments and model artifacts
- `postgres-data` - PostgreSQL data
7. Enable automatic restart unless stopped
Services will be available at:
- **Web UI**: http://localhost:3000
- **ML Inference API**: http://localhost:8001
- **MLflow UI**: http://localhost:5000
- **PostgreSQL**: localhost:5432
### Running in Detached Mode
```bash
docker-compose up -d --build
```
View logs:
```bash
docker-compose logs -f candle-annotator
```
Stop the service:
```bash
docker-compose down
```
### Manual Docker Build and Run
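If you prefer to build and run the web app without Compose (this starts only the Next.js container, not PostgreSQL, MLflow, or the ML service):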
```bash
# Build image
docker build -t candle-annotator .
# Run container
docker run -d \
  -p 3000:3000 \
  -v candle-data:/app/data \
  --restart unless-stopped \
  candle-annotator
```
### Environment Configuration
Create a `.env` file in the project root based on `.env.example`:
```bash
cp .env.example .env
```
Edit `.env` to customize:
```env
NODE_ENV=production
PORT=3000
DATABASE_PATH=/app/data/candles.db
```
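Compose automatically reads a `.env` file in the project directory and uses it for variable substitution in docker-compose.yml. You can also pass the file explicitly, which is required if it lives elsewhere or has another name:
```bash
docker-compose --env-file .env up -d
```
Variables reach the containers only where docker-compose.yml passes them through (via `environment:` or `env_file:`).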
### Data Persistence
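The application stores the SQLite database in the Docker named volume `candle-data`, so data persists across container restarts and rebuilds. The `docker cp` commands below assume the container is named `candle-annotator`. If docker-compose.yml does not set `container_name`, use the name shown by `docker ps` instead.
```bash
# View volumes (Compose usually prefixes names with the project name)
docker volume ls | grep candle
# Backup database
docker cp candle-annotator:/app/data/candles.db ./backup.db
# Restore database (stop the app first so it does not hold the file open)
docker-compose stop candle-annotator
docker cp ./backup.db candle-annotator:/app/data/candles.db
docker-compose start candle-annotator
```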
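`docker cp` copies the file byte for byte, so a backup taken while the app is writing can be inconsistent. If the database runs in WAL mode, recent commits may also still sit in `candles.db-wal` and be missed. Where you can open the database file directly (for example `./data/candles.db` in local development), SQLite's online backup API produces a consistent snapshot. Python exposes it as `sqlite3.Connection.backup`. A minimal sketch:

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Snapshot a live SQLite database using SQLite's online backup API.

    Reading through a connection means committed changes still held in a
    -wal file are included, unlike a plain file copy.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies all pages in one step by default
    finally:
        dest.close()
        src.close()
```

Usage: `backup_sqlite("data/candles.db", "backup.db")`.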
### Accessing the Application
Once running, access the application at:
```
http://localhost:3000
```
Health check endpoint:
```bash
curl http://localhost:3000/api/health
```
With database check:
```bash
curl "http://localhost:3000/api/health?check=db"
```
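Scripts such as CI jobs or deploy hooks often need to wait until the app is up. A small poller over the same endpoint handles this. In this sketch, the timeout and interval defaults are arbitrary. It checks only for HTTP 200, because this guide does not document the response body. The `opener` parameter exists so the function can be tested without a server.

```python
import time
import urllib.request
from typing import Callable


def wait_for_healthy(
    url: str = "http://localhost:3000/api/health",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=3) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: `wait_for_healthy("http://localhost:3000/api/health?check=db", timeout_s=120)`.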
### Port Mapping
To run on a different port (e.g., 8080), modify docker-compose.yml:
```yaml
services:
  candle-annotator:
    ports:
      - "8080:3000"
```
Or use environment variable in docker-compose:
```yaml
services:
  candle-annotator:
    ports:
      - "${HOST_PORT:-3000}:3000"
```
Then run:
```bash
HOST_PORT=8080 docker-compose up -d
```
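Compose substitutes `${HOST_PORT:-3000}` before starting containers, and `:-` falls back to the default when the variable is unset or empty. The sketch below reproduces only the `${NAME}` and `${NAME:-default}` forms to show how the port mapping resolves. Compose itself supports more forms (such as `${NAME-default}` and `${NAME:?error}`), which this sketch ignores.

```python
import re

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled.
_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}")


def interpolate(template: str, env: dict[str, str]) -> str:
    """Resolve ${NAME} and ${NAME:-default} against env."""

    def replace(m: re.Match) -> str:
        value = env.get(m.group("name"), "")
        # ":-" uses the default when the variable is unset *or* empty.
        if value == "" and m.group("default") is not None:
            return m.group("default")
        return value

    return _VAR.sub(replace, template)
```

For example, `interpolate("${HOST_PORT:-3000}:3000", {"HOST_PORT": "8080"})` yields `8080:3000`.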
### Container Health Checks
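Docker checks the web app container's health every 30 seconds by calling the `/api/health` endpoint. A probe that takes longer than 3 seconds counts as a failure, and after 3 consecutive failures the container is marked `unhealthy`.

Plain Docker and Compose do not restart unhealthy containers. The `unless-stopped` restart policy applies only when the container's process exits. If you need automatic recovery from an unhealthy state, use an orchestrator or an external watchdog.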
View health status:
```bash
docker ps
```
Look for the `STATUS` column - it should show `healthy`.
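Docker's health status can be modelled as a small state machine. It stays `starting` until the first successful probe, a probe slower than the timeout counts as a failure, and `retries` consecutive failures flip the status to `unhealthy`. The sketch below replays probe results rather than calling Docker. It uses the 3-second timeout and 3 retries from this section and ignores `start_period`.

```python
from typing import Iterable


def health_status(
    probes: Iterable[tuple[bool, float]], timeout_s: float = 3.0, retries: int = 3
) -> str:
    """Replay (succeeded, seconds_taken) probe results and return the status."""
    status, failing_streak = "starting", 0
    for ok, seconds in probes:
        if ok and seconds <= timeout_s:
            # Any passing probe resets the streak and marks the container healthy.
            status, failing_streak = "healthy", 0
        else:
            # Failed or too-slow probes extend the streak.
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
    return status
```

With these settings, a container that stops responding becomes `unhealthy` after its third failed probe. Until then, `docker ps` keeps showing its previous status.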
### Troubleshooting
**Port already in use:**
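```bash
docker-compose down   # Stop any existing containers
# Either free port 3000 (find the process with `lsof -i :3000`), or remap the host
# port; this requires the "${HOST_PORT:-3000}:3000" mapping from Port Mapping.
HOST_PORT=8080 docker-compose up -d
```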
**Database permission errors:**
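```bash
# Warning: removing the volume deletes all stored annotations. Back up first.
docker-compose down
# Compose usually prefixes the volume with the project name
# (e.g. candle-annotator_candle-data); check the exact name with `docker volume ls`.
docker volume rm candle-data
docker-compose up --build
```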
**Rebuild without cache:**
```bash
docker-compose build --no-cache
docker-compose up -d
```
**View container logs:**
```bash
docker-compose logs -f --tail=100
```
### Update Procedure
To update the application:
```bash
git pull origin master
docker-compose down
docker-compose up --build -d
```
Or with no-cache rebuild:
```bash
git pull
docker-compose down
docker-compose build --no-cache
docker-compose up -d
```
### Production Deployment
For production deployments, consider:
1. **Use a container registry** (Docker Hub, ECR, GCR):
```bash
docker tag candle-annotator myregistry/candle-annotator:v1.0.0
docker push myregistry/candle-annotator:v1.0.0
```
2. **Run on a remote server** (AWS, DigitalOcean, etc.):
```bash
# SSH into server, clone repo, then:
docker-compose up -d
```
3. **Add reverse proxy** (nginx, traefik) for HTTPS:
```yaml
# docker-compose.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
```
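4. **Keep a log file** alongside Docker's built-in log storage, which you can view at any time with `docker-compose logs`. A plain `&` background job can be terminated when your SSH session ends, so detach it with `nohup`:
```bash
nohup docker-compose logs -f --tail=1000 > app.log 2>&1 &
```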
## Notes
- This application is designed for **single-user local use** only
- There is no authentication or user management
- The SQLite database is stored locally and not intended for concurrent access
- For production multi-user deployments, consider migrating to PostgreSQL or similar
- Docker deployment provides lightweight containerization ideal for standalone instances
- The multi-stage Dockerfile keeps the web app image small (~100MB)