Marko Djordjevic 847ff67986 feat(ml): add TA-Lib annotation generation and import workflow

Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
  and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
  annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.

2026-02-15 19:18:28 +01:00


Deployment Guide

Prerequisites

Node.js 18.x or higher
npm 9.x or higher
Python and build tools (for native module compilation)

Local Development Setup

1. Install Dependencies

npm install

Note: The better-sqlite3 package requires native compilation. If you encounter build errors, ensure you have the necessary build tools:

Linux:

sudo apt-get install build-essential python3

macOS:

xcode-select --install

Windows:

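npm install --global windows-build-tools

Note: the windows-build-tools package is deprecated. On current Node.js versions, the official Windows installer can install the required build tools (Python and the Visual Studio C++ build tools) through its "Tools for Native Modules" option.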

2. Database Setup

The SQLite database will be automatically created when you start the application. The database file is located at:

./data/candles.db

To run migrations manually:

npx drizzle-kit generate
npx drizzle-kit migrate

3. Start Development Server

npm run dev

The application will be available at:

http://localhost:3000

4. Verify Setup

Open the application in your browser
Upload a sample CSV file with OHLC data (columns: time, open, high, low, close)
Verify the candlestick chart renders correctly
Test annotation tools (Break Up, Break Down, Draw Line, Delete)
Export annotations as CSV

CSV File Format

The application expects CSV files with the following format:

time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525

Time column formats:

Unix timestamp (seconds): 1700000000
Date string: 2024-01-15
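
Before uploading, it can help to check a file against this format. The following is a minimal sketch, not the application's own parser: the function names are hypothetical, and it assumes date strings are ISO dates (YYYY-MM-DD) interpreted as midnight UTC.

```python
import csv
import io
from datetime import datetime, timezone

REQUIRED_COLUMNS = ["time", "open", "high", "low", "close"]


def parse_time(value: str) -> int:
    """Return a Unix timestamp in seconds for either supported time format."""
    value = value.strip()
    if value.isdigit():
        return int(value)  # already a Unix timestamp in seconds
    # Assumption: date strings are ISO dates read as midnight UTC.
    day = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def load_candles(text: str) -> list[dict]:
    """Parse CSV text into candle dicts, raising ValueError on malformed input."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    candles = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        candle = {"time": parse_time(row["time"])}
        for key in REQUIRED_COLUMNS[1:]:
            candle[key] = float(row[key])
        if candle["high"] < max(candle["open"], candle["close"], candle["low"]):
            raise ValueError(f"line {line_no}: high is below another price")
        if candle["low"] > min(candle["open"], candle["close"]):
            raise ValueError(f"line {line_no}: low is above open or close")
        candles.append(candle)
    return candles


SAMPLE = """time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
2024-01-15,1.0510,1.0530,1.0505,1.0525
"""

if __name__ == "__main__":
    for candle in load_candles(SAMPLE):
        print(candle)
```

The high/low sanity checks are an extra guard added for illustration; the guide does not state that the application enforces them.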

Building for Production

npm run build

Note: Production builds with better-sqlite3 require the native module to be compiled for the target platform.

Running Production Build

npm run build
npm start

The production server will run on port 3000 by default.

Troubleshooting

better-sqlite3 Build Issues

If you encounter errors related to better-sqlite3 not finding bindings:

Rebuild the module:
```
npm rebuild better-sqlite3
```

If that fails, reinstall:

npm uninstall better-sqlite3
npm install better-sqlite3

For development, you can use npm run dev, which tends to handle the native module better than production builds.

Database Issues

If the database becomes corrupted or you want to start fresh:

Stop the application
Delete the database file:
```
rm -f data/candles.db
```
Restart the application (it will recreate the database)
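
Before deleting the file, it may be worth confirming that it is actually corrupted. The sketch below uses SQLite's built-in PRAGMA integrity_check through Python's standard sqlite3 module; the function name is hypothetical, and it assumes the application is stopped and the file is readable from where the script runs.

```python
import sqlite3
from pathlib import Path


def check_database(path: str) -> list[str]:
    """Return the rows reported by PRAGMA integrity_check (["ok"] when healthy)."""
    if not Path(path).exists():
        return ["missing"]  # per this guide, the application recreates the file
    # Open read-only so the check itself cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        return [f"unreadable: {exc}"]
    finally:
        conn.close()


if __name__ == "__main__":
    print(check_database("data/candles.db"))
```

A result other than a single "ok" row suggests the file is damaged and starting fresh is reasonable.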

Port Already in Use

If port 3000 is already in use, you can specify a different port:

PORT=3001 npm run dev

Environment Variables

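The core application doesn't require any environment variables for local development; its configuration is hardcoded for simplicity. The exceptions described in this guide are the optional PORT override (see Port Already in Use) and the .env.local settings for the optional ML service (see Configure Next.js App).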

File Structure

.
├── src/
│   ├── app/              # Next.js app router
│   │   ├── api/          # API routes
│   │   ├── layout.tsx    # Root layout
│   │   └── page.tsx      # Main page
│   ├── components/       # React components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/              # Utilities
│       └── db/           # Database configuration
├── data/                 # SQLite database directory
├── drizzle/              # Database migrations
└── public/               # Static assets

ML Service Setup (Optional)

The Candle Annotator includes an optional Python ML service for pattern recognition and prediction.

Prerequisites for ML Service

Python 3.11+
TA-Lib C library
PostgreSQL 16

Local ML Service Setup

1. Install TA-Lib C Library

Linux (Debian/Ubuntu):

sudo apt-get update
sudo apt-get install libta-lib-dev

macOS:

brew install ta-lib

From Source:

wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz
tar -xzf ta-lib-0.4.0-src.tar.gz
cd ta-lib/
./configure --prefix=/usr
make
sudo make install

2. Install Python Dependencies

cd services/ml
uv sync
# Alternative if you are not using uv:
# pip install -r requirements.txt

3. Setup PostgreSQL

The ML service requires PostgreSQL for storing training run metadata:

# Create database
createdb ml_db

# Or using psql
psql -c "CREATE DATABASE ml_db;"

4. Initialize DVC

DVC is used for dataset versioning:

cd services/ml
dvc init --subdir  # --subdir is needed because services/ml sits inside the parent Git repository
dvc remote add -d local /path/to/dvc-storage

5. Run MLflow Tracking Server

MLflow tracks experiments and stores models:

mlflow server \
  --backend-store-uri ./mlruns \
  --default-artifact-root ./mlruns/artifacts \
  --host 0.0.0.0 \
  --port 5000

6. Configure Pipeline

Edit services/ml/config/pipeline.yaml to configure:

Feature engineering settings
Model hyperparameters
Data paths
MLflow experiment name

7. Start ML Service

cd services/ml
uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload

The inference API will be available at http://localhost:8001

8. Configure Next.js App

Create .env.local in the project root:

INFERENCE_API_URL=http://localhost:8001
INFERENCE_API_TIMEOUT=30000
INFERENCE_BATCH_TIMEOUT=120000
NEXT_PUBLIC_PREDICTIONS_ENABLED=true

Running the ML Pipeline

The ML pipeline consists of:

Feature Engineering - Extract TA-Lib indicators from OHLCV data
Annotation Ingestion - Convert span annotations to labeled datasets
Training - Train models with MLflow tracking
Inference - Serve predictions via FastAPI
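
To make the annotation ingestion stage more concrete, here is a minimal sketch of turning span annotations into one label per candle. It is an illustration only, not the service's implementation: the Span fields, the "none" default, and the use of a TA-Lib function name as the label are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span annotation: an inclusive time range with a label."""
    start: int  # Unix seconds of the first candle in the span
    end: int    # Unix seconds of the last candle in the span
    label: str


def label_candles(times: list[int], spans: list[Span], default: str = "none") -> list[str]:
    """Assign each candle time the label of the first span that contains it."""
    labels = []
    for t in times:
        match = next((s.label for s in spans if s.start <= t <= s.end), default)
        labels.append(match)
    return labels


if __name__ == "__main__":
    times = [1700000000 + 60 * i for i in range(5)]
    spans = [Span(1700000060, 1700000120, "CDLENGULFING")]
    print(label_candles(times, spans))
    # ['none', 'CDLENGULFING', 'CDLENGULFING', 'none', 'none']
```

Taking the first matching span is one simple way to resolve overlaps; a real pipeline might instead prefer the most recent or the manually edited annotation.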

Train a Model

cd services/ml
python pipeline.py --config config/pipeline.yaml

This will:

Load raw OHLCV data from data/raw/
Compute features and save to data/enriched/
Load annotations and create labeled dataset in data/labeled/
Train the model with MLflow tracking
Save model artifacts

Run Individual Stages

# Feature engineering only
python pipeline.py --config config/pipeline.yaml --stage feature_engineering

# Training only (requires labeled data)
python pipeline.py --config config/pipeline.yaml --stage training

View Experiments

Open MLflow UI at http://localhost:5000

Test Inference API

# Check health
curl http://localhost:8001/health

# Get model info
curl http://localhost:8001/model/info

# Predict (requires candles JSON)
curl -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"candles": [...]}'
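
The health check can also be scripted, for example to wait for the service before sending predictions. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes that an HTTP 200 response from /health means the service is ready.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, path: str = "/health", timeout_s: float = 5.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False
```

For the local setup in this guide, the call would be is_healthy("http://localhost:8001").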

Docker Deployment

Prerequisites

Docker (20.10+)
docker-compose (2.0+)

Build and Run with Docker Compose

The easiest way to deploy is with docker-compose:

docker-compose up --build

This will:

Build the Next.js app and ML service Docker images
Start PostgreSQL for ML service metadata
Start MLflow tracking server
Start the ML inference service (FastAPI)
Start the Next.js web application
Create named volumes for persistent storage:
- candle-data - SQLite database for annotations
- ml-data - OHLCV data, features, labeled datasets
- mlflow-data - MLflow experiments and model artifacts
- postgres-data - PostgreSQL data
Enable automatic restart unless stopped

Services will be available at:

Web UI: http://localhost:3000
ML Inference API: http://localhost:8001
MLflow UI: http://localhost:5000
PostgreSQL: localhost:5432

Running in Detached Mode

docker-compose up -d --build

View logs:

docker-compose logs -f candle-annotator

Stop the service:

docker-compose down

Manual Docker Build and Run

If you prefer to build and run manually:

# Build image
docker build -t candle-annotator .

# Run container
docker run -d \
  -p 3000:3000 \
  -v candle-data:/app/data \
  --restart unless-stopped \
  candle-annotator

Environment Configuration

Create a .env file in the project root based on .env.example:

cp .env.example .env

Edit .env to customize:

NODE_ENV=production
PORT=3000
DATABASE_PATH=/app/data/candles.db

Pass environment variables to docker-compose:

docker-compose --env-file .env up -d

Data Persistence

The application stores the SQLite database in a Docker named volume candle-data. This ensures data persists across container restarts:

# View volumes
docker volume ls | grep candle

# Backup database
docker cp candle-annotator:/app/data/candles.db ./backup.db

# Restore database
docker cp ./backup.db candle-annotator:/app/data/candles.db
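
Copying a SQLite file while the application is writing to it may produce an inconsistent backup. As an alternative, the sketch below uses SQLite's online backup API through Python's standard sqlite3 module. It assumes Python is available wherever the database file is reachable (for example on a host with the data directory mounted); the function name is hypothetical.

```python
import sqlite3


def backup_database(source_path: str, backup_path: str) -> int:
    """Copy a SQLite database with the online backup API; return the table count."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # copies all pages into the target database
        (count,) = target.execute(
            "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
        return count
    finally:
        target.close()
        source.close()


if __name__ == "__main__":
    tables = backup_database("data/candles.db", "backup.db")
    print(f"backed up {tables} tables")
```

Returning the table count gives a quick sanity check that the backup is not an empty file.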

Accessing the Application

Once running, access the application at:

http://localhost:3000

Health check endpoint:

curl http://localhost:3000/api/health

With database check:

curl "http://localhost:3000/api/health?check=db"

Port Mapping

To run on a different port (e.g., 8080), modify docker-compose.yml:

services:
  candle-annotator:
    ports:
      - "8080:3000"

Or use environment variable in docker-compose:

services:
  candle-annotator:
    ports:
      - "${HOST_PORT:-3000}:3000"

Then run:

HOST_PORT=8080 docker-compose up -d

Container Health Checks

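Docker automatically checks container health every 30 seconds using the /api/health endpoint. The container is marked unhealthy if:

The health check fails 3 times consecutively
A check takes longer than 3 seconds to respond (this counts as a failed check)

Note: on a plain Docker Engine or docker-compose setup, the restart policy reacts to the container exiting, not to an unhealthy status, so an unhealthy container is not restarted automatically unless an orchestrator or a separate watchdog handles it.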

View health status:

docker ps

Look for the STATUS column - it should show healthy.
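
The thresholds described above (a 3-second timeout and 3 consecutive failures) can be illustrated with a small simulation. This is a sketch of the counting logic only, not Docker's implementation; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks consecutive probe failures using the thresholds from this guide."""
    timeout_s: float = 3.0  # a probe slower than this counts as a failure
    retries: int = 3        # consecutive failures before "unhealthy"
    failures: int = 0
    status: str = "starting"

    def record(self, ok: bool, duration_s: float) -> str:
        """Record one probe result and return the resulting status."""
        if ok and duration_s <= self.timeout_s:
            self.failures = 0
            self.status = "healthy"
        else:
            self.failures += 1
            if self.failures >= self.retries:
                self.status = "unhealthy"
        return self.status


if __name__ == "__main__":
    tracker = HealthTracker()
    probes = [(True, 0.1), (False, 0.1), (True, 5.0), (False, 0.2), (True, 0.2)]
    for ok, duration in probes:
        print(tracker.record(ok, duration))
```

Note that a single slow or failed probe does not change the status; only the third consecutive failure does, and one successful probe resets the counter.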

Troubleshooting

Port already in use:

docker-compose down  # Stop any existing containers
HOST_PORT=8080 docker-compose up -d  # Requires the ${HOST_PORT:-3000} mapping shown under Port Mapping

Database permission errors:

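# Recreate the volume so it gets correct permissions
# WARNING: this deletes the annotation database; back it up first (see Data Persistence)
docker-compose down
docker volume rm candle-data  # Compose may prefix the name with the project name; check docker volume ls
docker-compose up --build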

Rebuild without cache:

docker-compose build --no-cache
docker-compose up -d

View container logs:

docker-compose logs -f --tail=100

Update Procedure

To update the application:

git pull origin master
docker-compose down
docker-compose up --build -d

Or with no-cache rebuild:

git pull
docker-compose down
docker-compose build --no-cache
docker-compose up -d

Production Deployment

For production deployments, consider:

Use a container registry (Docker Hub, ECR, GCR):

docker tag candle-annotator myregistry/candle-annotator:v1.0.0
docker push myregistry/candle-annotator:v1.0.0

Run on a remote server (AWS, DigitalOcean, etc.):

# SSH into server, clone repo, then:
docker-compose up -d

Add reverse proxy (nginx, traefik) for HTTPS:

# docker-compose.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf

Capture container logs to a file for basic production monitoring:

docker-compose logs -f --tail=1000 > app.log &

Notes

This application is designed for single-user local use only
There is no authentication or user management
The SQLite database is stored locally and not intended for concurrent access
For production multi-user deployments, consider migrating to PostgreSQL or similar
Docker deployment provides lightweight containerization ideal for standalone instances
The multi-stage Dockerfile keeps image size minimal (~100MB)
