Candle annotator
Marko Djordjevic bfe437857b feat: add Python migration script and successfully test SQLite to PostgreSQL data migration
- Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version
- Handles all type conversions: timestamps, booleans, and JSONB fields
- Successfully migrated all 2,836 rows from SQLite to PostgreSQL
- Verified data integrity: all 6 tables migrated correctly
- Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223
2026-02-17 14:01:21 +01:00

Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

Annotation Tools - TradingView-like charting interface for creating labeled training data:

  • Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
  • Visualize candlestick charts with interactive zoom and pan
  • Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
  • Mark breakout patterns (Break Up, Break Down) directly on candles
  • Draw custom trend lines with two-click interaction
  • Export annotations for ML training

ML Pipeline - Python-based training and inference system:

  • Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
  • Automated pattern detection using TA-Lib CDL* functions
  • Train RandomForest and XGBoost models with MLflow experiment tracking
  • FastAPI inference service for real-time predictions
  • Integration with Next.js UI for prediction visualization

Active Learning Loop - Close the feedback cycle:

  • Model predictions displayed as overlays on the chart
  • Disagreement detection between human annotations and model predictions
  • One-click feedback to confirm, correct, or dismiss predictions, turning them into new training data (planned; the Prediction UI section lists this as a future feature)
  • Continuous improvement through iterative annotation and retraining

Features

Data Management

  • CSV Upload: Import OHLC data with support for both Unix timestamps and date strings
    • Replace Mode: Uploading a new CSV deletes all old candles and replaces them with new data
    • Initial Data: Docker containers automatically load EURUSD.csv on first startup if the database is empty
  • SQLite Storage: All candle data and annotations are stored locally in a SQLite database
  • Data Persistence: Annotations and candles persist between sessions
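
Replace mode can be pictured as one transaction that clears the candles table and inserts the new rows, so a failed upload leaves the old data in place. The sketch below uses Python's built-in sqlite3 purely for illustration; the application itself goes through Drizzle ORM in TypeScript, the function name replace_candles is hypothetical, and the column layout is taken from the Candles table in the Database Schema section.

```python
import sqlite3

Row = tuple[int, float, float, float, float]  # (time, open, high, low, close)


def replace_candles(conn: sqlite3.Connection, rows: list[Row]) -> int:
    """Delete every existing candle and insert `rows` in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM candles")
        conn.executemany(
            "INSERT INTO candles (time, open, high, low, close) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# Illustrative in-memory database with the columns from the Candles table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candles ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, time INTEGER UNIQUE, "
    "open REAL, high REAL, low REAL, close REAL)"
)
replace_candles(conn, [(1700000000, 1.0500, 1.0520, 1.0490, 1.0510)])
replace_candles(conn, [(1700000060, 1.0510, 1.0530, 1.0505, 1.0525)])
```

After the second call only the newer candle remains, which is the behavior described for Replace Mode.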

Chart Visualization

  • Interactive Candlestick Chart: Powered by lightweight-charts library
  • Dark Theme: Eye-friendly slate color scheme
  • Zoom & Pan: Mouse wheel zoom and drag-to-pan functionality
  • Crosshair: Precise price and time tracking

Annotation Tools

  • Break Up Markers: Green arrow markers below candles indicating upward breakouts
  • Break Down Markers: Red arrow markers above candles indicating downward breakouts
  • Trend Lines: Two-click line drawing with real-time preview
  • Delete Tool: Remove any annotation (markers or lines) by clicking on them
  • Tool Toggle: Click the active tool's button again to deactivate it

Label Management

  • Label Sidebar: View all annotations in a collapsible sidebar with:
    • Click Selection: Click markers on chart or in sidebar to select/highlight
    • Keyboard Delete: Press Delete or Backspace to remove selected label
    • Individual Delete: Delete button on each list item
    • Search: Search annotations by timestamp
    • Filter: Filter by Break Up, Break Down, or All types
    • Count Display: See how many Break Up vs Break Down markers exist
    • Visual Highlight: Selected markers highlighted with glow effect

UI Theme

  • Hacker Theme: Terminal-inspired dark aesthetic with:
    • Matrix green (#00ff41) on dark background (#0a0e0a)
    • Monospace font (JetBrains Mono) throughout
    • Glow effects on button hover and active states
    • Custom scrollbars styled to match theme
    • High contrast for accessibility

Export & Deployment

  • CSV Export: Download all annotations with timestamp, label type, and price data
  • ML-Ready Format: Structured data suitable for training ML models
  • Docker Deployment: One-command deployment with persistent data volume
  • Health Check: Built-in /api/health endpoint for monitoring

ML Pipeline Features (Optional)

The integrated ML pipeline provides:

Feature Engineering

  • TA-Lib Indicators: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
  • Candle Features: Body size, wick ratios, gap detection, price ranges
  • Custom Features: Plugin system for domain-specific feature functions
  • NaN Handling: Automatic warmup period detection and cleanup
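
As a rough illustration of the candle features listed above (body size, wick ratios, gap detection, price range), the following sketch computes them for a single candle. The Candle class, the function name, and the exact feature definitions are assumptions for this example; the pipeline's actual feature set may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def candle_features(c: Candle, prev_close: Optional[float] = None) -> dict[str, float]:
    """Compute simple shape features for one candle (hypothetical definitions)."""
    price_range = c.high - c.low
    body = abs(c.close - c.open)
    upper_wick = c.high - max(c.open, c.close)
    lower_wick = min(c.open, c.close) - c.low
    # A flat candle has zero range; ratios are undefined there, so use NaN.
    denom = price_range if price_range > 0 else float("nan")
    return {
        "range": price_range,
        "body": body,
        "body_ratio": body / denom,
        "upper_wick_ratio": upper_wick / denom,
        "lower_wick_ratio": lower_wick / denom,
        # Gap = distance between this open and the previous close, if known.
        "gap": c.open - prev_close if prev_close is not None else float("nan"),
    }


features = candle_features(Candle(open=1.0, high=2.0, low=0.0, close=1.5), prev_close=0.75)
```

The NaN values produced for flat candles or a missing previous close would then be handled by the same cleanup step that removes indicator warmup rows.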

Annotation Ingestion

  • Windowed Classification: Extract fixed-size windows around each pattern for classification models
  • BIO Sequence Labeling: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
  • Programmatic Labels: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
  • Label Merging: Human-priority, programmatic-priority, or both strategies
  • Dataset Statistics: Class distribution, label counts, human/programmatic agreement metrics
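
To make the BIO encoding concrete, here is a minimal sketch that turns span annotations into one tag per candle. It assumes spans are given as (start_index, end_index, label) with an inclusive end, that they lie within range, and that they do not overlap; the function name bio_encode is hypothetical.

```python
def bio_encode(n_candles: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Return one BIO tag per candle: B-<label>, I-<label>, or O."""
    tags = ["O"] * n_candles
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first candle of the span
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # remaining candles of the span
    return tags


tags = bio_encode(6, [(1, 3, "hammer")])
# tags == ["O", "B-hammer", "I-hammer", "I-hammer", "O", "O"]
```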

Model Training

  • Model Types: RandomForest and XGBoost with class balancing
  • Temporal Splitting: Train/val/test splits that respect time series order (no data leakage)
  • MLflow Integration: Automatic experiment tracking, hyperparameter logging, artifact storage
  • Model Registry: Versioned model storage with stage management (Production, Staging, Archived)
  • Evaluation Metrics: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
  • Visualization: Confusion matrix, feature importance plots, classification reports
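
Temporal splitting means sorting by time and cutting the sequence into contiguous blocks, so every validation and test row is later than every training row. A minimal sketch, assuming rows are dicts with a "time" key; the function name and the default fractions are illustrative, not the pipeline's actual settings.

```python
def temporal_split(rows: list[dict], train_frac: float = 0.7, val_frac: float = 0.15):
    """Split rows into train/val/test blocks in chronological order (no shuffling)."""
    ordered = sorted(rows, key=lambda r: r["time"])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


rows = [{"time": t} for t in (480, 60, 120, 0, 180, 420, 240, 300)]
train, val, test = temporal_split(rows, train_frac=0.5, val_frac=0.25)
```

A shuffled split would mix future candles into the training set, which is the data leakage this approach avoids.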

Inference Service

  • FastAPI REST API: High-performance inference with automatic OpenAPI docs
  • Preprocessing Parity: Loads pipeline config from MLflow to ensure training/inference consistency
  • Batch Processing: Efficient prediction for large time ranges
  • Span Grouping: Consecutive predictions merged into labeled spans with confidence scores
  • Model Metadata: Endpoint to query model version, metrics, and label configuration
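
Span grouping can be sketched as collapsing runs of identical consecutive labels into one span each. The example below assumes per-candle predictions arrive as (time, label, confidence) tuples ordered by time, that a background label named "none" marks candles without a pattern, and that a span's confidence is the mean of its candles; all three are assumptions, since this README does not specify them.

```python
from itertools import groupby


def group_spans(preds: list[tuple[int, str, float]], background: str = "none") -> list[dict]:
    """Merge consecutive predictions with the same label into spans."""
    spans = []
    for label, run in groupby(preds, key=lambda p: p[1]):
        run = list(run)
        if label == background:
            continue  # skip candles where no pattern was predicted
        spans.append({
            "label": label,
            "start": run[0][0],
            "end": run[-1][0],
            "confidence": sum(p[2] for p in run) / len(run),  # mean confidence
        })
    return spans


spans = group_spans([
    (0, "none", 0.9),
    (60, "doji", 0.5),
    (120, "doji", 1.0),
    (180, "none", 0.8),
    (240, "hammer", 0.75),
])
```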

Prediction UI

  • Chart Overlay: Predictions rendered as histogram series with label-specific colors
  • Confidence Filtering: Slider to hide low-confidence predictions
  • Label Filtering: Toggle visibility per pattern type with per-class F1 scores
  • Disagreement Detection: Automatic comparison of human vs model predictions
  • Prediction Summary: Counts for total predictions, agreements, disagreements
  • Active Learning Feedback: Click predictions to convert them to annotations (future feature)
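
One plausible way to compute the agreement and disagreement counts is to compare each predicted span with the human spans it overlaps in time. The rule used here (overlap with the same label is an agreement, overlap with only different labels is a disagreement, no overlap is left unreviewed) is an assumption for illustration; the README does not state the exact rule the UI applies.

```python
def overlaps(a: dict, b: dict) -> bool:
    """True if two spans with inclusive start/end timestamps intersect."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def summarize(human: list[dict], predicted: list[dict]) -> dict[str, int]:
    """Count predictions that agree or disagree with overlapping human spans."""
    summary = {"total": len(predicted), "agreements": 0, "disagreements": 0, "unreviewed": 0}
    for p in predicted:
        matches = [h for h in human if overlaps(h, p)]
        if not matches:
            summary["unreviewed"] += 1      # no human annotation to compare against
        elif any(h["label"] == p["label"] for h in matches):
            summary["agreements"] += 1
        else:
            summary["disagreements"] += 1
    return summary


human = [
    {"start": 60, "end": 120, "label": "doji"},
    {"start": 300, "end": 360, "label": "hammer"},
]
predicted = [
    {"start": 60, "end": 120, "label": "doji"},
    {"start": 330, "end": 390, "label": "doji"},
    {"start": 600, "end": 660, "label": "hammer"},
]
summary = summarize(human, predicted)
```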

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Next.js Frontend (React 19, Tailwind, lightweight-charts)│  │
│  │  - Annotation tools                                       │  │
│  │  - Prediction visualization                               │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┼───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│  Next.js API Routes (TypeScript)                                │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│  ────────────┬─────────────────────────────────────┬──────────  │
│              │ SQLite (annotations)                │ HTTP       │
│              ▼                                     ▼            │
│  ┌──────────────────┐                 ┌──────────────────────┐  │
│  │  SQLite Database │                 │  ML Inference API    │  │
│  │  (Drizzle ORM)   │                 │  (FastAPI, Python)   │  │
│  └──────────────────┘                 └──────────┬───────────┘  │
└──────────────────────────────────────────────────┼──────────────┘
                                                   │
                      ┌────────────────────────────┴────────────────┐
                      │                                             │
          ┌───────────▼─────────┐                    ┌──────────────▼────────┐
          │  MLflow Server      │                    │  PostgreSQL           │
          │  (Experiments,      │                    │  (Training run        │
          │   Model Registry)   │                    │   metadata)           │
          └─────────────────────┘                    └───────────────────────┘

ML Pipeline Workflow

1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain

Tech Stack

Frontend & Web Service

  • Frontend: Next.js 16 (App Router), React 19, TypeScript
  • Styling: Tailwind CSS 3, shadcn/ui components
  • Charting: lightweight-charts 4.x (TradingView)
  • Icons: lucide-react
  • Backend: Next.js API Routes
  • Database: SQLite with better-sqlite3
  • ORM: Drizzle ORM
  • CSV Parsing: papaparse

ML Pipeline (Python)

  • API Framework: FastAPI with uvicorn
  • ML Libraries: scikit-learn (RandomForest), XGBoost
  • Feature Engineering: TA-Lib (Technical Analysis Library)
  • Data Processing: pandas, numpy
  • Experiment Tracking: MLflow (model registry, artifact storage)
  • Data Versioning: DVC (Data Version Control)
  • Database: PostgreSQL 16 (training run metadata)
  • Model Persistence: joblib
  • Validation: Pydantic

Getting Started

The fastest way to get running is with Docker:

docker-compose up --build

Then open http://localhost:3000

See DEPLOYMENT.md for detailed Docker instructions.

Prerequisites

  • Node.js 18.x or higher (for local development)
  • npm 9.x or higher (for local development)
  • Docker & docker-compose (for containerized deployment)
  • Build tools for native modules (see DEPLOYMENT.md)

Local Development Installation

  1. Clone the repository:

    git clone <repository-url>
    cd candle_annotator
    
  2. Install dependencies:

    npm install
    
  3. Start the development server:

    npm run dev
    
  4. Open http://localhost:3000 in your browser

Usage

  1. Upload Data: Click "Choose CSV File" and select a CSV with columns: time,open,high,low,close
  2. View Chart: The candlestick chart renders automatically after upload
  3. Add Annotations:
    • Click "Label: Break Up" or "Label: Break Down" then click on a candle
    • Click "Draw Line" then click two points to draw a trend line
    • Press Escape to cancel line drawing
  4. Delete Annotations: Click "Delete" tool, then click on markers or lines to remove them
  5. Export: Click "Export CSV" to download all annotations

CSV File Format

Input Format

Your CSV file should have these columns:

time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525

Time column accepts:

  • Unix timestamps (seconds): 1700000000
  • Date strings: 2024-01-15, 2024-01-15 10:30:00
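
Both accepted time formats can be normalized to Unix seconds with a small helper like the one below. It is a sketch only: the function name parse_time is hypothetical, and treating date strings as UTC is an assumption, since the README does not say which time zone the uploader uses.

```python
from datetime import datetime, timezone

# Date formats shown above; anything else is rejected.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_time(value: str) -> int:
    """Convert a CSV time cell to a Unix timestamp in seconds."""
    value = value.strip()
    if value.isdigit():
        return int(value)  # already a Unix timestamp
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Assumption: date strings are interpreted as UTC.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"Unsupported time value: {value!r}")


parse_time("1700000000")           # 1700000000
parse_time("2024-01-15")           # 1705276800
parse_time("2024-01-15 10:30:00")  # 1705314600
```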

Export Format

The exported CSV includes:

timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
  • timestamp: Unix timestamp of the annotation
  • label_type: break_up, break_down, or line
  • price: Close price for markers, start price for lines
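
The export rows can be produced along these lines. This is a sketch under stated assumptions: the geometry layout ({"start": {"price": ...}}) and the close_by_time lookup are hypothetical, and the four-decimal formatting simply mirrors the sample above.

```python
import csv
import io


def export_csv(annotations: list[dict], close_by_time: dict[int, float]) -> str:
    """Render annotations as CSV text with timestamp, label_type, price columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "label_type", "price"])
    for ann in annotations:
        if ann["label_type"] == "line":
            price = ann["geometry"]["start"]["price"]   # start price for lines
        else:
            price = close_by_time[ann["timestamp"]]     # close price for markers
        writer.writerow([ann["timestamp"], ann["label_type"], f"{price:.4f}"])
    return buf.getvalue()


text = export_csv(
    [
        {"timestamp": 1700000000, "label_type": "break_up", "geometry": None},
        {"timestamp": 1700000000, "label_type": "line",
         "geometry": {"start": {"price": 1.05}, "end": {"price": 1.06}}},
    ],
    close_by_time={1700000000: 1.051},
)
```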

Database Schema

Candles Table

{
  id: integer (PK, auto-increment),
  time: integer (Unix timestamp, unique),
  open: real,
  high: real,
  low: real,
  close: real
}

Annotations Table

{
  id: integer (PK, auto-increment),
  timestamp: integer (Unix timestamp),
  label_type: text ('break_up' | 'break_down' | 'line'),
  geometry: text (JSON string for line coordinates, null for markers),
  created_at: integer (Unix timestamp)
}

API Endpoints

POST /api/upload

Upload CSV file and store candle data

Behavior: Deletes all existing candles before inserting new data (replace mode)
Request: multipart/form-data with file field
Response: { success: true, count: number } or { error: string }

GET /api/candles

Retrieve all candle records

Response: Array of candle objects ordered by time

GET /api/annotations

Retrieve all annotations

Response: Array of annotation objects with parsed geometry

POST /api/annotations

Create a new annotation

Request: { timestamp: number, label_type: string, geometry?: object }
Response: Created annotation object with ID

DELETE /api/annotations/[id]

Delete an annotation by ID

Response: { success: true } or { error: string }

GET /api/export

Export annotations as downloadable CSV

Response: CSV file download with Content-Disposition header

Frontend Architecture

Component Structure

  • page.tsx: Main page composition, manages active tool state
  • Toolbox.tsx: Sidebar with tool buttons and export functionality
  • FileUpload.tsx: CSV upload component with status messages
  • CandleChart.tsx: Core chart wrapper with lightweight-charts integration
    • Initializes chart with dark theme
    • Handles marker annotations (Break Up/Down)
    • Manages click events for annotation creation
    • Exposes refreshData() method for parent updates
  • SvgOverlay.tsx: Transparent SVG layer for line drawing
    • Coordinate transformation between data and pixels
    • Two-click line drawing with preview
    • Line hit detection for deletion
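
Line hit detection typically reduces to measuring the distance from the click to the line segment in pixel space and comparing it with a tolerance. The sketch below shows that calculation; it is written in Python to match the other examples in this README (the component itself is TypeScript), and the function names and the 6-pixel tolerance are assumptions.

```python
import math


def distance_to_segment(px: float, py: float,
                        x1: float, y1: float, x2: float, y2: float) -> float:
    """Shortest distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x1, py - y1)  # degenerate segment: a single point
    # Project the point onto the line, then clamp to the segment's ends.
    t = ((px - x1) * dx + (py - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def hits_line(click: tuple[float, float], line: tuple[float, float, float, float],
              tolerance_px: float = 6.0) -> bool:
    """True if the click lands within `tolerance_px` of the line."""
    return distance_to_segment(*click, *line) <= tolerance_px


hits_line((5, 3), (0, 0, 10, 0))    # True: 3 px from the line
hits_line((-4, 3), (0, 0, 10, 0))   # True: 5 px from the nearest endpoint
hits_line((5, 20), (0, 0, 10, 0))   # False
```

Clamping to the segment matters because clicks beyond either endpoint should be measured against the endpoint, not the infinite line through it.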

Data Flow

  1. User uploads CSV → POST /api/upload → SQLite storage
  2. Chart mounts → GET /api/candles + GET /api/annotations → Render
  3. User clicks with active tool → POST /api/annotations → Refresh chart
  4. User deletes → DELETE /api/annotations/[id] → Refresh chart
  5. User exports → GET /api/export → CSV download

Development

Project Structure

candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file

Key Technical Decisions

  1. lightweight-charts v4: Stable API with good candlestick and marker support
  2. SQLite with better-sqlite3: Synchronous access, perfect for single-user local apps
  3. SVG Overlay for Lines: Maintains separate rendering layer from chart, easier coordinate management
  4. Drizzle ORM: Type-safe queries with minimal overhead
  5. Next.js App Router: Server-side API routes co-located with frontend code

Known Limitations

  • Single User: No authentication or concurrent access support
  • No Undo: Can only delete annotations, not undo placement
  • Memory: Large CSV files (100k+ rows) may cause slow uploads
  • Line Snapping: Lines don't snap to candles, free-form placement only

Troubleshooting

See DEPLOYMENT.md for detailed troubleshooting steps.

Common issues:

  • better-sqlite3 binding errors: Run npm rebuild better-sqlite3
  • Port 3000 in use: Use PORT=3001 npm run dev
  • Database corruption: Delete data/candles.db and restart

License

ISC

Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.