Candle annotator
Marko Djordjevic d75b05b585 feat: add candle time-sort and duplicate timestamp validation to POST /predict
Sort incoming candles by their time field in ascending chronological order
before processing to ensure correct feature engineering regardless of input
order. Also validate that no duplicate timestamps are present, returning
HTTP 400 with details if duplicates are found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:28:02 +01:00

Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

Annotation Tools - TradingView-like charting interface for creating labeled training data:

  • Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
  • Visualize candlestick charts with interactive zoom and pan
  • Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
  • Mark breakout patterns (Break Up, Break Down) directly on candles
  • Draw custom trend lines with two-click interaction
  • Export annotations for ML training

ML Pipeline - Python-based training and inference system:

  • Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
  • Automated pattern detection using TA-Lib CDL* functions
  • Train RandomForest and XGBoost models with MLflow experiment tracking
  • FastAPI inference service for real-time predictions
  • Integration with Next.js UI for prediction visualization

Active Learning Loop - Close the feedback cycle:

  • Model predictions displayed as overlays on the chart
  • Disagreement detection between human annotations and model predictions
  • One-click feedback to confirm, correct, or dismiss predictions as new training data
  • Continuous improvement through iterative annotation and retraining

Features

Data Management

  • CSV Upload: Import OHLC data with support for both Unix timestamps and date strings
    • Replace Mode: Uploading a new CSV deletes all old candles and replaces them with new data
    • Initial Data: Docker containers automatically load EURUSD.csv on first startup if the database is empty
  • PostgreSQL Storage: All candle data and annotations are stored in a PostgreSQL database
  • Shared Database: Frontend and ML service use the same database for seamless data access
  • Data Persistence: Annotations and candles persist between sessions

Chart Visualization

  • Interactive Candlestick Chart: Powered by lightweight-charts library
  • Dark Theme: Eye-friendly slate color scheme
  • Zoom & Pan: Mouse wheel zoom and drag-to-pan functionality
  • Crosshair: Precise price and time tracking

Annotation Tools

  • Break Up Markers: Green arrow markers below candles indicating upward breakouts
  • Break Down Markers: Red arrow markers above candles indicating downward breakouts
  • Trend Lines: Two-click line drawing with real-time preview
  • Delete Tool: Remove any annotation (markers or lines) by clicking on them
  • Tool Toggle: Click the active tool's button again to deactivate it

Label Management

  • Label Sidebar: View all annotations in a collapsible sidebar with:
    • Click Selection: Click markers on chart or in sidebar to select/highlight
    • Keyboard Delete: Press Delete or Backspace to remove selected label
    • Individual Delete: Delete button on each list item
    • Search: Search annotations by timestamp
    • Filter: Filter by Break Up, Break Down, or All types
    • Count Display: See how many Break Up vs Break Down markers exist
    • Visual Highlight: Selected markers highlighted with glow effect

UI Theme

  • Hacker Theme: Terminal-inspired dark aesthetic with:
    • Matrix green (#00ff41) on dark background (#0a0e0a)
    • Monospace font (JetBrains Mono) throughout
    • Glow effects on button hover and active states
    • Custom scrollbars styled to match theme
    • High contrast for accessibility

Export & Deployment

  • CSV Export: Download all annotations with timestamp, label type, and price data
  • ML-Ready Format: Structured data suitable for training ML models
  • Docker Deployment: One-command deployment with persistent data volume
  • Health Check: Built-in /api/health endpoint for monitoring

ML Pipeline Features (Optional)

The integrated ML pipeline provides:

Feature Engineering

  • TA-Lib Indicators: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
  • Candle Features: Body size, wick ratios, gap detection, price ranges
  • Custom Features: Plugin system for domain-specific feature functions
  • NaN Handling: Automatic warmup period detection and cleanup
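
As a rough illustration of the candle features listed above, the sketch below derives body size, wick ratios, price range, and a gap value from plain OHLC numbers. The function name `candle_features` and the exact definitions (for example, wick ratio as wick length divided by the high-low range) are assumptions made for this example, not the project's actual implementation.

```python
from typing import Optional


def candle_features(o: float, h: float, l: float, c: float,
                    prev_close: Optional[float] = None) -> dict:
    """Compute simple per-candle features (hypothetical definitions)."""
    rng = h - l                       # full high-low range of the candle
    body = abs(c - o)                 # absolute body size
    upper_wick = h - max(o, c)        # distance from body top to high
    lower_wick = min(o, c) - l        # distance from low to body bottom
    safe = rng if rng > 0 else 1.0    # avoid division by zero on flat candles
    return {
        "body": body,
        "range": rng,
        "body_ratio": body / safe,
        "upper_wick_ratio": upper_wick / safe,
        "lower_wick_ratio": lower_wick / safe,
        # Gap is the open relative to the previous close; 0.0 if unknown.
        "gap": 0.0 if prev_close is None else o - prev_close,
    }
```

Features like these would sit alongside the TA-Lib indicator columns in the enriched dataset.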

Annotation Ingestion

  • Windowed Classification: Extract fixed-size windows around each pattern for classification models
  • BIO Sequence Labeling: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
  • Programmatic Labels: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
  • Label Merging: Human-priority, programmatic-priority, or both strategies
  • Dataset Statistics: Class distribution, label counts, human/programmatic agreement metrics
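
To make the BIO encoding concrete, here is a minimal sketch that turns span annotations into one tag per candle. It assumes spans are given as candle indices with an inclusive end, and the name `bio_encode` is hypothetical; the real pipeline works from timestamps and may handle overlaps differently.

```python
from __future__ import annotations


def bio_encode(n_candles: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Return one BIO tag per candle.

    spans: (start_index, end_index_inclusive, label) tuples (assumed format).
    """
    tags = ["O"] * n_candles                 # everything starts as Outside
    for start, end, label in spans:
        tags[start] = f"B-{label}"           # first candle of the pattern
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"           # remaining candles of the pattern
    return tags
```

For example, `bio_encode(6, [(1, 3, "Doji")])` tags candle 1 as `B-Doji`, candles 2 and 3 as `I-Doji`, and the rest as `O`.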

Model Training

  • Model Types: RandomForest and XGBoost with class balancing
  • Temporal Splitting: Train/val/test splits that respect time series order (no data leakage)
  • MLflow Integration: Automatic experiment tracking, hyperparameter logging, artifact storage
  • Model Registry: Versioned model storage with stage management (Production, Staging, Archived)
  • Evaluation Metrics: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
  • Visualization: Confusion matrix, feature importance plots, classification reports
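
The idea behind temporal splitting can be sketched as follows: sort by time, then cut the sequence at fixed positions instead of shuffling, so every validation and test row is later than every training row. The function name `temporal_split`, the row format, and the default fractions are assumptions for illustration only.

```python
def temporal_split(rows, train_frac=0.7, val_frac=0.15):
    """Split rows into train/val/test in chronological order (no shuffling).

    rows: iterable of dicts with a "time" key (assumed format).
    """
    ordered = sorted(rows, key=lambda r: r["time"])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    # Earlier data trains the model; later data is held out for evaluation.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Because the cut points follow time order, no future candle can leak into the training set.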

Inference Service

  • FastAPI REST API: High-performance inference with automatic OpenAPI docs
  • Preprocessing Parity: Loads pipeline config from MLflow to ensure training/inference consistency
  • Batch Processing: Efficient prediction for large time ranges
  • Span Grouping: Consecutive predictions merged into labeled spans with confidence scores
  • Model Metadata: Endpoint to query model version, metrics, and label configuration
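
Span grouping could look roughly like the sketch below, which merges runs of consecutive identical per-candle predictions into spans. The tuple format, the background label `"O"`, and averaging the confidences are all assumptions; the service may score spans differently.

```python
from itertools import groupby


def group_spans(preds, background="O"):
    """Merge consecutive predictions with the same label into spans.

    preds: time-ordered list of (time, label, confidence) tuples (assumed).
    """
    spans = []
    for label, run in groupby(preds, key=lambda p: p[1]):
        run = list(run)
        if label == background:
            continue  # skip candles with no predicted pattern
        spans.append({
            "start_time": run[0][0],
            "end_time": run[-1][0],
            "label": label,
            # Span confidence as the mean of member confidences (assumption).
            "confidence": sum(p[2] for p in run) / len(run),
        })
    return spans
```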

Prediction UI

  • Chart Overlay: Predictions rendered as histogram series with label-specific colors
  • Confidence Filtering: Slider to hide low-confidence predictions
  • Label Filtering: Toggle visibility per pattern type with per-class F1 scores
  • Disagreement Detection: Automatic comparison of human annotations vs model predictions
  • Prediction Summary: Counts for total predictions, agreements, disagreements
  • Active Learning Feedback: Click predictions to convert them to annotations (future feature)
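
A minimal sketch of how confidence filtering and disagreement detection might combine is shown below. It assumes human labels and model predictions are keyed by candle time; the name `compare_predictions` and the data shapes are hypothetical and do not mirror the actual UI code.

```python
def compare_predictions(human, model, min_confidence=0.5):
    """Summarize agreement between human labels and model predictions.

    human: {time: label}; model: {time: (label, confidence)} (assumed shapes).
    """
    # Hide predictions below the confidence threshold, like the UI slider.
    visible = {t: lc for t, lc in model.items() if lc[1] >= min_confidence}
    agreements = sorted(
        t for t, (label, _) in visible.items() if human.get(t) == label
    )
    disagreements = sorted(
        t for t, (label, _) in visible.items()
        if t in human and human[t] != label
    )
    return {
        "total": len(visible),
        "agreements": agreements,
        "disagreements": disagreements,
    }
```

Predictions with no human label at the same time count toward the total but are neither agreements nor disagreements in this sketch.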

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Next.js Frontend (React 19, Tailwind, lightweight-charts)│  │
│  │  - Annotation tools                                       │  │
│  │  - Prediction visualization                               │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┼───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│  Next.js API Routes (TypeScript)                                │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│  ────────────┬─────────────────────────────────────┬──────────  │
│              │ PostgreSQL                          │ HTTP       │
│              ▼                                     ▼            │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │          PostgreSQL Database (Shared)                    │   │
│  │  - Frontend tables (candles, annotations,                │   │
│  │    span_annotations)                                     │   │
│  │  - ML tables (training_runs)                             │   │
│  │  - Accessed by: Next.js (Drizzle ORM)                    │   │
│  │                 ML Service (SQLAlchemy)                  │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                       │
│              ┌──────────┴──────────┐                            │
│              │                     │                            │
│  ┌───────────▼─────────┐  ┌────────▼──────────────────┐         │
│  │  ML Inference API   │  │    MLflow Server          │         │
│  │  (FastAPI, Python)  │  │  (Experiments, Registry)  │         │
│  └─────────────────────┘  └───────────────────────────┘         │
└─────────────────────────────────────────────────────────────────┘

ML Pipeline Workflow

1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain

Tech Stack

Frontend & Web Service

  • Frontend: Next.js 16 (App Router), React 19, TypeScript
  • Styling: Tailwind CSS 3, shadcn/ui components
  • Charting: lightweight-charts 4.x (TradingView)
  • Icons: lucide-react
  • Backend: Next.js API Routes
  • Database: PostgreSQL 16 with pg driver
  • ORM: Drizzle ORM (PostgreSQL dialect)
  • CSV Parsing: papaparse

ML Pipeline (Python)

  • API Framework: FastAPI with uvicorn
  • ML Libraries: scikit-learn (RandomForest), XGBoost
  • Feature Engineering: TA-Lib (Technical Analysis Library)
  • Data Processing: pandas, numpy
  • Experiment Tracking: MLflow (model registry, artifact storage)
  • Data Versioning: DVC (Data Version Control)
  • Database: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
  • ORM: SQLAlchemy (for training runs) + table reflection (for frontend data)
  • Model Persistence: joblib
  • Validation: Pydantic

Getting Started

The fastest way to get running is with Docker:

docker-compose up --build

Then open http://localhost:3000

See DEPLOYMENT.md for detailed Docker instructions.

Prerequisites

  • Node.js 18.x or higher (for local development)
  • npm 9.x or higher (for local development)
  • PostgreSQL 16 or higher (for local development)
  • Docker & docker-compose (for containerized deployment)

Local Development Installation

  1. Clone the repository:

    git clone <repository-url>
    cd candle_annotator
    
  2. Install dependencies:

    npm install
    
  3. Setup PostgreSQL database:

    createdb candle_annotator
    createuser -P ml_user
    # Enter password: ml_password
    psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
    
  4. Create .env file:

    cp .env.example .env
    # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
    
  5. Start the development server:

    npm run dev
    
  6. Open http://localhost:3000 in your browser

Usage

  1. Upload Data: Click "Choose CSV File" and select a CSV with columns: time,open,high,low,close
  2. View Chart: The candlestick chart renders automatically after upload
  3. Add Annotations:
    • Click "Label: Break Up" or "Label: Break Down" then click on a candle
    • Click "Draw Line" then click two points to draw a trend line
    • Press Escape to cancel line drawing
  4. Delete Annotations: Click "Delete" tool, then click on markers or lines to remove them
  5. Export: Click "Export CSV" to download all annotations

CSV File Format

Input Format

Your CSV file should have these columns:

time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525

Time column accepts:

  • Unix timestamps (seconds): 1700000000
  • Date strings: 2024-01-15, 2024-01-15 10:30:00
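
The two accepted time formats could be normalized along these lines. This is only a sketch: the function name `parse_time` is hypothetical, and treating date strings as UTC is an assumption, since the timezone the upload route applies is not stated here.

```python
from datetime import datetime, timezone


def parse_time(value: str) -> int:
    """Return Unix seconds for a numeric timestamp or a date string."""
    value = value.strip()
    if value.isdigit():
        return int(value)  # already a Unix timestamp in seconds
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            dt = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted format
        # Assumption: date strings are interpreted as UTC.
        return int(dt.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"Unrecognized time value: {value!r}")
```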

Export Format

The exported CSV includes:

timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500

  • timestamp: Unix timestamp of the annotation
  • label_type: break_up, break_down, or line
  • price: Close price for markers, start price for lines
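
For downstream use, the exported file can be read with Python's standard `csv` module. The sketch below assumes the three columns shown above and converts them to typed values; the function name `load_export` is hypothetical.

```python
import csv
import io


def load_export(csv_text: str) -> list:
    """Parse exported annotation CSV text into typed records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "timestamp": int(row["timestamp"]),   # Unix seconds
            "label_type": row["label_type"],      # break_up, break_down, or line
            "price": float(row["price"]),
        }
        for row in reader
    ]
```

To read from disk instead, pass the file's contents, for example `load_export(open(path, newline="").read())`.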

Database Schema

Candles Table (PostgreSQL)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  time: timestamp (not null, indexed with chart_id),
  open: double precision,
  high: double precision,
  low: double precision,
  close: double precision
}

Annotations Table (Point Annotations)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  timestamp: timestamp (not null),
  label_type: text ('line' | 'rectangle'),
  geometry: jsonb (for line/rectangle coordinates, nullable),
  color: text (default '#3b82f6'),
  created_at: timestamp (default now())
}

Span Annotations Table (Pattern Labels)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  start_time: timestamp (not null),
  end_time: timestamp (not null),
  label: text (pattern name, e.g., 'Bullish Engulfing'),
  confidence: integer (nullable),
  outcome: text (nullable),
  notes: text (nullable),
  sub_spans: jsonb (nullable),
  color: text (default '#2196F3'),
  source: text (default 'human'),  # 'human' | 'model' | 'hybrid'
  model_prediction: jsonb (nullable),
  created_at: timestamp (default now())
}

Training Runs Table (ML Service)

{
  id: serial (PK, auto-increment),
  run_id: text (unique, MLflow run ID),
  model_type: text (e.g., 'RandomForest', 'XGBoost'),
  experiment_name: text,
  pipeline_config_hash: text,
  dataset_version: text,
  metrics_summary: jsonb,
  status: text (e.g., 'running', 'completed', 'failed'),
  created_at: timestamp (default now()),
  completed_at: timestamp (nullable)
}

API Endpoints

POST /api/upload

Upload CSV file and store candle data

Behavior: Deletes all existing candles before inserting new data (replace mode)
Request: multipart/form-data with file field
Response: { success: true, count: number } or { error: string }

GET /api/candles

Retrieve all candle records

Response: Array of candle objects ordered by time

GET /api/annotations

Retrieve all annotations

Response: Array of annotation objects with parsed geometry

POST /api/annotations

Create a new annotation

Request: { timestamp: number, label_type: string, geometry?: object }
Response: Created annotation object with ID
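
A client-side call to this endpoint might be built as in the sketch below, using only the standard library. The base URL, the helper name `build_annotation_request`, and the shape of `geometry` are assumptions; the request body fields follow the description above.

```python
import json
import urllib.request


def build_annotation_request(base_url, timestamp, label_type, geometry=None):
    """Build (but do not send) a POST request for /api/annotations."""
    payload = {"timestamp": timestamp, "label_type": label_type}
    if geometry is not None:
        payload["geometry"] = geometry  # optional; shape is an assumption
    return urllib.request.Request(
        f"{base_url}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, the request can be sent with `urllib.request.urlopen(req)` and the response body decoded as JSON.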

DELETE /api/annotations/[id]

Delete an annotation by ID

Response: { success: true } or { error: string }

GET /api/export

Export annotations as downloadable CSV

Response: CSV file download with Content-Disposition header

Frontend Architecture

Component Structure

  • page.tsx: Main page composition, manages active tool state
  • Toolbox.tsx: Sidebar with tool buttons and export functionality
  • FileUpload.tsx: CSV upload component with status messages
  • CandleChart.tsx: Core chart wrapper with lightweight-charts integration
    • Initializes chart with dark theme
    • Handles marker annotations (Break Up/Down)
    • Manages click events for annotation creation
    • Exposes refreshData() method for parent updates
  • SvgOverlay.tsx: Transparent SVG layer for line drawing
    • Coordinate transformation between data and pixels
    • Two-click line drawing with preview
    • Line hit detection for deletion

Data Flow

  1. User uploads CSV → POST /api/upload → PostgreSQL storage
  2. Chart mounts → GET /api/candles + GET /api/annotations → Render
  3. User clicks with active tool → POST /api/annotations → Refresh chart
  4. User deletes → DELETE /api/annotations/[id] → Refresh chart
  5. User exports → GET /api/export → CSV download

Development

Project Structure

candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # Legacy SQLite directory (storage is now PostgreSQL)
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file

Key Technical Decisions

  1. lightweight-charts v4: Stable API with good candlestick and marker support
  2. PostgreSQL: Shared database enables ML service to directly query candle/annotation data without CSV exports
  3. SVG Overlay for Lines: Maintains separate rendering layer from chart, easier coordinate management
  4. Drizzle ORM: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
  5. Next.js App Router: Server-side API routes co-located with frontend code

Known Limitations

  • Single User: No authentication or concurrent access support
  • No Undo: Can only delete annotations, not undo placement
  • Memory: Large CSV files (100k+ rows) may cause slow uploads
  • Line Snapping: Lines don't snap to candles, free-form placement only

Troubleshooting

See DEPLOYMENT.md for detailed troubleshooting steps.

Common issues:

  • PostgreSQL connection errors: Check DATABASE_URL environment variable and verify PostgreSQL is running
  • Port 3000 in use: Use PORT=3001 npm run dev
  • Migration errors: Ensure PostgreSQL is accessible before starting the application

License

ISC

Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.