Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

Annotation Tools - TradingView-like charting interface for creating labeled training data:

Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
Visualize candlestick charts with interactive zoom and pan
Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
Mark breakout patterns (Break Up, Break Down) directly on candles
Draw custom trend lines with two-click interaction
Export annotations for ML training

ML Pipeline - Python-based training and inference system:

Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
Automated pattern detection using TA-Lib CDL* functions
Train RandomForest and XGBoost models with MLflow experiment tracking
FastAPI inference service for real-time predictions
Integration with Next.js UI for prediction visualization

Active Learning Loop - Close the feedback cycle:

Model predictions displayed as overlays on the chart
Disagreement detection between human annotations and model predictions
One-click feedback to confirm, correct, or dismiss predictions as new training data
Continuous improvement through iterative annotation and retraining

Features

Data Management

CSV Upload: Import OHLC data with support for both Unix timestamps and date strings
- Replace Mode: Uploading a new CSV deletes all old candles and replaces them with new data
- Initial Data: Docker containers automatically load EURUSD.csv on first startup if database is empty
PostgreSQL Storage: All candle data and annotations stored in PostgreSQL database
Shared Database: Frontend and ML service use the same database for seamless data access
Data Persistence: Annotations and candles persist between sessions

Chart Visualization

Interactive Candlestick Chart: Powered by lightweight-charts library
Dark Theme: Eye-friendly slate color scheme
Zoom & Pan: Mouse wheel zoom and drag-to-pan functionality
Crosshair: Precise price and time tracking

Annotation Tools

Break Up Markers: Green arrow markers below candles indicating upward breakouts
Break Down Markers: Red arrow markers above candles indicating downward breakouts
Trend Lines: Two-click line drawing with real-time preview
Delete Tool: Remove any annotation (markers or lines) by clicking on them
Tool Toggle: Click tool button again to deactivate

Label Management

Label Sidebar: View all annotations in collapsible sidebar with:
- Click Selection: Click markers on chart or in sidebar to select/highlight
- Keyboard Delete: Press Delete or Backspace to remove selected label
- Individual Delete: Delete button on each list item
- Search: Search annotations by timestamp
- Filter: Filter by Break Up, Break Down, or All types
- Count Display: See how many Break Up vs Break Down markers exist
- Visual Highlight: Selected markers highlighted with glow effect

UI Theme

Hacker Theme: Terminal-inspired dark aesthetic with:
- Matrix green (#00ff41) on dark background (#0a0e0a)
- Monospace font (JetBrains Mono) throughout
- Glow effects on button hover and active states
- Custom scrollbars styled to match theme
- High contrast for accessibility

Export & Deployment

CSV Export: Download all annotations with timestamp, label type, and price data
ML-Ready Format: Structured data suitable for training ML models
Docker Deployment: One-command deployment with persistent data volume
Health Check: Built-in /api/health endpoint for monitoring

ML Pipeline Features (Optional)

The integrated ML pipeline provides:

Feature Engineering

TA-Lib Indicators: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
Candle Features: Body size, wick ratios, gap detection, price ranges
Custom Features: Plugin system for domain-specific feature functions
NaN Handling: Automatic warmup period detection and cleanup
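
The candle features listed above can be sketched in a few lines of plain Python. This is only an illustration: the `Candle` class, the `candle_features` function, and the exact feature definitions (for example, wick ratios relative to the high-low range) are assumptions, not the pipeline's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Candle:
    """Hypothetical container for one OHLC row."""
    time: int
    open: float
    high: float
    low: float
    close: float


def candle_features(candle: Candle, prev_close: Optional[float] = None) -> Dict[str, float]:
    """Compute simple shape features for a single candle (illustrative definitions)."""
    price_range = candle.high - candle.low
    body = abs(candle.close - candle.open)
    upper_wick = candle.high - max(candle.open, candle.close)
    lower_wick = min(candle.open, candle.close) - candle.low
    # Guard against zero-range candles (flat bars) to avoid division by zero.
    denom = price_range if price_range > 0 else 1.0
    return {
        "body_size": body,
        "range": price_range,
        "upper_wick_ratio": upper_wick / denom,
        "lower_wick_ratio": lower_wick / denom,
        # Gap between this open and the previous close; 0.0 when no previous candle is given.
        "gap": candle.open - prev_close if prev_close is not None else 0.0,
    }
```

Ratios are normalized by the candle's range so that they are comparable across instruments with different price scales.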

Annotation Ingestion

Windowed Classification: Extract fixed-size windows around each pattern for classification models
BIO Sequence Labeling: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
Programmatic Labels: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
Label Merging: Human-priority, programmatic-priority, or both strategies
Dataset Statistics: Class distribution, label counts, human/programmatic agreement metrics
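
To make the BIO encoding concrete, here is a minimal sketch that converts span annotations into one tag per candle. The function name `bio_encode`, the tuple layout of a span, and the inclusive time bounds are assumptions for illustration; the ingestion code in this project may differ.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start_time, end_time, label), bounds inclusive (assumed)


def bio_encode(candle_times: Sequence[int], spans: Sequence[Span]) -> List[str]:
    """Return one BIO tag per candle: B-<label>, I-<label>, or O."""
    tags = ["O"] * len(candle_times)
    for start, end, label in spans:
        # Indices of candles that fall inside this span, in chart order.
        inside = [i for i, t in enumerate(candle_times) if start <= t <= end]
        for position, index in enumerate(inside):
            prefix = "B-" if position == 0 else "I-"
            tags[index] = prefix + label
    return tags
```

In this sketch, when two spans overlap, the one listed later overwrites the earlier tags. A real pipeline would need an explicit policy for that case.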

Model Training

Model Types: RandomForest and XGBoost with class balancing
Temporal Splitting: Train/val/test splits that respect time series order (no data leakage)
MLflow Integration: Automatic experiment tracking, hyperparameter logging, artifact storage
Model Registry: Versioned model storage with stage management (Production, Staging, Archived)
Evaluation Metrics: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
Visualization: Confusion matrix, feature importance plots, classification reports
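
The idea behind temporal splitting can be sketched as follows: sort by time, then cut the sequence at fixed fractions so that every validation and test row is later than every training row. The function name `temporal_split`, the `time` key, and the default 70/15/15 fractions are assumptions, not the project's actual settings.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def temporal_split(
    rows: Sequence[Row], train_frac: float = 0.7, val_frac: float = 0.15
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into train/val/test by chronological order, without shuffling."""
    ordered = sorted(rows, key=lambda row: row["time"])
    n = len(ordered)
    # round() avoids off-by-one cuts caused by floating-point products such as 0.7 * n.
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Because windowed features look back over earlier candles, a real pipeline may also want to drop a few rows at each boundary so that windows do not straddle two splits.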

Inference Service

FastAPI REST API: High-performance inference with automatic OpenAPI docs
Preprocessing Parity: Loads pipeline config from MLflow to ensure training/inference consistency
Batch Processing: Efficient prediction for large time ranges
Span Grouping: Consecutive predictions merged into labeled spans with confidence scores
Model Metadata: Endpoint to query model version, metrics, and label configuration
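
Span grouping can be illustrated with `itertools.groupby`: consecutive candles that share a predicted label are merged into one span. The tuple layout, the `"O"` background label, and the use of the mean as the span's confidence are assumptions made for this sketch; the service may aggregate confidence differently.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple, Union

Prediction = Tuple[int, str, float]  # (time, label, confidence), sorted by time


def group_spans(
    predictions: Sequence[Prediction], background: str = "O"
) -> List[Dict[str, Union[str, int, float]]]:
    """Merge runs of identical consecutive labels into spans."""
    spans = []
    for label, run in groupby(predictions, key=lambda p: p[1]):
        items = list(run)
        if label == background:
            continue  # skip candles with no detected pattern
        spans.append({
            "label": label,
            "start_time": items[0][0],
            "end_time": items[-1][0],
            # Mean per-candle confidence over the run (assumed aggregation).
            "confidence": sum(p[2] for p in items) / len(items),
        })
    return spans
```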

Prediction UI

Chart Overlay: Predictions rendered as histogram series with label-specific colors
Confidence Filtering: Slider to hide low-confidence predictions
Label Filtering: Toggle visibility per pattern type with per-class F1 scores
Disagreement Detection: Automatic comparison of human vs model predictions
Prediction Summary: Counts for total predictions, agreements, disagreements
Active Learning Feedback: Click predictions to convert them to annotations (future feature)
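
A rough sketch of how confidence filtering and disagreement detection could fit together is shown below. The function name `compare_predictions`, the dictionary shapes, and the `model_only` bucket are hypothetical; the UI's actual comparison logic is not documented here.

```python
from typing import Dict, List, Tuple


def compare_predictions(
    human: Dict[int, str],
    model: Dict[int, Tuple[str, float]],
    min_confidence: float = 0.5,
) -> Dict[str, List[int]]:
    """Bucket candle times by whether human and model labels match."""
    summary: Dict[str, List[int]] = {"agreements": [], "disagreements": [], "model_only": []}
    for time, (label, confidence) in sorted(model.items()):
        if confidence < min_confidence:
            continue  # hidden by the confidence filter
        if time not in human:
            summary["model_only"].append(time)  # candidate for review
        elif human[time] == label:
            summary["agreements"].append(time)
        else:
            summary["disagreements"].append(time)
    return summary
```

The lengths of the three lists correspond to the kind of counts shown in a prediction summary.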

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Next.js Frontend (React 19, Tailwind, lightweight-charts) │  │
│  │ - Annotation tools                                        │  │
│  │ - Prediction visualization                                │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┬───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│  Next.js API Routes (TypeScript)                                │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│              │ PostgreSQL                          │ HTTP       │
│              ▼                                     ▼            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ PostgreSQL Database (Shared)                              │  │
│  │ - Frontend tables (candles, annotations, span_annotations)│  │
│  │ - ML tables (training_runs)                               │  │
│  │ - Accessed by: Next.js (Drizzle ORM)                      │  │
│  │                ML Service (SQLAlchemy)                    │  │
│  └──────────────────────┬────────────────────────────────────┘  │
│                         │                                       │
│             ┌───────────┴─────────────────┐                     │
│             │                             │                     │
│  ┌──────────▼──────────┐  ┌───────────────▼───────────┐         │
│  │  ML Inference API   │  │  MLflow Server            │         │
│  │  (FastAPI, Python)  │  │  (Experiments, Registry)  │         │
│  └─────────────────────┘  └───────────────────────────┘         │
└─────────────────────────────────────────────────────────────────┘

ML Pipeline Workflow

1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain

Tech Stack

Frontend & Web Service

Frontend: Next.js 16 (App Router), React 19, TypeScript
Styling: Tailwind CSS 3, shadcn/ui components
Charting: lightweight-charts 4.x (TradingView)
Icons: lucide-react
Backend: Next.js API Routes
Database: PostgreSQL 16 with pg driver
ORM: Drizzle ORM (PostgreSQL dialect)
CSV Parsing: papaparse

ML Pipeline (Python)

API Framework: FastAPI with uvicorn
ML Libraries: scikit-learn (RandomForest), XGBoost
Feature Engineering: TA-Lib (Technical Analysis Library)
Data Processing: pandas, numpy
Experiment Tracking: MLflow (model registry, artifact storage)
Data Versioning: DVC (Data Version Control)
Database: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
ORM: SQLAlchemy (for training runs) + table reflection (for frontend data)
Model Persistence: joblib
Validation: Pydantic

Getting Started

Docker Quickstart (Recommended)

The fastest way to get running with Docker:

docker-compose up --build

Then open http://localhost:3000

See DEPLOYMENT.md for detailed Docker instructions.

Prerequisites

Node.js 20.9 or higher (for local development; Next.js 16 no longer supports Node.js 18)
npm 9.x or higher (for local development)
PostgreSQL 16 or higher (for local development)
Docker & docker-compose (for containerized deployment)

Local Development Installation

Clone the repository:

git clone <repository-url>
cd candle_annotator

Install dependencies:
```
npm install
```

Setup PostgreSQL database:

createdb candle_annotator
createuser -P ml_user
# Enter password: ml_password
psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"

Create .env file:

cp .env.example .env
# Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator

Start the development server:
```
npm run dev
```
Open http://localhost:3000 in your browser

Usage

Upload Data: Click "Choose CSV File" and select a CSV with columns: time,open,high,low,close
View Chart: The candlestick chart renders automatically after upload
Add Annotations:
- Click "Label: Break Up" or "Label: Break Down" then click on a candle
- Click "Draw Line" then click two points to draw a trend line
- Press Escape to cancel line drawing
Delete Annotations: Click "Delete" tool, then click on markers or lines to remove them
Export: Click "Export CSV" to download all annotations

CSV File Format

Input Format

Your CSV file should have these columns:

time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525

Time column accepts:

Unix timestamps (seconds): 1700000000
Date strings: 2024-01-15, 2024-01-15 10:30:00
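
As a minimal sketch, the two accepted time formats could be normalized to Unix seconds like this. The function name `parse_time` and the decision to interpret date strings as UTC are assumptions for illustration; the upload handler may treat time zones differently.

```python
from datetime import datetime, timezone

# Date formats taken from the examples above, tried in order.
_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_time(value: str) -> int:
    """Convert a CSV time cell to a Unix timestamp in seconds."""
    text = value.strip()
    if text.isdigit():
        return int(text)  # already a Unix timestamp
    for fmt in _DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Assumption: date strings carry no zone, so treat them as UTC.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"Unrecognized time value: {value!r}")
```

Raising on unrecognized values, rather than skipping the row, makes malformed uploads visible early.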

Export Format

The exported CSV includes:

timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500

timestamp: Unix timestamp of the annotation
label_type: break_up, break_down, or line
price: Close price for markers, start price for lines
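
The exported file can be read with Python's standard `csv` module. The sketch below counts annotations per `label_type`; the function name `count_labels` is hypothetical, and the column names are taken from the export format shown above.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text: str) -> Counter:
    """Count exported annotations by label_type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label_type"] for row in reader)


SAMPLE_EXPORT = """timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
"""

if __name__ == "__main__":
    for label, count in sorted(count_labels(SAMPLE_EXPORT).items()):
        print(label, count)
```

A quick count like this is a cheap way to spot class imbalance before training.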

Database Schema

Candles Table (PostgreSQL)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  time: timestamp (not null, indexed with chart_id),
  open: double precision,
  high: double precision,
  low: double precision,
  close: double precision
}

Annotations Table (Point Annotations)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  timestamp: timestamp (not null),
  label_type: text ('line' | 'rectangle'),
  geometry: jsonb (for line/rectangle coordinates, nullable),
  color: text (default '#3b82f6'),
  created_at: timestamp (default now())
}

Span Annotations Table (Pattern Labels)

{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  start_time: timestamp (not null),
  end_time: timestamp (not null),
  label: text (pattern name, e.g., 'Bullish Engulfing'),
  confidence: integer (nullable),
  outcome: text (nullable),
  notes: text (nullable),
  sub_spans: jsonb (nullable),
  color: text (default '#2196F3'),
  source: text (default 'human'),  # 'human' | 'model' | 'hybrid'
  model_prediction: jsonb (nullable),
  created_at: timestamp (default now())
}

Training Runs Table (ML Service)

{
  id: serial (PK, auto-increment),
  run_id: text (unique, MLflow run ID),
  model_type: text (e.g., 'RandomForest', 'XGBoost'),
  experiment_name: text,
  pipeline_config_hash: text,
  dataset_version: text,
  metrics_summary: jsonb,
  status: text (e.g., 'running', 'completed', 'failed'),
  created_at: timestamp (default now()),
  completed_at: timestamp (nullable)
}

API Endpoints

POST /api/upload

Upload CSV file and store candle data

Behavior: Deletes all existing candles before inserting new data (replace mode)
Request: multipart/form-data with file field
Response: { success: true, count: number } or { error: string }

GET /api/candles

Retrieve all candle records

Response: Array of candle objects ordered by time

GET /api/annotations

Retrieve all annotations

Response: Array of annotation objects with parsed geometry

POST /api/annotations

Create a new annotation

Request: { timestamp: number, label_type: string, geometry?: object }
Response: Created annotation object with ID
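
For scripting against the API, a request for this endpoint might be built as in the sketch below. It assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header and that the app runs at `http://localhost:3000`. The helper name `build_annotation_request` is hypothetical.

```python
import json
from typing import Optional
from urllib import request


def build_annotation_request(
    base_url: str, timestamp: int, label_type: str, geometry: Optional[dict] = None
) -> request.Request:
    """Build (but do not send) a POST /api/annotations request."""
    payload = {"timestamp": timestamp, "label_type": label_type}
    if geometry is not None:
        payload["geometry"] = geometry  # optional field, per the request shape above
    return request.Request(
        f"{base_url}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be required
        method="POST",
    )


if __name__ == "__main__":
    req = build_annotation_request("http://localhost:3000", 1700000000, "break_up")
    # Sending requires a running server:
    # with request.urlopen(req) as response:
    #     print(json.load(response))
    print(req.get_method(), req.full_url)
```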

DELETE /api/annotations/[id]

Delete an annotation by ID

Response: { success: true } or { error: string }

GET /api/export

Export annotations as downloadable CSV

Response: CSV file download with Content-Disposition header

Frontend Architecture

Component Structure

page.tsx: Main page composition, manages active tool state
Toolbox.tsx: Sidebar with tool buttons and export functionality
FileUpload.tsx: CSV upload component with status messages
CandleChart.tsx: Core chart wrapper with lightweight-charts integration
- Initializes chart with dark theme
- Handles marker annotations (Break Up/Down)
- Manages click events for annotation creation
- Exposes refreshData() method for parent updates
SvgOverlay.tsx: Transparent SVG layer for line drawing
- Coordinate transformation between data and pixels
- Two-click line drawing with preview
- Line hit detection for deletion

Data Flow

User uploads CSV → POST /api/upload → PostgreSQL storage
Chart mounts → GET /api/candles + GET /api/annotations → Render
User clicks with active tool → POST /api/annotations → Refresh chart
User deletes → DELETE /api/annotations/[id] → Refresh chart
User exports → GET /api/export → CSV download

Development

Project Structure

candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # Legacy SQLite database directory (storage is now PostgreSQL)
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file

Key Technical Decisions

lightweight-charts v4: Stable API with good candlestick and marker support
PostgreSQL: Shared database enables ML service to directly query candle/annotation data without CSV exports
SVG Overlay for Lines: Maintains separate rendering layer from chart, easier coordinate management
Drizzle ORM: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
Next.js App Router: Server-side API routes co-located with frontend code

Known Limitations

Single User: No authentication or concurrent access support
No Undo: Can only delete annotations, not undo placement
Memory: Large CSV files (100k+ rows) may cause slow uploads
Line Snapping: Lines don't snap to candles, free-form placement only

Troubleshooting

See DEPLOYMENT.md for detailed troubleshooting steps.

Common issues:

PostgreSQL connection errors: Check DATABASE_URL environment variable and verify PostgreSQL is running
Port 3000 in use: Use PORT=3001 npm run dev
Migration errors: Ensure PostgreSQL is accessible before starting the application

License

ISC

Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.