Candle Annotator
A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.
Overview
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:
Annotation Tools - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Export annotations for ML training
ML Pipeline - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization
Active Learning Loop - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining
Features
Data Management
- CSV Upload: Import OHLC data with support for both Unix timestamps and date strings
- Replace Mode: Uploading a new CSV deletes all old candles and replaces them with new data
- Initial Data: Docker containers automatically load EURUSD.csv on first startup if the database is empty
- PostgreSQL Storage: All candle data and annotations stored in PostgreSQL database
- Shared Database: Frontend and ML service use the same database for seamless data access
- Data Persistence: Annotations and candles persist between sessions
Chart Visualization
- Interactive Candlestick Chart: Powered by lightweight-charts library
- Dark Theme: Eye-friendly slate color scheme
- Zoom & Pan: Mouse wheel zoom and drag-to-pan functionality
- Crosshair: Precise price and time tracking
Annotation Tools
- Break Up Markers: Green arrow markers below candles indicating upward breakouts
- Break Down Markers: Red arrow markers above candles indicating downward breakouts
- Trend Lines: Two-click line drawing with real-time preview
- Delete Tool: Remove any annotation (markers or lines) by clicking on them
- Tool Toggle: Click tool button again to deactivate
Label Management
- Label Sidebar: View all annotations in a collapsible sidebar with:
- Click Selection: Click markers on chart or in sidebar to select/highlight
- Keyboard Delete: Press Delete or Backspace to remove selected label
- Individual Delete: Delete button on each list item
- Search: Search annotations by timestamp
- Filter: Filter by Break Up, Break Down, or All types
- Count Display: See how many Break Up vs Break Down markers exist
- Visual Highlight: Selected markers highlighted with glow effect
UI Theme
- Hacker Theme: Terminal-inspired dark aesthetic with:
- Matrix green (#00ff41) on dark background (#0a0e0a)
- Monospace font (JetBrains Mono) throughout
- Glow effects on button hover and active states
- Custom scrollbars styled to match theme
- High contrast for accessibility
Export & Deployment
- CSV Export: Download all annotations with timestamp, label type, and price data
- ML-Ready Format: Structured data suitable for training ML models
- Docker Deployment: One-command deployment with persistent data volume
- Health Check: Built-in /api/health endpoint for monitoring
ML Pipeline Features (Optional)
The integrated ML pipeline provides:
Feature Engineering
- TA-Lib Indicators: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- Candle Features: Body size, wick ratios, gap detection, price ranges
- Custom Features: Plugin system for domain-specific feature functions
- NaN Handling: Automatic warmup period detection and cleanup
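As a rough illustration of the candle features listed above, the following sketch computes body size, wick ratios, and a gap value from raw OHLC values. The `Candle` class, the function names, and the exact feature definitions are assumptions made for this example; the pipeline's actual definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """Hypothetical container for one OHLC bar."""
    open: float
    high: float
    low: float
    close: float


def candle_features(c: Candle) -> dict:
    """Compute simple shape features for one candle (illustrative definitions)."""
    price_range = c.high - c.low
    body = abs(c.close - c.open)
    upper_wick = c.high - max(c.open, c.close)
    lower_wick = min(c.open, c.close) - c.low
    # Guard against zero-range candles (flat bars) to avoid division by zero.
    denom = price_range if price_range > 0 else 1.0
    return {
        "body_size": body,
        "range": price_range,
        "body_ratio": body / denom,
        "upper_wick_ratio": upper_wick / denom,
        "lower_wick_ratio": lower_wick / denom,
    }


def gap(prev: Candle, cur: Candle) -> float:
    """Gap between the previous close and the current open."""
    return cur.open - prev.close
```

Ratios are normalized by the candle's high-low range so that they stay comparable across instruments with different price scales.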
Annotation Ingestion
- Windowed Classification: Extract fixed-size windows around each pattern for classification models
- BIO Sequence Labeling: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- Programmatic Labels: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- Label Merging: Human-priority, programmatic-priority, or both strategies
- Dataset Statistics: Class distribution, label counts, human/programmatic agreement metrics
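To make the BIO encoding concrete, here is a minimal sketch that turns span annotations into one tag per candle. The function name `bio_encode` and the index-based span tuples are assumptions for illustration; the ingestion code in the repository may represent spans differently (for example by timestamps).

```python
def bio_encode(n_candles, spans):
    """Return one BIO tag per candle.

    spans: list of (start_index, end_index_inclusive, label) tuples.
    Candles outside every span are tagged "O". If spans overlap,
    later spans overwrite earlier ones (an assumption of this sketch).
    """
    tags = ["O"] * n_candles
    for start, end, label in spans:
        if start < 0 or end >= n_candles or start > end:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first candle of the pattern
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # remaining candles of the pattern
    return tags
```

For a six-candle series with a single "Hammer" span covering indices 1 to 3, the result is `O, B-Hammer, I-Hammer, I-Hammer, O, O`.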
Model Training
- Model Types: RandomForest and XGBoost with class balancing
- Temporal Splitting: Train/val/test splits that respect time series order (no data leakage)
- MLflow Integration: Automatic experiment tracking, hyperparameter logging, artifact storage
- Model Registry: Versioned model storage with stage management (Production, Staging, Archived)
- Evaluation Metrics: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- Visualization: Confusion matrix, feature importance plots, classification reports
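The temporal splitting idea can be sketched as follows: rows are sorted by time and cut at fixed positions, so every validation row is later than every training row and every test row is later than every validation row. The function name, the `time` key, and the default fractions are assumptions for this example, not the pipeline's actual settings.

```python
def temporal_split(rows, train_frac=0.7, val_frac=0.15):
    """Split rows into train/val/test without shuffling.

    rows: list of dicts that each carry a sortable "time" value.
    The split points are positions in the time-ordered list, so no
    future data leaks into an earlier partition.
    """
    ordered = sorted(rows, key=lambda r: r["time"])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

A random split would mix later candles into the training set, which inflates evaluation scores on time series data.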
Inference Service
- FastAPI REST API: High-performance inference with automatic OpenAPI docs
- Preprocessing Parity: Loads pipeline config from MLflow to ensure training/inference consistency
- Batch Processing: Efficient prediction for large time ranges
- Span Grouping: Consecutive predictions merged into labeled spans with confidence scores
- Model Metadata: Endpoint to query model version, metrics, and label configuration
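Span grouping can be illustrated with a short sketch that merges runs of identical per-candle predictions into spans. The function name, the `"O"` background label, and the use of the mean probability as the span confidence are assumptions for this example; the inference service may aggregate confidence differently.

```python
def group_spans(times, labels, confidences, background="O"):
    """Merge consecutive identical predictions into labeled spans.

    times, labels, confidences: parallel per-candle lists.
    Runs of the background label are skipped. Each span's confidence
    is the mean of its per-candle confidences (an assumption).
    """
    spans = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                run = confidences[start:i]
                spans.append({
                    "start_time": times[start],
                    "end_time": times[i - 1],
                    "label": labels[start],
                    "confidence": sum(run) / len(run),
                })
            start = i
    return spans
```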
Prediction UI
- Chart Overlay: Predictions rendered as histogram series with label-specific colors
- Confidence Filtering: Slider to hide low-confidence predictions
- Label Filtering: Toggle visibility per pattern type with per-class F1 scores
- Disagreement Detection: Automatic comparison of human vs model predictions
- Prediction Summary: Counts for total predictions, agreements, disagreements
- Active Learning Feedback: Click predictions to convert them to annotations (future feature)
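One possible way to implement disagreement detection is to compare spans by time overlap and label. The sketch below is an assumption about how such a comparison could work, not the repository's actual logic: a model span that overlaps a human span with the same label counts as an agreement, an overlap with only different labels counts as a disagreement, and a model span with no overlapping human span is reported separately.

```python
def overlaps(a, b):
    """True if two spans share at least one point in time (inclusive bounds)."""
    return a["start_time"] <= b["end_time"] and b["start_time"] <= a["end_time"]


def compare_spans(human, model):
    """Classify each model span against the human annotations."""
    summary = {"agreements": [], "disagreements": [], "model_only": []}
    for pred in model:
        hits = [h for h in human if overlaps(h, pred)]
        if not hits:
            summary["model_only"].append(pred)
        elif any(h["label"] == pred["label"] for h in hits):
            summary["agreements"].append(pred)
        else:
            summary["disagreements"].append(pred)
    return summary
```

The lengths of the three lists correspond to the kind of counts a prediction summary could display.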
Architecture
System Components
┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Next.js Frontend (React 19, Tailwind, lightweight-charts) │  │
│  │ - Annotation tools                                        │  │
│  │ - Prediction visualization                                │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┼───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Next.js API Routes (TypeScript)                 │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│  └───────────┬─────────────────────────────────────┬─────────── │
│              │ PostgreSQL                          │ HTTP       │
│              ▼                                     ▼            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │               PostgreSQL Database (Shared)                │  │
│  │ - Frontend tables (candles, annotations, span_annotations)│  │
│  │ - ML tables (training_runs)                               │  │
│  │ - Accessed by: Next.js (Drizzle ORM)                      │  │
│  │                ML Service (SQLAlchemy)                    │  │
│  └─────────────────────────┬─────────────────────────────────┘  │
│                            │                                    │
│              ┌─────────────┴─────────────┐                      │
│              │                           │                      │
│  ┌───────────▼─────────┐         ┌───────▼───────────────────┐  │
│  │ ML Inference API    │         │ MLflow Server             │  │
│  │ (FastAPI, Python)   │         │ (Experiments, Registry)   │  │
│  └─────────────────────┘         └───────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
ML Pipeline Workflow
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
Tech Stack
Frontend & Web Service
- Frontend: Next.js 16 (App Router), React 19, TypeScript
- Styling: Tailwind CSS 3, shadcn/ui components
- Charting: lightweight-charts 4.x (TradingView)
- Icons: lucide-react
- Backend: Next.js API Routes
- Database: PostgreSQL 16 with pg driver
- ORM: Drizzle ORM (PostgreSQL dialect)
- CSV Parsing: papaparse
ML Pipeline (Python)
- API Framework: FastAPI with uvicorn
- ML Libraries: scikit-learn (RandomForest), XGBoost
- Feature Engineering: TA-Lib (Technical Analysis Library)
- Data Processing: pandas, numpy
- Experiment Tracking: MLflow (model registry, artifact storage)
- Data Versioning: DVC (Data Version Control)
- Database: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- ORM: SQLAlchemy (for training runs) + table reflection (for frontend data)
- Model Persistence: joblib
- Validation: Pydantic
Getting Started
Docker Quickstart (Recommended)
The fastest way to get running with Docker:
docker-compose up --build
Then open http://localhost:3000
See DEPLOYMENT.md for detailed Docker instructions.
Prerequisites
- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)
Local Development Installation
- Clone the repository:
    git clone <repository-url>
    cd candle_annotator
- Install dependencies:
    npm install
- Set up the PostgreSQL database:
    createdb candle_annotator
    createuser -P ml_user  # Enter password: ml_password
    psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
- Create the .env file:
    cp .env.example .env
    # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
- Start the development server:
    npm run dev
- Open http://localhost:3000 in your browser
Usage
- Upload Data: Click "Choose CSV File" and select a CSV with columns: time,open,high,low,close
- View Chart: The candlestick chart renders automatically after upload
- Add Annotations:
- Click "Label: Break Up" or "Label: Break Down" then click on a candle
- Click "Draw Line" then click two points to draw a trend line
- Press Escape to cancel line drawing
- Delete Annotations: Click "Delete" tool, then click on markers or lines to remove them
- Export: Click "Export CSV" to download all annotations
CSV File Format
Input Format
Your CSV file should have these columns:
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
Time column accepts:
- Unix timestamps (seconds): 1700000000
- Date strings: 2024-01-15, 2024-01-15 10:30:00
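A minimal sketch of how a time column with these two kinds of values could be normalized to Unix seconds. The function name `parse_time`, the two accepted date formats, and the choice to treat zone-less date strings as UTC are assumptions for this example; the upload handler's actual parsing rules are not documented here.

```python
from datetime import datetime, timezone


def parse_time(value: str) -> int:
    """Return Unix seconds for a Unix-timestamp string or a date string."""
    text = value.strip()
    if text.isdigit():
        # Already a Unix timestamp in seconds.
        return int(text)
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Assumption: date strings without a time zone are interpreted as UTC.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"Unrecognized time value: {value!r}")
```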
Export Format
The exported CSV includes:
timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
- timestamp: Unix timestamp of the annotation
- label_type: break_up, break_down, or line
- price: Close price for markers, start price for lines
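For downstream use, the exported file can be read with Python's standard `csv` module. The sketch below assumes only the three columns shown above; the function name `load_export` is hypothetical.

```python
import csv
import io
from collections import Counter


def load_export(csv_text: str):
    """Parse exported annotations into typed rows and count rows per label_type."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "timestamp": int(record["timestamp"]),
            "label_type": record["label_type"],
            "price": float(record["price"]),
        })
    counts = Counter(row["label_type"] for row in rows)
    return rows, counts
```

The per-label counts give a quick check of class balance before training.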
Database Schema
Candles Table (PostgreSQL)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
time: timestamp (not null, indexed with chart_id),
open: double precision,
high: double precision,
low: double precision,
close: double precision
}
Annotations Table (Point Annotations)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
timestamp: timestamp (not null),
label_type: text ('line' | 'rectangle'),
geometry: jsonb (for line/rectangle coordinates, nullable),
color: text (default '#3b82f6'),
created_at: timestamp (default now())
}
Span Annotations Table (Pattern Labels)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
start_time: timestamp (not null),
end_time: timestamp (not null),
label: text (pattern name, e.g., 'Bullish Engulfing'),
confidence: integer (nullable),
outcome: text (nullable),
notes: text (nullable),
sub_spans: jsonb (nullable),
color: text (default '#2196F3'),
source: text (default 'human'), # 'human' | 'model' | 'hybrid'
model_prediction: jsonb (nullable),
created_at: timestamp (default now())
}
Training Runs Table (ML Service)
{
id: serial (PK, auto-increment),
run_id: text (unique, MLflow run ID),
model_type: text (e.g., 'RandomForest', 'XGBoost'),
experiment_name: text,
pipeline_config_hash: text,
dataset_version: text,
metrics_summary: jsonb,
status: text (e.g., 'running', 'completed', 'failed'),
created_at: timestamp (default now()),
completed_at: timestamp (nullable)
}
API Endpoints
POST /api/upload
Upload CSV file and store candle data
Behavior: Deletes all existing candles before inserting new data (replace mode)
Request: multipart/form-data with file field
Response: { success: true, count: number } or { error: string }
GET /api/candles
Retrieve all candle records
Response: Array of candle objects ordered by time
GET /api/annotations
Retrieve all annotations
Response: Array of annotation objects with parsed geometry
POST /api/annotations
Create a new annotation
Request: { timestamp: number, label_type: string, geometry?: object }
Response: Created annotation object with ID
DELETE /api/annotations/[id]
Delete an annotation by ID
Response: { success: true } or { error: string }
GET /api/export
Export annotations as downloadable CSV
Response: CSV file download with Content-Disposition header
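As a hedged illustration of calling these endpoints from a script, the sketch below builds a `POST /api/annotations` request with the documented fields using only the Python standard library. The base URL comes from the quickstart above; the JSON content type and the helper name `build_create_annotation` are assumptions, so check the route handler before relying on them.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # local dev server from the quickstart


def build_create_annotation(timestamp, label_type, geometry=None):
    """Build (but do not send) a request that creates an annotation."""
    payload = {"timestamp": timestamp, "label_type": label_type}
    if geometry is not None:
        # geometry is optional; its exact shape is not specified in this README.
        payload["geometry"] = geometry
    return Request(
        f"{BASE_URL}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request when the server is running.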
Architecture
Component Structure
- page.tsx: Main page composition, manages active tool state
- Toolbox.tsx: Sidebar with tool buttons and export functionality
- FileUpload.tsx: CSV upload component with status messages
- CandleChart.tsx: Core chart wrapper with lightweight-charts integration
- Initializes chart with dark theme
- Handles marker annotations (Break Up/Down)
- Manages click events for annotation creation
- Exposes refreshData() method for parent updates
- SvgOverlay.tsx: Transparent SVG layer for line drawing
- Coordinate transformation between data and pixels
- Two-click line drawing with preview
- Line hit detection for deletion
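Line hit detection is commonly done by measuring the distance from the click position to the line segment in pixel space. The sketch below shows that general technique in Python for consistency with the other examples; it is not the SvgOverlay.tsx implementation, and the function names and the 5-pixel tolerance are assumptions.

```python
import math


def distance_to_segment(px, py, x1, y1, x2, y2):
    """Shortest distance from point (px, py) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - x1, py - y1)
    # Project the point onto the line, then clamp to the segment's extent.
    t = ((px - x1) * dx + (py - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def hits_line(px, py, line, tolerance=5.0):
    """True if the click lies within `tolerance` pixels of the line.

    line: (x1, y1, x2, y2) in pixel coordinates.
    """
    return distance_to_segment(px, py, *line) <= tolerance
```

Clamping the projection keeps clicks beyond an endpoint from matching the infinite extension of the line.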
Data Flow
- User uploads CSV → POST /api/upload → PostgreSQL storage
- Chart mounts → GET /api/candles + GET /api/annotations → Render
- User clicks with active tool → POST /api/annotations → Refresh chart
- User deletes → DELETE /api/annotations/[id] → Refresh chart
- User exports → GET /api/export → CSV download
Development
Project Structure
candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file
Key Technical Decisions
- lightweight-charts v4: Stable API with good candlestick and marker support
- PostgreSQL: Shared database enables ML service to directly query candle/annotation data without CSV exports
- SVG Overlay for Lines: Maintains separate rendering layer from chart, easier coordinate management
- Drizzle ORM: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
- Next.js App Router: Server-side API routes co-located with frontend code
Known Limitations
- Single User: No authentication or concurrent access support
- No Undo: Can only delete annotations, not undo placement
- Memory: Large CSV files (100k+ rows) may cause slow uploads
- Line Snapping: Lines don't snap to candles, free-form placement only
Troubleshooting
See DEPLOYMENT.md for detailed troubleshooting steps.
Common issues:
- PostgreSQL connection errors: Check the DATABASE_URL environment variable and verify PostgreSQL is running
- Port 3000 in use: Use PORT=3001 npm run dev
- Migration errors: Ensure PostgreSQL is accessible before starting the application
License
ISC
Contributing
This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.