# Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

## Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

**Annotation Tools** - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Export annotations for ML training

**ML Pipeline** - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization

**Active Learning Loop** - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining

## Features

### Data Management
- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
- **SQLite Storage**: All candle data and annotations stored locally in SQLite database
- **Data Persistence**: Annotations and candles persist between sessions

### Chart Visualization
- **Interactive Candlestick Chart**: Powered by lightweight-charts library
- **Dark Theme**: Eye-friendly slate color scheme
- **Zoom & Pan**: Mouse wheel zoom and drag-to-pan functionality
- **Crosshair**: Precise price and time tracking

### Annotation Tools
- **Break Up Markers**: Green arrow markers below candles indicating upward breakouts
- **Break Down Markers**: Red arrow markers above candles indicating downward breakouts
- **Trend Lines**: Two-click line drawing with real-time preview
- **Delete Tool**: Remove any annotation (markers or lines) by clicking on them
- **Tool Toggle**: Click tool button again to deactivate

### Label Management
- **Label Sidebar**: View all annotations in collapsible sidebar with:
  - **Click Selection**: Click markers on chart or in sidebar to select/highlight
  - **Keyboard Delete**: Press Delete or Backspace to remove selected label
  - **Individual Delete**: Delete button on each list item
  - **Search**: Search annotations by timestamp
  - **Filter**: Filter by Break Up, Break Down, or All types
  - **Count Display**: See how many Break Up vs Break Down markers exist
  - **Visual Highlight**: Selected markers highlighted with glow effect

### UI Theme
- **Hacker Theme**: Terminal-inspired dark aesthetic with:
  - Matrix green (#00ff41) on dark background (#0a0e0a)
  - Monospace font (JetBrains Mono) throughout
  - Glow effects on button hover and active states
  - Custom scrollbars styled to match theme
  - High contrast for accessibility

### Export & Deployment
- **CSV Export**: Download all annotations with timestamp, label type, and price data
- **ML-Ready Format**: Structured data suitable for training ML models
- **Docker Deployment**: One-command deployment with persistent data volume
- **Health Check**: Built-in /api/health endpoint for monitoring

### ML Pipeline Features (Optional)

The integrated ML pipeline provides:

#### Feature Engineering
- **TA-Lib Indicators**: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- **Candle Features**: Body size, wick ratios, gap detection, price ranges
- **Custom Features**: Plugin system for domain-specific feature functions
- **NaN Handling**: Automatic warmup period detection and cleanup

#### Annotation Ingestion
- **Windowed Classification**: Extract fixed-size windows around each pattern for classification models
- **BIO Sequence Labeling**: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- **Programmatic Labels**: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- **Label Merging**: Human-priority, programmatic-priority, or both strategies
- **Dataset Statistics**: Class distribution, label counts, human/programmatic agreement metrics

#### Model Training
- **Model Types**: RandomForest and XGBoost with class balancing
- **Temporal Splitting**: Train/val/test splits that respect time series order (no data leakage)
- **MLflow Integration**: Automatic experiment tracking, hyperparameter logging, artifact storage
- **Model Registry**: Versioned model storage with stage management (Production, Staging, Archived)
- **Evaluation Metrics**: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- **Visualization**: Confusion matrix, feature importance plots, classification reports

#### Inference Service
- **FastAPI REST API**: High-performance inference with automatic OpenAPI docs
- **Preprocessing Parity**: Loads pipeline config from MLflow to ensure training/inference consistency
- **Batch Processing**: Efficient prediction for large time ranges
- **Span Grouping**: Consecutive predictions merged into labeled spans with confidence scores
- **Model Metadata**: Endpoint to query model version, metrics, and label configuration

#### Prediction UI
- **Chart Overlay**: Predictions rendered as histogram series with label-specific colors
- **Confidence Filtering**: Slider to hide low-confidence predictions
- **Label Filtering**: Toggle visibility per pattern type with per-class F1 scores
- **Disagreement Detection**: Automatic comparison of human vs model predictions
- **Prediction Summary**: Counts for total predictions, agreements, disagreements
- **Active Learning Feedback**: Click predictions to convert them to annotations (future feature)

## Architecture

### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Next.js Frontend (React 19, Tailwind, lightweight-charts) │  │
│  │ - Annotation tools                                        │  │
│  │ - Prediction visualization                                │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┼───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Next.js API Routes (TypeScript)                 │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│  ─────────┬──────────────────────────────────────┬────────────  │
│           │ SQLite (annotations)                 │ HTTP         │
│           ▼                                      ▼              │
│  ┌──────────────────┐                 ┌──────────────────────┐  │
│  │ SQLite Database  │                 │  ML Inference API    │  │
│  │ (Drizzle ORM)    │                 │  (FastAPI, Python)   │  │
│  └──────────────────┘                 └──────────┬───────────┘  │
└──────────────────────────────────────────────────┼──────────────┘
                                                   │
                      ┌────────────────────────────┴───┐
                      │                                │
          ┌───────────▼─────────┐         ┌────────────▼──────────┐
          │  MLflow Server      │         │  PostgreSQL           │
          │  (Experiments,      │         │  (Training run        │
          │  Model Registry)    │         │  metadata)            │
          └─────────────────────┘         └───────────────────────┘
```

### ML Pipeline Workflow

```
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
```

## Tech Stack

### Frontend & Web Service
- **Frontend**: Next.js 16 (App Router), React 19, TypeScript
- **Styling**: Tailwind CSS 3, shadcn/ui components
- **Charting**: lightweight-charts 4.x (TradingView)
- **Icons**: lucide-react
- **Backend**: Next.js API Routes
- **Database**: SQLite with better-sqlite3
- **ORM**: Drizzle ORM
- **CSV Parsing**: papaparse

### ML Pipeline (Python)
- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (training run metadata)
- **Model Persistence**: joblib
- **Validation**: Pydantic

## Getting Started

### Docker Quickstart (Recommended)

The fastest way to get running with Docker:

```bash
docker-compose up --build
```

Then open http://localhost:3000

See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instructions.

### Prerequisites

- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- Docker & docker-compose (for containerized deployment)
- Build tools for native modules (see DEPLOYMENT.md)

### Local Development Installation

1. Clone the repository:
   ```bash
   git clone
   cd candle_annotator
   ```

2. Install dependencies:
   ```bash
   npm install
   ```

3. Start the development server:
   ```bash
   npm run dev
   ```

4. Open http://localhost:3000 in your browser

### Usage

1. **Upload Data**: Click "Choose CSV File" and select a CSV with columns: `time,open,high,low,close`
2. **View Chart**: The candlestick chart renders automatically after upload
3. **Add Annotations**:
   - Click "Label: Break Up" or "Label: Break Down" then click on a candle
   - Click "Draw Line" then click two points to draw a trend line
   - Press Escape to cancel line drawing
4. **Delete Annotations**: Click "Delete" tool, then click on markers or lines to remove them
5. **Export**: Click "Export CSV" to download all annotations

## CSV File Format

### Input Format

Your CSV file should have these columns:

```csv
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
```

**Time column** accepts:
- Unix timestamps (seconds): `1700000000`
- Date strings: `2024-01-15`, `2024-01-15 10:30:00`

### Export Format

The exported CSV includes:

```csv
timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
```

- **timestamp**: Unix timestamp of the annotation
- **label_type**: `break_up`, `break_down`, or `line`
- **price**: Close price for markers, start price for lines

## Database Schema

### Candles Table

```typescript
{
  id: integer (PK, auto-increment),
  time: integer (Unix timestamp, unique),
  open: real,
  high: real,
  low: real,
  close: real
}
```

### Annotations Table

```typescript
{
  id: integer (PK, auto-increment),
  timestamp: integer (Unix timestamp),
  label_type: text ('break_up' | 'break_down' | 'line'),
  geometry: text (JSON string for line coordinates, null for markers),
  created_at: integer (Unix timestamp)
}
```

## API Endpoints

### POST /api/upload

Upload CSV file and store candle data

**Behavior**: Deletes all existing candles before inserting new data (replace mode)

**Request**: multipart/form-data with `file` field

**Response**: `{ success: true, count: number }` or `{ error: string }`

### GET /api/candles

Retrieve all candle records

**Response**: Array of candle objects ordered by time

### GET /api/annotations

Retrieve all annotations

**Response**: Array of annotation objects with parsed geometry

### POST /api/annotations

Create a new annotation

**Request**: `{ timestamp: number, label_type: string, geometry?: object }`

**Response**: Created annotation object with ID

### DELETE /api/annotations/[id]

Delete an annotation by ID

**Response**: `{ success: true }` or `{ error: string }`

### GET /api/export

Export annotations as downloadable CSV

**Response**: CSV file download with Content-Disposition header

## Architecture

### Component Structure

- **page.tsx**: Main page composition, manages active tool state
- **Toolbox.tsx**: Sidebar with tool buttons and export functionality
- **FileUpload.tsx**: CSV upload component with status messages
- **CandleChart.tsx**: Core chart wrapper with lightweight-charts integration
  - Initializes chart with dark theme
  - Handles marker annotations (Break Up/Down)
  - Manages click events for annotation creation
  - Exposes `refreshData()` method for parent updates
- **SvgOverlay.tsx**: Transparent SVG layer for line drawing
  - Coordinate transformation between data and pixels
  - Two-click line drawing with preview
  - Line hit detection for deletion

### Data Flow

1. User uploads CSV → POST /api/upload → SQLite storage
2. Chart mounts → GET /api/candles + GET /api/annotations → Render
3. User clicks with active tool → POST /api/annotations → Refresh chart
4. User deletes → DELETE /api/annotations/[id] → Refresh chart
5. User exports → GET /api/export → CSV download

## Development

### Project Structure

```
candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file
```

### Key Technical Decisions

1. **lightweight-charts v4**: Stable API with good candlestick and marker support
2. **SQLite with better-sqlite3**: Synchronous access, perfect for single-user local apps
3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
4. **Drizzle ORM**: Type-safe queries with minimal overhead
5. **Next.js App Router**: Server-side API routes co-located with frontend code

### Known Limitations

- **Single User**: No authentication or concurrent access support
- **No Undo**: Can only delete annotations, not undo placement
- **Memory**: Large CSV files (100k+ rows) may cause slow uploads
- **Line Snapping**: Lines don't snap to candles, free-form placement only

## Troubleshooting

See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.

Common issues:
- **better-sqlite3 binding errors**: Run `npm rebuild better-sqlite3`
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
- **Database corruption**: Delete `data/candles.db` and restart

## License

ISC

## Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.
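
## Appendix: Span Grouping Sketch

The Inference Service feature list mentions that consecutive predictions are merged into labeled spans with confidence scores. The snippet below is a minimal sketch of one way this could work, not the project's actual implementation. The `Span` class, the `group_spans` function, the `"O"` background label (borrowed from the BIO convention), and the use of the mean as the span confidence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import fmean


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; field names are illustrative only."""
    label: str
    start: int         # Unix timestamp of the first candle in the span
    end: int           # Unix timestamp of the last candle in the span
    confidence: float  # mean per-candle confidence (an assumption)


def group_spans(predictions, background="O"):
    """Merge runs of consecutive candles that share a label into spans.

    `predictions` is a time-ordered list of (timestamp, label, confidence)
    tuples. Candles carrying the `background` label are skipped.
    """
    spans = []
    # groupby only groups adjacent items, which is exactly "consecutive".
    for label, run in groupby(predictions, key=lambda p: p[1]):
        run = list(run)
        if label == background:
            continue
        confidence = fmean(p[2] for p in run)
        spans.append(Span(label, run[0][0], run[-1][0], confidence))
    return spans


# Example per-candle predictions (made-up values for illustration).
preds = [
    (1700000000, "O", 0.90),
    (1700000060, "Hammer", 0.80),
    (1700000120, "Hammer", 0.60),
    (1700000180, "O", 0.95),
    (1700000240, "Doji", 0.70),
]

for span in group_spans(preds):
    print(span)
```

With this input, the two adjacent "Hammer" candles collapse into one span and the single "Doji" candle becomes a span of its own. A different aggregation (for example, the minimum confidence in the run) may suit a stricter confidence filter better.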