# Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

## Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

**Annotation Tools** - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Export annotations for ML training

**ML Pipeline** - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization

**Active Learning Loop** - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining

## Features

### User Accounts & Authentication

- **Multi-User Support**: Each user has isolated data with per-user workspace
- **Auth.js v5 Integration**: Flexible authentication system supporting multiple sign-in methods
- **Sign-In Methods**:
  - **Credentials (Email/Password)**: Traditional email and password authentication with bcryptjs hashing
  - **Google OAuth**: One-click sign-in with Google accounts
- **Registration**: Self-service account creation with email validation
  and password requirements (minimum 8 characters)
- **Settings Page**: Update display name, change password for credential users, or delete account with confirmation
- **Default Admin Account**: Database seeding with default admin credentials for initial setup
- **Per-User Data Isolation**: All charts, annotations, and ML models are scoped to individual users

### Data Management

- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if the database is empty
- **PostgreSQL Storage**: All candle data and annotations are stored in a PostgreSQL database
- **Shared Database**: Frontend and ML service use the same database for seamless data access
- **Data Persistence**: Annotations and candles persist between sessions

### Chart Visualization

- **Interactive Candlestick Chart**: Powered by the lightweight-charts library
- **Dark Theme**: Eye-friendly slate color scheme
- **Zoom & Pan**: Mouse wheel zoom and drag-to-pan functionality
- **Crosshair**: Precise price and time tracking

### Annotation Tools

- **Break Up Markers**: Green arrow markers below candles indicating upward breakouts
- **Break Down Markers**: Red arrow markers above candles indicating downward breakouts
- **Trend Lines**: Two-click line drawing with real-time preview
- **Delete Tool**: Remove any annotation (markers or lines) by clicking on it
- **Tool Toggle**: Click the active tool's button again to deactivate it

### Label Management

- **Label Sidebar**: View all annotations in a collapsible sidebar with:
  - **Click Selection**: Click markers on chart or in sidebar to select/highlight
  - **Keyboard Delete**: Press Delete or Backspace to remove selected label
  - **Individual Delete**: Delete button on each list item
  - **Search**: Search annotations by timestamp
  - **Filter**: Filter by Break Up, Break Down, or All types
  -
    **Count Display**: See how many Break Up vs Break Down markers exist
  - **Visual Highlight**: Selected markers highlighted with glow effect

### UI Theme

- **Hacker Theme**: Terminal-inspired dark aesthetic with:
  - Matrix green (#00ff41) on dark background (#0a0e0a)
  - Monospace font (JetBrains Mono) throughout
  - Glow effects on button hover and active states
  - Custom scrollbars styled to match theme
  - High contrast for accessibility

### Export & Deployment

- **CSV Export**: Download all annotations with timestamp, label type, and price data
- **ML-Ready Format**: Structured data suitable for training ML models
- **Docker Deployment**: One-command deployment with persistent data volume
- **Health Check**: Built-in /api/health endpoint for monitoring

### ML Pipeline Features (Optional)

The integrated ML pipeline provides:

#### Feature Engineering

- **TA-Lib Indicators**: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- **Candle Features**: Body size, wick ratios, gap detection, price ranges
- **Custom Features**: Plugin system for domain-specific feature functions
- **NaN Handling**: Automatic warmup period detection and cleanup

#### Annotation Ingestion

- **Windowed Classification**: Extract fixed-size windows around each pattern for classification models
- **BIO Sequence Labeling**: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- **Programmatic Labels**: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- **Label Merging**: Human-priority, programmatic-priority, or both strategies
- **Dataset Statistics**: Class distribution, label counts, human/programmatic agreement metrics

#### Model Training

- **Model Types**: RandomForest and XGBoost with class balancing
- **Temporal Splitting**: Train/val/test splits that respect time series order (no data leakage)
- **MLflow Integration**: Automatic experiment tracking, hyperparameter logging, artifact storage
-
  **Model Registry**: Versioned model storage with stage management (Production, Staging, Archived)
- **Evaluation Metrics**: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- **Visualization**: Confusion matrix, feature importance plots, classification reports

#### Inference Service

- **FastAPI REST API**: High-performance inference with automatic OpenAPI docs
- **Preprocessing Parity**: Loads pipeline config from MLflow to ensure training/inference consistency
- **Batch Processing**: Efficient prediction for large time ranges
- **Span Grouping**: Consecutive predictions merged into labeled spans with confidence scores
- **Model Metadata**: Endpoint to query model version, metrics, and label configuration

#### Prediction UI

- **Chart Overlay**: Predictions rendered as histogram series with label-specific colors
- **Confidence Filtering**: Slider to hide low-confidence predictions
- **Label Filtering**: Toggle visibility per pattern type with per-class F1 scores
- **Disagreement Detection**: Automatic comparison of human vs model predictions
- **Prediction Summary**: Counts for total predictions, agreements, disagreements
- **Active Learning Feedback**: Click predictions to convert them to annotations (future feature)

## Architecture

### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Next.js Frontend (React 19, Tailwind, lightweight-charts)│  │
│  │  - Annotation tools                                       │  │
│  │  - Prediction visualization                               │  │
│  └──────────────────┬────────────────────────────────────────┘  │
└─────────────────────┼───────────────────────────────────────────┘
                      │ HTTP
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│  Next.js API Routes (TypeScript)                                │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
└───────────┬─────────────────────────────────────┬───────────    │
│           │ PostgreSQL                          │ HTTP          │
│           ▼                                     ▼               │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │  PostgreSQL Database (Shared)                               │ │
│ │  - Frontend tables (candles, annotations, span_annotations) │ │
│ │  - ML tables (training_runs)                                │ │
│ │  - Accessed by: Next.js (Drizzle ORM)                       │ │
│ │                 ML Service (SQLAlchemy)                     │ │
│ └───────────────────────┬─────────────────────────────────────┘ │
│                         │                                       │
│              ┌──────────┴──────────┐                            │
│              │                     │                            │
│  ┌───────────▼─────────┐   ┌───────▼───────────────────┐        │
│  │  ML Inference API   │   │  MLflow Server            │        │
│  │  (FastAPI, Python)  │   │  (Experiments, Registry)  │        │
│  └─────────────────────┘   └───────────────────────────┘        │
└─────────────────────────────────────────────────────────────────┘
```

### ML Pipeline Workflow

```
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7.
   Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
```

## Tech Stack

### Frontend & Web Service

- **Frontend**: Next.js 16 (App Router), React 19, TypeScript
- **Styling**: Tailwind CSS 3, shadcn/ui components
- **Charting**: lightweight-charts 4.x (TradingView)
- **Icons**: lucide-react
- **Backend**: Next.js API Routes
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV Parsing**: papaparse

### ML Pipeline (Python)

- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic

## Getting Started

### Docker Quickstart (Recommended)

The fastest way to get running with Docker:

```bash
docker-compose up --build
```

Then open http://localhost:3000

See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instructions.

### Prerequisites

- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)

### Local Development Installation

1. Clone the repository:
   ```bash
   git clone
   cd candle_annotator
   ```
2. Install dependencies:
   ```bash
   npm install
   ```
3. Set up the PostgreSQL database:
   ```bash
   createdb candle_annotator
   createuser -P ml_user  # Enter password: ml_password
   psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
   ```
4.
   Create `.env` file:
   ```bash
   cp .env.example .env
   # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
   ```
5. Start the development server:
   ```bash
   npm run dev
   ```
6. Open http://localhost:3000 in your browser

### Usage

1. **Upload Data**: Click "Choose CSV File" and select a CSV with columns: `time,open,high,low,close`
2. **View Chart**: The candlestick chart renders automatically after upload
3. **Add Annotations**:
   - Click "Label: Break Up" or "Label: Break Down" then click on a candle
   - Click "Draw Line" then click two points to draw a trend line
   - Press Escape to cancel line drawing
4. **Delete Annotations**: Click "Delete" tool, then click on markers or lines to remove them
5. **Export**: Click "Export CSV" to download all annotations

## CSV File Format

### Input Format

Your CSV file should have these columns:

```csv
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
```

**Time column** accepts:
- Unix timestamps (seconds): `1700000000`
- Date strings: `2024-01-15`, `2024-01-15 10:30:00`

### Export Format

The exported CSV includes:

```csv
timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
```

- **timestamp**: Unix timestamp of the annotation
- **label_type**: `break_up`, `break_down`, or `line`
- **price**: Close price for markers, start price for lines

## Database Schema

### Candles Table (PostgreSQL)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  time: timestamp (not null, indexed with chart_id),
  open: double precision,
  high: double precision,
  low: double precision,
  close: double precision
}
```

### Annotations Table (Point Annotations)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  timestamp: timestamp (not null),
  label_type: text ('line' | 'rectangle'),
  geometry: jsonb (for line/rectangle coordinates, nullable),
  color: text
    (default '#3b82f6'),
  created_at: timestamp (default now())
}
```

### Span Annotations Table (Pattern Labels)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  start_time: timestamp (not null),
  end_time: timestamp (not null),
  label: text (pattern name, e.g., 'Bullish Engulfing'),
  confidence: integer (nullable),
  outcome: text (nullable),
  notes: text (nullable),
  sub_spans: jsonb (nullable),
  color: text (default '#2196F3'),
  source: text (default 'human'),  // 'human' | 'model' | 'hybrid'
  model_prediction: jsonb (nullable),
  created_at: timestamp (default now())
}
```

### Training Runs Table (ML Service)

```typescript
{
  id: serial (PK, auto-increment),
  run_id: text (unique, MLflow run ID),
  model_type: text (e.g., 'RandomForest', 'XGBoost'),
  experiment_name: text,
  pipeline_config_hash: text,
  dataset_version: text,
  metrics_summary: jsonb,
  status: text (e.g., 'running', 'completed', 'failed'),
  created_at: timestamp (default now()),
  completed_at: timestamp (nullable)
}
```

## API Endpoints

### POST /api/upload

Upload a CSV file and store its candle data.

**Behavior**: Deletes all existing candles before inserting new data (replace mode)

**Request**: multipart/form-data with `file` field

**Response**: `{ success: true, count: number }` or `{ error: string }`

### GET /api/candles

Retrieve all candle records.

**Response**: Array of candle objects ordered by time

### GET /api/annotations

Retrieve all annotations.

**Response**: Array of annotation objects with parsed geometry

### POST /api/annotations

Create a new annotation.

**Request**: `{ timestamp: number, label_type: string, geometry?: object }`

**Response**: Created annotation object with ID

### DELETE /api/annotations/[id]

Delete an annotation by ID.

**Response**: `{ success: true }` or `{ error: string }`

### GET /api/export

Export annotations as a downloadable CSV.

**Response**: CSV file download with Content-Disposition header

## Architecture

### Component Structure

- **page.tsx**: Main page
  composition, manages active tool state
- **Toolbox.tsx**: Sidebar with tool buttons and export functionality
- **FileUpload.tsx**: CSV upload component with status messages
- **CandleChart.tsx**: Core chart wrapper with lightweight-charts integration
  - Initializes chart with dark theme
  - Handles marker annotations (Break Up/Down)
  - Manages click events for annotation creation
  - Exposes `refreshData()` method for parent updates
- **SvgOverlay.tsx**: Transparent SVG layer for line drawing
  - Coordinate transformation between data and pixels
  - Two-click line drawing with preview
  - Line hit detection for deletion

### Data Flow

1. User uploads CSV → POST /api/upload → PostgreSQL storage
2. Chart mounts → GET /api/candles + GET /api/annotations → Render
3. User clicks with active tool → POST /api/annotations → Refresh chart
4. User deletes → DELETE /api/annotations/[id] → Refresh chart
5. User exports → GET /api/export → CSV download

## Development

### Project Structure

```
candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file
```

### Key Technical Decisions

1. **lightweight-charts v4**: Stable API with good candlestick and marker support
2. **PostgreSQL**: Shared database enables ML service to directly query candle/annotation data without CSV exports
3.
   **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
4. **Drizzle ORM**: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
5. **Next.js App Router**: Server-side API routes co-located with frontend code

### Known Limitations

- **No Undo**: Can only delete annotations, not undo placement
- **Memory**: Large CSV files (100k+ rows) may cause slow uploads
- **Line Snapping**: Lines don't snap to candles, free-form placement only

## Troubleshooting

See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.

Common issues:
- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
- **Migration errors**: Ensure PostgreSQL is accessible before starting the application

## License

ISC

## Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.
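
As a reference point for the temporal splitting listed under Model Training, the idea can be sketched in Python as follows. This is a minimal illustration and not the project's actual training code: the function name `temporal_split`, the percentage parameters, and the dict-based rows with a `time` key (mirroring the CSV `time` column) are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical row shape: one dict per candle, keyed like the CSV columns.
Row = Dict[str, float]


def temporal_split(
    rows: Sequence[Row], train_pct: int = 70, val_pct: int = 15
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split candle rows chronologically into train/val/test partitions.

    Rows are sorted by their `time` field and cut at fixed positions, so
    every training row precedes every validation row, which in turn
    precedes every test row. Nothing is shuffled.
    """
    if train_pct <= 0 or val_pct < 0 or train_pct + val_pct >= 100:
        raise ValueError("need train_pct > 0 and train_pct + val_pct < 100")

    ordered = sorted(rows, key=lambda row: row["time"])
    n = len(ordered)
    # Integer arithmetic keeps the cut points exact (no float rounding).
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


if __name__ == "__main__":
    # Ten one-minute candles, deliberately out of order.
    candles = [
        {"time": 1700000000 + 60 * i, "close": 1.05 + 0.001 * i}
        for i in (3, 0, 9, 1, 7, 2, 8, 4, 6, 5)
    ]
    train, val, test = temporal_split(candles)
    print(len(train), len(val), len(test))  # prints: 7 1 2
```

Cutting by position after sorting, instead of sampling at random, is what keeps later candles out of the training partition.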