Candle Annotator
A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.
Overview
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:
Annotation Tools - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Export annotations for ML training
ML Pipeline - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization
Active Learning Loop - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining
Features
Data Management
- CSV Upload: Import OHLC data with support for both Unix timestamps and date strings
- Replace Mode: Uploading a new CSV deletes all old candles and replaces them with new data
- Initial Data: Docker containers automatically load EURUSD.csv on first startup if database is empty
- PostgreSQL Storage: All candle data and annotations stored in PostgreSQL database
- Shared Database: Frontend and ML service use the same database for seamless data access
- Data Persistence: Annotations and candles persist between sessions
Chart Visualization
- Interactive Candlestick Chart: Powered by lightweight-charts library
- Dark Theme: Eye-friendly slate color scheme
- Zoom & Pan: Mouse wheel zoom and drag-to-pan functionality
- Crosshair: Precise price and time tracking
Annotation Tools
- Break Up Markers: Green arrow markers below candles indicating upward breakouts
- Break Down Markers: Red arrow markers above candles indicating downward breakouts
- Trend Lines: Two-click line drawing with real-time preview
- Delete Tool: Remove any annotation (markers or lines) by clicking on them
- Tool Toggle: Click tool button again to deactivate
Label Management
- Label Sidebar: View all annotations in a collapsible sidebar with:
  - Click Selection: Click markers on chart or in sidebar to select/highlight
  - Keyboard Delete: Press Delete or Backspace to remove selected label
  - Individual Delete: Delete button on each list item
  - Search: Search annotations by timestamp
  - Filter: Filter by Break Up, Break Down, or All types
  - Count Display: See how many Break Up vs Break Down markers exist
  - Visual Highlight: Selected markers highlighted with glow effect
UI Theme
- Hacker Theme: Terminal-inspired dark aesthetic with:
  - Matrix green (#00ff41) on dark background (#0a0e0a)
  - Monospace font (JetBrains Mono) throughout
  - Glow effects on button hover and active states
  - Custom scrollbars styled to match theme
  - High contrast for accessibility
Export & Deployment
- CSV Export: Download all annotations with timestamp, label type, and price data
- ML-Ready Format: Structured data suitable for training ML models
- Docker Deployment: One-command deployment with persistent data volume
- Health Check: Built-in /api/health endpoint for monitoring
ML Pipeline Features (Optional)
The integrated ML pipeline provides:
Feature Engineering
- TA-Lib Indicators: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- Candle Features: Body size, wick ratios, gap detection, price ranges
- Custom Features: Plugin system for domain-specific feature functions
- NaN Handling: Automatic warmup period detection and cleanup
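As a rough illustration of the candle features listed above, the sketch below derives body size, wick ratios, gap, and price range from a single OHLC candle. The `Candle` class, the `candle_features` function, and the exact feature definitions are assumptions made for this example; the pipeline's own definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Candle:
    """One OHLC candle (hypothetical container for this sketch)."""
    time: int
    open: float
    high: float
    low: float
    close: float


def candle_features(candle: Candle, prev: Optional[Candle] = None) -> Dict[str, float]:
    """Compute simple shape features for one candle (assumed definitions)."""
    price_range = candle.high - candle.low
    body = abs(candle.close - candle.open)
    upper_wick = candle.high - max(candle.open, candle.close)
    lower_wick = min(candle.open, candle.close) - candle.low
    # A zero-range candle has no meaningful ratios, so report NaN instead of dividing by zero.
    denom = price_range if price_range > 0 else float("nan")
    return {
        "body_size": body,
        "range": price_range,
        "body_ratio": body / denom,
        "upper_wick_ratio": upper_wick / denom,
        "lower_wick_ratio": lower_wick / denom,
        # Gap: this open minus the previous close (NaN when there is no previous candle).
        "gap": candle.open - prev.close if prev is not None else float("nan"),
    }
```

The NaN values for the first candle mirror the warmup-period idea: rows without enough history are marked rather than silently filled.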
Annotation Ingestion
- Windowed Classification: Extract fixed-size windows around each pattern for classification models
- BIO Sequence Labeling: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- Programmatic Labels: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- Label Merging: Human-priority, programmatic-priority, or both strategies
- Dataset Statistics: Class distribution, label counts, human/programmatic agreement metrics
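To make the BIO encoding concrete, here is a minimal sketch that tags each candle timestamp as Begin, Inside, or Outside relative to a list of annotated spans. The `bio_encode` name, the tuple layout, and the inclusive span bounds are assumptions for illustration, not the ingestion module's actual interface.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_time, end_time, label), inclusive bounds


def bio_encode(times: List[int], spans: List[Span]) -> List[str]:
    """Return one BIO tag per candle time; candles outside every span get 'O'."""
    tags = ["O"] * len(times)
    # Process spans in chronological order; a later span overwrites an overlapping earlier one.
    for start, end, label in sorted(spans):
        inside = False
        for i, t in enumerate(times):
            if start <= t <= end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags
```

For example, a "Hammer" span covering the second and third candles of a five-candle series would produce `O, B-Hammer, I-Hammer, O, O`.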
Model Training
- Model Types: RandomForest and XGBoost with class balancing
- Temporal Splitting: Train/val/test splits that respect time series order (no data leakage)
- MLflow Integration: Automatic experiment tracking, hyperparameter logging, artifact storage
- Model Registry: Versioned model storage with stage management (Production, Staging, Archived)
- Evaluation Metrics: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- Visualization: Confusion matrix, feature importance plots, classification reports
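The temporal splitting idea can be sketched as a plain chronological cut: earlier rows go to training, later rows to validation and test, with no shuffling. The `temporal_split` function and its default fractions are hypothetical; the sketch assumes the rows are already sorted by time.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    rows: Sequence[T], train: float = 0.7, val: float = 0.15
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered rows into train/val/test without shuffling."""
    if not (0 < train < 1) or not (0 <= val < 1) or train + val >= 1:
        raise ValueError("train and val must be fractions that leave room for a test set")
    n = len(rows)
    train_end = int(n * train)
    val_end = int(n * (train + val))
    # Slices keep chronological order, so every validation/test row is later than every training row.
    return list(rows[:train_end]), list(rows[train_end:val_end]), list(rows[val_end:])
```

Because the test set is always the most recent slice, the evaluation never sees information from the future of the training data.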
Inference Service
- FastAPI REST API: High-performance inference with automatic OpenAPI docs
- Preprocessing Parity: Loads pipeline config from MLflow to ensure training/inference consistency
- Batch Processing: Efficient prediction for large time ranges
- Span Grouping: Consecutive predictions merged into labeled spans with confidence scores
- Model Metadata: Endpoint to query model version, metrics, and label configuration
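The span grouping step can be pictured as run-length grouping over per-candle predictions. The sketch below is one possible reading: `group_spans`, the `PredictedSpan` class, the `"O"` background label, and the use of mean confidence per span are all assumptions, since the service's actual aggregation is not described here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Prediction = Tuple[int, str, float]  # (time, label, confidence)


@dataclass
class PredictedSpan:
    start_time: int
    end_time: int
    label: str
    confidence: float  # mean confidence over the span (assumed aggregation)


def group_spans(preds: List[Prediction], background: str = "O") -> List[PredictedSpan]:
    """Merge consecutive predictions with the same label into spans."""
    spans: List[PredictedSpan] = []
    run: List[Prediction] = []

    def flush() -> None:
        # Emit the current run as a span unless it is background, then reset it.
        if run and run[0][1] != background:
            mean_conf = sum(p[2] for p in run) / len(run)
            spans.append(PredictedSpan(run[0][0], run[-1][0], run[0][1], mean_conf))
        run.clear()

    for pred in preds:
        if run and pred[1] != run[-1][1]:
            flush()
        run.append(pred)
    flush()
    return spans
```

Other aggregations (minimum or maximum confidence per span) would fit the same structure by changing one line in `flush`.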
Prediction UI
- Chart Overlay: Predictions rendered as histogram series with label-specific colors
- Confidence Filtering: Slider to hide low-confidence predictions
- Label Filtering: Toggle visibility per pattern type with per-class F1 scores
- Disagreement Detection: Automatic comparison of human vs model predictions
- Prediction Summary: Counts for total predictions, agreements, disagreements
- Active Learning Feedback: Click predictions to convert them to annotations (future feature)
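One way to think about disagreement detection is as an overlap check between human spans and model spans. The sketch below treats a model span as an agreement when at least one human span overlaps it in time and carries the same label, and as a disagreement otherwise. That rule, along with the `compare_spans` and `overlaps` names, is an assumption for illustration; the UI may use a different criterion.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_time, end_time, label), inclusive bounds


def overlaps(a: Span, b: Span) -> bool:
    """True when two inclusive time ranges share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_spans(human: List[Span], model: List[Span]) -> Tuple[List[Span], List[Span]]:
    """Split model spans into (agreements, disagreements) against human spans."""
    agreements: List[Span] = []
    disagreements: List[Span] = []
    for predicted in model:
        matched = any(
            overlaps(predicted, labeled) and labeled[2] == predicted[2]
            for labeled in human
        )
        (agreements if matched else disagreements).append(predicted)
    return agreements, disagreements
```

The lengths of the two returned lists correspond to the agreement and disagreement counts a prediction summary would show.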
Architecture
System Components
┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Next.js Frontend (React 19, Tailwind, lightweight-charts) │  │
│  │  - Annotation tools                                       │  │
│  │  - Prediction visualization                               │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
└────────────────────────────────┼────────────────────────────────┘
                                 │ HTTP
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ Next.js API Routes (TypeScript)                                 │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
│              │ PostgreSQL                       │ HTTP          │
│              ▼                                  ▼               │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ PostgreSQL Database (Shared)                              │  │
│  │ - Frontend tables (candles, annotations, span_annotations)│  │
│  │ - ML tables (training_runs)                               │  │
│  │ - Accessed by: Next.js (Drizzle ORM)                      │  │
│  │                ML Service (SQLAlchemy)                    │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│                     ┌──────────┴──────────┐                     │
│                     │                     │                     │
│         ┌───────────▼─────────┐   ┌───────▼───────────────────┐ │
│         │ ML Inference API    │   │ MLflow Server             │ │
│         │ (FastAPI, Python)   │   │ (Experiments, Registry)   │ │
│         └─────────────────────┘   └───────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
ML Pipeline Workflow
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
Tech Stack
Frontend & Web Service
- Frontend: Next.js 16 (App Router), React 19, TypeScript
- Styling: Tailwind CSS 3, shadcn/ui components
- Charting: lightweight-charts 4.x (TradingView)
- Icons: lucide-react
- Backend: Next.js API Routes
- Database: PostgreSQL 16 with pg driver
- ORM: Drizzle ORM (PostgreSQL dialect)
- CSV Parsing: papaparse
ML Pipeline (Python)
- API Framework: FastAPI with uvicorn
- ML Libraries: scikit-learn (RandomForest), XGBoost
- Feature Engineering: TA-Lib (Technical Analysis Library)
- Data Processing: pandas, numpy
- Experiment Tracking: MLflow (model registry, artifact storage)
- Data Versioning: DVC (Data Version Control)
- Database: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- ORM: SQLAlchemy (for training runs) + table reflection (for frontend data)
- Model Persistence: joblib
- Validation: Pydantic
Getting Started
Docker Quickstart (Recommended)
The fastest way to get running with Docker:
docker-compose up --build
Then open http://localhost:3000
See DEPLOYMENT.md for detailed Docker instructions.
Prerequisites
- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)
Local Development Installation
1. Clone the repository:
   git clone <repository-url>
   cd candle_annotator
2. Install dependencies:
   npm install
3. Set up the PostgreSQL database:
   createdb candle_annotator
   createuser -P ml_user  # Enter password: ml_password
   psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
4. Create the .env file:
   cp .env.example .env
   # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
5. Start the development server:
   npm run dev
6. Open http://localhost:3000 in your browser
Usage
- Upload Data: Click "Choose CSV File" and select a CSV with columns: time,open,high,low,close
- View Chart: The candlestick chart renders automatically after upload
- Add Annotations:
  - Click "Label: Break Up" or "Label: Break Down" then click on a candle
  - Click "Draw Line" then click two points to draw a trend line
  - Press Escape to cancel line drawing
- Delete Annotations: Click "Delete" tool, then click on markers or lines to remove them
- Export: Click "Export CSV" to download all annotations
CSV File Format
Input Format
Your CSV file should have these columns:
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
Time column accepts:
- Unix timestamps (seconds): 1700000000
- Date strings: 2024-01-15, 2024-01-15 10:30:00
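A minimal sketch of how the two accepted time formats could be normalized to Unix seconds is shown below. The `parse_candle_time` function is hypothetical, and interpreting date strings as UTC is an assumption of this example; the upload handler's actual timezone handling is not documented here.

```python
from datetime import datetime, timezone

# Date formats taken from the examples above; anything else is rejected.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_candle_time(value: str) -> int:
    """Convert a CSV time cell to Unix seconds (date strings assumed to be UTC)."""
    text = value.strip()
    if text.isdigit():
        # Already a Unix timestamp in seconds.
        return int(text)
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError(f"Unrecognized time value: {value!r}")
```

Rejecting unknown formats with an error, rather than guessing, keeps malformed rows from silently landing at the wrong position on the chart.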
Export Format
The exported CSV includes:
timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
- timestamp: Unix timestamp of the annotation
- label_type: break_up, break_down, or line
- price: Close price for markers, start price for lines
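For downstream ML use, the exported file can be read back into typed records with the standard library alone. The sketch below assumes exactly the three columns shown above; the `load_exported_annotations` name and the strict label check are choices made for this example.

```python
import csv
import io
from typing import Dict, List, Union

ALLOWED_LABELS = {"break_up", "break_down", "line"}

Record = Dict[str, Union[int, str, float]]


def load_exported_annotations(text: str) -> List[Record]:
    """Parse exported CSV text into a list of typed annotation records."""
    records: List[Record] = []
    for row in csv.DictReader(io.StringIO(text)):
        label = row["label_type"]
        if label not in ALLOWED_LABELS:
            raise ValueError(f"Unexpected label_type: {label!r}")
        records.append(
            {
                "timestamp": int(row["timestamp"]),
                "label_type": label,
                "price": float(row["price"]),
            }
        )
    return records
```

To read a downloaded file, pass its contents, for example `load_exported_annotations(open("annotations.csv").read())`, where the file name is whatever the browser saved.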
Database Schema
Candles Table (PostgreSQL)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
time: timestamp (not null, indexed with chart_id),
open: double precision,
high: double precision,
low: double precision,
close: double precision
}
Annotations Table (Point Annotations)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
timestamp: timestamp (not null),
label_type: text ('line' | 'rectangle'),
geometry: jsonb (for line/rectangle coordinates, nullable),
color: text (default '#3b82f6'),
created_at: timestamp (default now())
}
Span Annotations Table (Pattern Labels)
{
id: serial (PK, auto-increment),
chart_id: integer (FK to charts.id),
start_time: timestamp (not null),
end_time: timestamp (not null),
label: text (pattern name, e.g., 'Bullish Engulfing'),
confidence: integer (nullable),
outcome: text (nullable),
notes: text (nullable),
sub_spans: jsonb (nullable),
color: text (default '#2196F3'),
source: text (default 'human'), # 'human' | 'model' | 'hybrid'
model_prediction: jsonb (nullable),
created_at: timestamp (default now())
}
Training Runs Table (ML Service)
{
id: serial (PK, auto-increment),
run_id: text (unique, MLflow run ID),
model_type: text (e.g., 'RandomForest', 'XGBoost'),
experiment_name: text,
pipeline_config_hash: text,
dataset_version: text,
metrics_summary: jsonb,
status: text (e.g., 'running', 'completed', 'failed'),
created_at: timestamp (default now()),
completed_at: timestamp (nullable)
}
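The schema does not say how pipeline_config_hash is computed. One common approach, shown here purely as an assumption, is to hash a canonical JSON serialization of the pipeline configuration so that identical configs always map to the same value regardless of key order. The function name and the example config keys are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def pipeline_config_hash(config: Dict[str, Any]) -> str:
    """Return a stable SHA-256 hex digest for a JSON-serializable config (assumed scheme)."""
    # Sorting keys and fixing separators makes the serialization independent of dict ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config values, for illustration only.
example = {"window_size": 20, "indicators": ["RSI", "MACD"]}
digest = pipeline_config_hash(example)
```

Storing such a digest next to each training run makes it cheap to check whether an inference request is using the same preprocessing settings as the model it targets.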
API Endpoints
POST /api/upload
Upload CSV file and store candle data
Behavior: Deletes all existing candles before inserting new data (replace mode)
Request: multipart/form-data with file field
Response: { success: true, count: number } or { error: string }
GET /api/candles
Retrieve all candle records
Response: Array of candle objects ordered by time
GET /api/annotations
Retrieve all annotations
Response: Array of annotation objects with parsed geometry
POST /api/annotations
Create a new annotation
Request: { timestamp: number, label_type: string, geometry?: object }
Response: Created annotation object with ID
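As a sketch of calling this endpoint from a script, the helper below builds the request using only the standard library. It assumes the body is sent as JSON with a Content-Type of application/json, which the request shape above suggests but does not state; the `build_annotation_request` name and the base URL are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_annotation_request(
    base_url: str,
    timestamp: int,
    label_type: str,
    geometry: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/annotations request."""
    payload: Dict[str, Any] = {"timestamp": timestamp, "label_type": label_type}
    if geometry is not None:
        # geometry is optional and only relevant for drawn shapes such as lines.
        payload["geometry"] = geometry
    return urllib.request.Request(
        url=f"{base_url}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the app running locally, the request could be sent with `urllib.request.urlopen(build_annotation_request("http://localhost:3000", 1700000000, "break_up"))`.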
DELETE /api/annotations/[id]
Delete an annotation by ID
Response: { success: true } or { error: string }
GET /api/export
Export annotations as downloadable CSV
Response: CSV file download with Content-Disposition header
Architecture
Component Structure
- page.tsx: Main page composition, manages active tool state
- Toolbox.tsx: Sidebar with tool buttons and export functionality
- FileUpload.tsx: CSV upload component with status messages
- CandleChart.tsx: Core chart wrapper with lightweight-charts integration
  - Initializes chart with dark theme
  - Handles marker annotations (Break Up/Down)
  - Manages click events for annotation creation
  - Exposes refreshData() method for parent updates
- SvgOverlay.tsx: Transparent SVG layer for line drawing
  - Coordinate transformation between data and pixels
  - Two-click line drawing with preview
  - Line hit detection for deletion
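Line hit detection usually comes down to measuring the distance from the click point to the line segment in pixel space. The sketch below shows that geometry in Python purely as an illustration; the component itself is a TypeScript file, and the function names and the 5-pixel tolerance are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b, in the same units as the inputs."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line, then clamp to the segment's extent.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def is_line_hit(click: Point, a: Point, b: Point, tolerance: float = 5.0) -> bool:
    """True when the click lands within `tolerance` pixels of the segment."""
    return distance_to_segment(click, a, b) <= tolerance
```

Clamping the projection matters: without it, a click far beyond either endpoint but close to the infinite line would still count as a hit.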
Data Flow
- User uploads CSV → POST /api/upload → PostgreSQL storage
- Chart mounts → GET /api/candles + GET /api/annotations → Render
- User clicks with active tool → POST /api/annotations → Refresh chart
- User deletes → DELETE /api/annotations/[id] → Refresh chart
- User exports → GET /api/export → CSV download
Development
Project Structure
candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file
Key Technical Decisions
- lightweight-charts v4: Stable API with good candlestick and marker support
- PostgreSQL: Shared database enables ML service to directly query candle/annotation data without CSV exports
- SVG Overlay for Lines: Maintains separate rendering layer from chart, easier coordinate management
- Drizzle ORM: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
- Next.js App Router: Server-side API routes co-located with frontend code
Known Limitations
- Single User: No authentication or concurrent access support
- No Undo: Can only delete annotations, not undo placement
- Memory: Large CSV files (100k+ rows) may cause slow uploads
- Line Snapping: Lines don't snap to candles, free-form placement only
Troubleshooting
See DEPLOYMENT.md for detailed troubleshooting steps.
Common issues:
- PostgreSQL connection errors: Check the DATABASE_URL environment variable and verify PostgreSQL is running
- Port 3000 in use: Use PORT=3001 npm run dev
- Migration errors: Ensure PostgreSQL is accessible before starting the application
License
ISC
Contributing
This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.