feat(ui): implement disagreement detection, prediction summary, loading states, and update documentation

- Add disagreement detection logic comparing human annotations vs predictions
- Display prediction summary in PredictionPanel (agreements/disagreements)
- Wire up 'Show only disagreements' filter toggle
- Add loading overlay during prediction fetching
- Update docker-compose.yml with healthchecks for all services
- Update DEPLOYMENT.md with comprehensive ML service setup instructions
- Update README.md with ML pipeline overview and architecture diagrams
- Update CLAUDE_DESCRIPTION.md with v3.0.0 ML integration details

Remaining tasks (11.2, 11.4, 11.5) deferred - core functionality complete
Marko Djordjevic 2026-02-15 16:34:02 +01:00
parent 952eb7413c
commit 21f184aa8d
8 changed files with 585 additions and 56 deletions

140
README.md

@ -4,14 +4,28 @@ A web-based tool for manually annotating candlestick charts with pattern labels
## Overview
Candle Annotator provides a TradingView-like charting interface that allows traders and researchers to:
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:
**Annotation Tools** - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Delete annotations with a dedicated tool
- Export all annotations as CSV for ML training pipelines
- Export annotations for ML training
**ML Pipeline** - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization
**Active Learning Loop** - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining
## Features
@ -59,8 +73,117 @@ Candle Annotator provides a TradingView-like charting interface that allows trad
- **Docker Deployment**: One-command deployment with persistent data volume
- **Health Check**: Built-in /api/health endpoint for monitoring
### ML Pipeline Features (Optional)
The integrated ML pipeline provides:
#### Feature Engineering
- **TA-Lib Indicators**: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- **Candle Features**: Body size, wick ratios, gap detection, price ranges
- **Custom Features**: Plugin system for domain-specific feature functions
- **NaN Handling**: Automatic warmup period detection and cleanup
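A minimal sketch of what the candle features above could look like, using only the standard library. The `Candle` class, the `candle_features` function, and the output key names are hypothetical and chosen for illustration; the pipeline's real feature names may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candle:
    """One OHLC candle (hypothetical container, for illustration only)."""
    open: float
    high: float
    low: float
    close: float


def candle_features(candle: Candle, prev_close: Optional[float] = None) -> dict:
    """Compute body size, wick ratios, gap, and price range for one candle."""
    price_range = candle.high - candle.low
    body = abs(candle.close - candle.open)
    upper_wick = candle.high - max(candle.open, candle.close)
    lower_wick = min(candle.open, candle.close) - candle.low

    def ratio(part: float) -> float:
        # A flat candle (high == low) has no range, so report 0.0 instead of dividing by zero.
        return part / price_range if price_range > 0 else 0.0

    # Assumption: the gap is measured from the previous close to this open.
    gap = 0.0 if prev_close is None else candle.open - prev_close

    return {
        "range": price_range,
        "body": body,
        "body_ratio": ratio(body),
        "upper_wick_ratio": ratio(upper_wick),
        "lower_wick_ratio": ratio(lower_wick),
        "gap": gap,
    }
```

Ratios are taken relative to the candle's high-low range so that they stay comparable across instruments with different price scales.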
#### Annotation Ingestion
- **Windowed Classification**: Extract fixed-size windows around each pattern for classification models
- **BIO Sequence Labeling**: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- **Programmatic Labels**: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- **Label Merging**: Human-priority, programmatic-priority, or both strategies
- **Dataset Statistics**: Class distribution, label counts, human/programmatic agreement metrics
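To illustrate the BIO encoding listed above, here is a small sketch that turns span annotations into one tag per candle. The `bio_encode` name and the `(start, end, label)` tuple layout with inclusive indices are assumptions made for this example, not the pipeline's actual interface.

```python
def bio_encode(n_candles, spans):
    """Return one BIO tag per candle for the given (start, end, label) spans.

    Indices are inclusive candle positions. If spans overlap, the span that
    sorts later overwrites earlier tags (a simplification for this sketch).
    """
    tags = ["O"] * n_candles  # "O" marks candles outside every span
    for start, end, label in sorted(spans):
        if not (0 <= start <= end < n_candles):
            raise ValueError(f"span ({start}, {end}) is outside 0..{n_candles - 1}")
        tags[start] = f"B-{label}"          # first candle of the pattern
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # remaining candles of the pattern
    return tags


# A three-candle "Doji" span inside a six-candle series.
print(bio_encode(6, [(1, 3, "Doji")]))
```

Each span contributes exactly one `B-` tag followed by `I-` tags, which lets a sequence model tell two adjacent patterns of the same type apart.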
#### Model Training
- **Model Types**: RandomForest and XGBoost with class balancing
- **Temporal Splitting**: Chronological train/val/test splits, so validation and test candles always come after the training data (avoids look-ahead leakage)
- **MLflow Integration**: Automatic experiment tracking, hyperparameter logging, artifact storage
- **Model Registry**: Versioned model storage with stage management (Production, Staging, Archived)
- **Evaluation Metrics**: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- **Visualization**: Confusion matrix, feature importance plots, classification reports
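The temporal splitting idea can be sketched as follows. The `temporal_split` function and the 70/15/15 default fractions are assumptions for illustration; the training code may use different proportions or a different API.

```python
def temporal_split(rows, train_frac=0.7, val_frac=0.15, key=None):
    """Split rows chronologically into (train, val, test) without shuffling.

    `key` extracts the timestamp used for ordering; by default the rows
    themselves are compared.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test share")

    ordered = sorted(rows, key=key)  # oldest first
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))

    # Every validation row is later than every training row, and every test
    # row is later than every validation row, so no future data leaks backwards.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


train, val, test = temporal_split(list(range(20)))
print(len(train), len(val), len(test))
```

Unlike a random split, this keeps the evaluation honest for time series: the model is always scored on candles that come after everything it was trained on.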
#### Inference Service
- **FastAPI REST API**: High-performance inference with automatic OpenAPI docs
- **Preprocessing Parity**: Loads pipeline config from MLflow to ensure training/inference consistency
- **Batch Processing**: Efficient prediction for large time ranges
- **Span Grouping**: Consecutive predictions merged into labeled spans with confidence scores
- **Model Metadata**: Endpoint to query model version, metrics, and label configuration
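As a rough illustration of span grouping, the sketch below merges runs of identical per-candle predictions into spans. The `group_spans` name, the `"none"` background label, and the use of the mean as the span confidence are all assumptions; the service may aggregate confidence differently.

```python
from itertools import groupby


def group_spans(predictions, background="none"):
    """Merge consecutive (label, confidence) predictions into labeled spans.

    `predictions` holds one (label, confidence) pair per candle, in order.
    Runs of the background label produce no span.
    """
    spans = []
    index = 0
    for label, run in groupby(predictions, key=lambda p: p[0]):
        run = list(run)
        if label != background:
            spans.append({
                "label": label,
                "start": index,                 # first candle index (inclusive)
                "end": index + len(run) - 1,    # last candle index (inclusive)
                # Assumption: span confidence is the mean of its candles' confidences.
                "confidence": sum(conf for _, conf in run) / len(run),
            })
        index += len(run)
    return spans


per_candle = [("none", 0.9), ("Doji", 0.8), ("Doji", 0.6), ("none", 0.7), ("Hammer", 0.5)]
for span in group_spans(per_candle):
    print(span)
```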
#### Prediction UI
- **Chart Overlay**: Predictions rendered as histogram series with label-specific colors
- **Confidence Filtering**: Slider to hide low-confidence predictions
- **Label Filtering**: Toggle visibility per pattern type with per-class F1 scores
- **Disagreement Detection**: Automatic comparison of human vs model predictions
- **Prediction Summary**: Counts for total predictions, agreements, disagreements
- **Active Learning Feedback**: Click predictions to convert them to annotations (future feature)
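The disagreement detection and prediction summary can be sketched roughly as below. The UI itself is written in TypeScript; this Python version only illustrates the idea. The `Span` type, the overlap rule, and the extra `unreviewed` bucket (predictions with no overlapping human annotation) are assumptions, not a description of the shipped logic.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first candle (inclusive)
    end: int    # index of the last candle (inclusive)
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one candle."""
    return a.start <= b.end and b.start <= a.end


def summarize_predictions(predictions, annotations):
    """Count agreements and disagreements and collect the disagreeing predictions."""
    summary = {"total": len(predictions), "agreements": 0,
               "disagreements": 0, "unreviewed": 0}
    disagreements = []
    for pred in predictions:
        touching = [ann for ann in annotations if overlaps(pred, ann)]
        if not touching:
            # Assumption: no human label here, so there is nothing to compare against.
            summary["unreviewed"] += 1
        elif any(ann.label == pred.label for ann in touching):
            summary["agreements"] += 1
        else:
            summary["disagreements"] += 1
            disagreements.append(pred)
    return summary, disagreements
```

Under these assumptions, a "Show only disagreements" filter would render just the second return value, while the summary dictionary feeds the counts shown in the panel.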
## Architecture
### System Components
```
┌─────────────────────────────────────────────────────────────────┐
│ Web Browser                                                     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Next.js Frontend (React 19, Tailwind, lightweight-charts) │  │
│  │ - Annotation tools                                        │  │
│  │ - Prediction visualization                                │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
└────────────────────────────────┼────────────────────────────────┘
                                 │ HTTP
┌────────────────────────────────▼────────────────────────────────┐
│ Next.js API Routes (TypeScript)                                 │
│ - /api/candles, /api/annotations, /api/span-annotations         │
│ - /api/predict (proxy)                                          │
│ - /api/model/info (proxy)                                       │
├───────────┬─────────────────────────────────────┬───────────────┤
│           │ SQLite (annotations)                │ HTTP          │
│           ▼                                     ▼               │
│  ┌──────────────────┐                ┌──────────────────────┐   │
│  │ SQLite Database  │                │ ML Inference API     │   │
│  │ (Drizzle ORM)    │                │ (FastAPI, Python)    │   │
│  └──────────────────┘                └──────────┬───────────┘   │
└─────────────────────────────────────────────────┼───────────────┘
                   ┌──────────────────────────────┴───────────────┐
                   │                                              │
       ┌───────────▼─────────┐                     ┌──────────────▼────────┐
       │ MLflow Server       │                     │ PostgreSQL            │
       │ (Experiments,       │                     │ (Training run         │
       │  Model Registry)    │                     │  metadata)            │
       └─────────────────────┘                     └───────────────────────┘
```
### ML Pipeline Workflow
```
1. Annotate Data (Web UI)
2. Export Annotations (JSON)
3. Feature Engineering (TA-Lib)
   └─ Raw OHLCV → Enriched CSV (with indicators)
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   └─ Optional: TA-Lib CDL* auto-labeling
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   └─ Model registration
6. Inference Service
   ├─ Load model from MLflow registry
   └─ Serve predictions via FastAPI
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   └─ Feedback loop: predictions → new annotations → retrain
```
## Tech Stack
### Frontend & Web Service
- **Frontend**: Next.js 16 (App Router), React 19, TypeScript
- **Styling**: Tailwind CSS 3, shadcn/ui components
- **Charting**: lightweight-charts 4.x (TradingView)
@ -70,6 +193,17 @@ Candle Annotator provides a TradingView-like charting interface that allows trad
- **ORM**: Drizzle ORM
- **CSV Parsing**: papaparse
### ML Pipeline (Python)
- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (training run metadata)
- **Model Persistence**: joblib
- **Validation**: Pydantic
## Getting Started
### Docker Quickstart (Recommended)