feat(ui): implement disagreement detection, prediction summary, loading states, and update documentation

- Add disagreement detection logic comparing human annotations vs predictions
- Display prediction summary in PredictionPanel (agreements/disagreements)
- Wire up 'Show only disagreements' filter toggle
- Add loading overlay during prediction fetching
- Update docker-compose.yml with healthchecks for all services
- Update DEPLOYMENT.md with comprehensive ML service setup instructions
- Update README.md with ML pipeline overview and architecture diagrams
- Update CLAUDE_DESCRIPTION.md with v3.0.0 ML integration details

Remaining tasks (11.2, 11.4, 11.5) deferred - core functionality complete

parent 952eb7413c
commit 21f184aa8d
8 changed files with 585 additions and 56 deletions

140 README.md
@@ -4,14 +4,28 @@ A web-based tool for manually annotating candlestick charts with pattern labels

## Overview

Candle Annotator provides a TradingView-like charting interface that allows traders and researchers to:
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

**Annotation Tools** - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Delete annotations with a dedicated tool
- Export all annotations as CSV for ML training pipelines
- Export annotations for ML training

**ML Pipeline** - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization

**Active Learning Loop** - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining
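
The confirm/correct/dismiss feedback in the list above could map onto annotations roughly as follows. This is a minimal sketch, not the project's implementation: the `Span` type, the `apply_feedback` function, and the `source` field are hypothetical names chosen for illustration, and a dismissed prediction is simply dropped here rather than stored as a negative example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; field names are assumptions."""
    start: int             # index of the first candle in the span
    end: int               # index of the last candle (inclusive)
    label: str
    source: str = "model"  # "model" or "human"


def apply_feedback(prediction: Span, action: str,
                   corrected_label: Optional[str] = None) -> Optional[Span]:
    """Turn reviewer feedback on one prediction into an annotation, or None."""
    if action == "confirm":
        # Keep the span and label, but mark it as human-verified.
        return replace(prediction, source="human")
    if action == "correct":
        if not corrected_label:
            raise ValueError("'correct' needs a corrected_label")
        return replace(prediction, label=corrected_label, source="human")
    if action == "dismiss":
        # Assumption: dismissed predictions produce no annotation.
        return None
    raise ValueError(f"unknown action: {action!r}")
```

Confirmed and corrected spans can then be appended to the annotation set before the next retraining run.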

## Features

@@ -59,8 +73,117 @@ Candle Annotator provides a TradingView-like charting interface that allows trad
- **Docker Deployment**: One-command deployment with persistent data volume
- **Health Check**: Built-in /api/health endpoint for monitoring

### ML Pipeline Features (Optional)

The integrated ML pipeline provides:

#### Feature Engineering
- **TA-Lib Indicators**: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- **Candle Features**: Body size, wick ratios, gap detection, price ranges
- **Custom Features**: Plugin system for domain-specific feature functions
- **NaN Handling**: Automatic warmup period detection and cleanup
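
To make the "Candle Features" item concrete, here is a sketch of per-candle geometry features in plain Python. The exact definitions used by the pipeline are not shown in this diff, so the formulas, the `Candle` type, and the feature names below are assumptions: ratios are taken relative to the high-low range, and the gap is the open minus the previous close.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float


def candle_features(candle: Candle,
                    prev_close: Optional[float] = None) -> Dict[str, float]:
    """Compute simple shape features for one candle (assumed definitions)."""
    price_range = candle.high - candle.low
    body = abs(candle.close - candle.open)
    upper_wick = candle.high - max(candle.open, candle.close)
    lower_wick = min(candle.open, candle.close) - candle.low

    def ratio(part: float) -> float:
        # A flat candle has no range; report 0.0 instead of dividing by zero.
        return part / price_range if price_range > 0 else 0.0

    return {
        "range": price_range,
        "body_size": body,
        "body_ratio": ratio(body),
        "upper_wick_ratio": ratio(upper_wick),
        "lower_wick_ratio": ratio(lower_wick),
        # Positive gap: this candle opened above the previous close.
        "gap": 0.0 if prev_close is None else candle.open - prev_close,
    }
```

For example, `candle_features(Candle(10.0, 14.0, 10.0, 12.0), prev_close=9.5)` describes a candle whose body and upper wick each cover half of the range, with no lower wick and a gap up of 0.5.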

#### Annotation Ingestion
- **Windowed Classification**: Extract fixed-size windows around each pattern for classification models
- **BIO Sequence Labeling**: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- **Programmatic Labels**: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- **Label Merging**: Human-priority, programmatic-priority, or both strategies
- **Dataset Statistics**: Class distribution, label counts, human/programmatic agreement metrics
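
The BIO encoding mentioned above can be illustrated with a short sketch. It assumes span annotations arrive as `(start, end, label)` tuples with inclusive candle indices; the `bio_encode` name and that input shape are hypothetical, and overlapping spans are resolved by letting the later span overwrite the earlier one.

```python
from typing import List, Tuple


def bio_encode(n_candles: int,
               spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each candle as B-<label>, I-<label>, or O (outside any span)."""
    tags = ["O"] * n_candles
    for start, end, label in spans:
        if not 0 <= start <= end < n_candles:
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first candle begins the span
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # remaining candles are inside it
    return tags
```

With seven candles, a three-candle "Hammer" span at indices 1 to 3 and a single-candle "Doji" at index 5 yield one `B-`/`I-` run and one lone `B-Doji` tag, with every other candle tagged `O`.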

#### Model Training
- **Model Types**: RandomForest and XGBoost with class balancing
- **Temporal Splitting**: Train/val/test splits that respect time series order (no data leakage)
- **MLflow Integration**: Automatic experiment tracking, hyperparameter logging, artifact storage
- **Model Registry**: Versioned model storage with stage management (Production, Staging, Archived)
- **Evaluation Metrics**: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- **Visualization**: Confusion matrix, feature importance plots, classification reports
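
A temporal split differs from a random split in that every validation row is later than every training row, and every test row is later than every validation row. The sketch below shows one way to do this; the `temporal_split` name, the `"time"` key, and the 70/15/15 default fractions are assumptions, not values taken from the project.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def temporal_split(rows: List[Row], train_frac: float = 0.7,
                   val_frac: float = 0.15) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into train/val/test in chronological order, without shuffling."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 - train_frac:
        raise ValueError("fractions must leave room for a test set")
    ordered = sorted(rows, key=lambda r: r["time"])
    n = len(ordered)
    # round() avoids off-by-one cut points caused by float error (e.g. 20 * 0.7).
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

When samples are windows of several candles, windows that straddle a cut point still share candles across splits, so a gap of one window length between splits may also be needed.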

#### Inference Service
- **FastAPI REST API**: High-performance inference with automatic OpenAPI docs
- **Preprocessing Parity**: Loads pipeline config from MLflow to ensure training/inference consistency
- **Batch Processing**: Efficient prediction for large time ranges
- **Span Grouping**: Consecutive predictions merged into labeled spans with confidence scores
- **Model Metadata**: Endpoint to query model version, metrics, and label configuration
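
Span grouping, as described in the list above, can be sketched as a run-length merge over per-candle predictions. The details here are assumptions: predictions are `(label, confidence)` pairs in time order, a hypothetical `"none"` label marks candles without a pattern, and a span's confidence is the mean of its candles' confidences (the service may aggregate differently).

```python
from typing import Dict, List, Tuple, Union

SpanDict = Dict[str, Union[str, int, float]]


def group_spans(predictions: List[Tuple[str, float]],
                background: str = "none") -> List[SpanDict]:
    """Merge runs of identical consecutive labels into spans."""
    spans: List[SpanDict] = []
    start, n = 0, len(predictions)
    while start < n:
        label = predictions[start][0]
        end = start
        # Extend the run while the next candle carries the same label.
        while end + 1 < n and predictions[end + 1][0] == label:
            end += 1
        if label != background:
            confs = [conf for _, conf in predictions[start:end + 1]]
            spans.append({
                "label": label,
                "start": start,
                "end": end,  # inclusive
                "confidence": sum(confs) / len(confs),
            })
        start = end + 1
    return spans
```

Two adjacent "Hammer" candles with confidences 0.5 and 1.0 become a single span covering both indices with confidence 0.75.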

#### Prediction UI
- **Chart Overlay**: Predictions rendered as histogram series with label-specific colors
- **Confidence Filtering**: Slider to hide low-confidence predictions
- **Label Filtering**: Toggle visibility per pattern type with per-class F1 scores
- **Disagreement Detection**: Automatic comparison of human vs model predictions
- **Prediction Summary**: Counts for total predictions, agreements, disagreements
- **Active Learning Feedback**: Click predictions to convert them to annotations (future feature)
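
The commit implements disagreement detection in the UI, but the comparison rule itself is not visible in this README diff. The sketch below shows one plausible rule, written in Python for illustration: a prediction agrees with the human labels if it overlaps an annotation with the same label, disagrees if it overlaps only annotations with different labels, and counts as neither when nothing overlaps. The `compare` function and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Dict[str, object]  # keys assumed: "start", "end", "label", "confidence"


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive candle ranges share at least one candle."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def compare(annotations: List[Span], predictions: List[Span],
            min_confidence: float = 0.0) -> Tuple[Dict[str, int], List[Span]]:
    """Return summary counts and the list of disagreeing predictions."""
    # Confidence filter, as a slider in the UI might apply it.
    visible = [p for p in predictions if p["confidence"] >= min_confidence]
    agreements = 0
    disagreements: List[Span] = []
    for pred in visible:
        nearby = [a for a in annotations if overlaps(a, pred)]
        if not nearby:
            continue  # no human label here: neither agreement nor disagreement
        if any(a["label"] == pred["label"] for a in nearby):
            agreements += 1
        else:
            disagreements.append(pred)
    summary = {
        "total": len(visible),
        "agreements": agreements,
        "disagreements": len(disagreements),
    }
    return summary, disagreements
```

A "show only disagreements" filter then amounts to rendering the second return value instead of the full prediction list.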

## Architecture

### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│                           Web Browser                           │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Next.js Frontend (React 19, Tailwind, lightweight-charts)  │ │
│ │ - Annotation tools                                          │ │
│ │ - Prediction visualization                                  │ │
│ └──────────────────┬──────────────────────────────────────────┘ │
└────────────────────┼────────────────────────────────────────────┘
                     │ HTTP
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Next.js API Routes (TypeScript)                 │
│  - /api/candles, /api/annotations, /api/span-annotations        │
│  - /api/predict (proxy)                                         │
│  - /api/model/info (proxy)                                      │
├───────────┬─────────────────────────────────────┬───────────────┤
│           │ SQLite (annotations)                │ HTTP          │
│           ▼                                     ▼               │
│  ┌──────────────────┐                ┌──────────────────────┐   │
│  │ SQLite Database  │                │ ML Inference API     │   │
│  │ (Drizzle ORM)    │                │ (FastAPI, Python)    │   │
│  └──────────────────┘                └──────────┬───────────┘   │
└─────────────────────────────────────────────────┼───────────────┘
                                                  │
                   ┌──────────────────────────────┴───────────────┐
                   │                                              │
       ┌───────────▼─────────┐                     ┌──────────────▼────────┐
       │ MLflow Server       │                     │ PostgreSQL            │
       │ (Experiments,       │                     │ (Training run         │
       │  Model Registry)    │                     │  metadata)            │
       └─────────────────────┘                     └───────────────────────┘
```

### ML Pipeline Workflow

```
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
```
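
Step 4 of the workflow above optionally combines human annotations with TA-Lib CDL* auto-labels, and the README names three merge strategies without defining them. The sketch below is one reading of those strategies, not the project's code: labels are keyed by candle index, `merge_labels` and the strategy strings are hypothetical, and "both" is assumed to keep every distinct label for a candle.

```python
from typing import Dict, List


def merge_labels(human: Dict[int, str], programmatic: Dict[int, str],
                 strategy: str = "human_priority") -> Dict[int, List[str]]:
    """Merge two per-candle label maps under the chosen strategy."""
    merged: Dict[int, List[str]] = {}
    for idx in sorted(set(human) | set(programmatic)):
        h, p = human.get(idx), programmatic.get(idx)
        if strategy == "human_priority":
            chosen = [h] if h is not None else [p]
        elif strategy == "programmatic_priority":
            chosen = [p] if p is not None else [h]
        elif strategy == "both":
            # Keep both labels, collapsing them when the sources agree.
            chosen = [x for x in (h, p) if x is not None]
            if len(chosen) == 2 and h == p:
                chosen = [h]
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        merged[idx] = chosen
    return merged
```

Counting the candles where both sources are present and equal would also give a simple human/programmatic agreement figure for dataset statistics.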

## Tech Stack

### Frontend & Web Service
- **Frontend**: Next.js 16 (App Router), React 19, TypeScript
- **Styling**: Tailwind CSS 3, shadcn/ui components
- **Charting**: lightweight-charts 4.x (TradingView)

@@ -70,6 +193,17 @@ Candle Annotator provides a TradingView-like charting interface that allows trad
- **ORM**: Drizzle ORM
- **CSV Parsing**: papaparse

### ML Pipeline (Python)
- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (training run metadata)
- **Model Persistence**: joblib
- **Validation**: Pydantic

## Getting Started

### Docker Quickstart (Recommended)