# Candle Annotator

A web-based tool for manually annotating candlestick charts with pattern labels and trend lines. Built for creating labeled training data for machine learning models in trading analysis.

## Overview

Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining:

**Annotation Tools** - TradingView-like charting interface for creating labeled training data:
- Upload historical OHLC (Open, High, Low, Close) candle data from CSV files
- Visualize candlestick charts with interactive zoom and pan
- Annotate patterns with span labels (e.g., "Bullish Engulfing", "Doji", "Hammer")
- Mark breakout patterns (Break Up, Break Down) directly on candles
- Draw custom trend lines with two-click interaction
- Export annotations for ML training

**ML Pipeline** - Python-based training and inference system:
- Feature engineering with TA-Lib indicators (RSI, MACD, Bollinger Bands, etc.)
- Automated pattern detection using TA-Lib CDL* functions
- Train RandomForest and XGBoost models with MLflow experiment tracking
- FastAPI inference service for real-time predictions
- Integration with Next.js UI for prediction visualization

**Active Learning Loop** - Close the feedback cycle:
- Model predictions displayed as overlays on the chart
- Disagreement detection between human annotations and model predictions
- One-click feedback to confirm, correct, or dismiss predictions as new training data
- Continuous improvement through iterative annotation and retraining

## Features

### User Accounts & Authentication
- **Multi-User Support**: Each user has an isolated, per-user workspace
- **Auth.js v5 Integration**: Flexible authentication system supporting multiple sign-in methods
- **Sign-In Methods**:
  - **Credentials (Email/Password)**: Traditional email and password authentication with bcryptjs hashing
  - **Google OAuth**: One-click sign-in with Google accounts
- **Registration**: Self-service account creation with email validation and password requirements (minimum 8 characters)
- **Settings Page**: Update display name, change password for credential users, or delete account with confirmation
- **Default Admin Account**: Database seeding with default admin credentials for initial setup
- **Per-User Data Isolation**: All charts, annotations, and ML models are scoped to individual users
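
The registration rules above (email validation plus a minimum password length of 8) could be checked along these lines. This is a minimal sketch, not the project's actual validator: `RegistrationInput`, `validateRegistration`, and the deliberately simple email pattern are assumptions made for illustration.

```typescript
// Hypothetical request shape; the real field names may differ.
interface RegistrationInput {
  email: string;
  password: string;
}

// From the README: passwords need at least 8 characters.
const MIN_PASSWORD_LENGTH = 8;

// Simple structural check (something@something.tld), assumed for illustration.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns a list of human-readable problems; an empty list means the input passed.
function validateRegistration(input: RegistrationInput): string[] {
  const errors: string[] = [];
  if (!EMAIL_PATTERN.test(input.email.trim())) {
    errors.push("Invalid email address");
  }
  if (input.password.length < MIN_PASSWORD_LENGTH) {
    errors.push(`Password must be at least ${MIN_PASSWORD_LENGTH} characters`);
  }
  return errors;
}
```

Returning every problem at once, rather than stopping at the first, lets a registration form show all messages in a single round trip.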

### Data Management
- **CSV Upload**: Import OHLC data with support for both Unix timestamps and date strings
- **Replace Mode**: Uploading a new CSV deletes all old candles and replaces them with new data
- **Initial Data**: Docker containers automatically load EURUSD.csv on first startup if database is empty
- **PostgreSQL Storage**: All candle data and annotations stored in PostgreSQL database
- **Shared Database**: Frontend and ML service use the same database for seamless data access
- **Data Persistence**: Annotations and candles persist between sessions

### Chart Visualization
- **Interactive Candlestick Chart**: Powered by lightweight-charts library
- **Dark Theme**: Eye-friendly slate color scheme
- **Zoom & Pan**: Mouse wheel zoom and drag-to-pan functionality
- **Crosshair**: Precise price and time tracking

### Annotation Tools
- **Break Up Markers**: Green arrow markers below candles indicating upward breakouts
- **Break Down Markers**: Red arrow markers above candles indicating downward breakouts
- **Trend Lines**: Two-click line drawing with real-time preview
- **Delete Tool**: Remove any annotation (markers or lines) by clicking on them
- **Tool Toggle**: Click tool button again to deactivate

### Label Management
- **Label Sidebar**: View all annotations in collapsible sidebar with:
  - **Click Selection**: Click markers on chart or in sidebar to select/highlight
  - **Keyboard Delete**: Press Delete or Backspace to remove selected label
  - **Individual Delete**: Delete button on each list item
  - **Search**: Search annotations by timestamp
  - **Filter**: Filter by Break Up, Break Down, or All types
  - **Count Display**: See how many Break Up vs Break Down markers exist
  - **Visual Highlight**: Selected markers highlighted with glow effect

### UI Theme
- **Hacker Theme**: Terminal-inspired dark aesthetic with:
  - Matrix green (#00ff41) on dark background (#0a0e0a)
  - Monospace font (JetBrains Mono) throughout
  - Glow effects on button hover and active states
  - Custom scrollbars styled to match theme
  - High contrast for accessibility

### Export & Deployment
- **CSV Export**: Download all annotations with timestamp, label type, and price data
- **ML-Ready Format**: Structured data suitable for training ML models
- **Docker Deployment**: One-command deployment with persistent data volume
- **Health Check**: Built-in /api/health endpoint for monitoring

### ML Pipeline Features (Optional)

The integrated ML pipeline provides:

#### Feature Engineering
- **TA-Lib Indicators**: Automatic computation of 150+ technical indicators (RSI, MACD, Bollinger Bands, ATR, Stochastic, etc.)
- **Candle Features**: Body size, wick ratios, gap detection, price ranges
- **Custom Features**: Plugin system for domain-specific feature functions
- **NaN Handling**: Automatic warmup period detection and cleanup
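
To make the candle features above concrete, here is a minimal sketch of how body size, wick ratios, price range, and the gap to the previous close might be derived from a single OHLC candle. The ML pipeline itself is Python-based; this TypeScript version only illustrates the arithmetic, and the names `Candle`, `CandleFeatures`, and `candleFeatures` are assumptions, not identifiers from the project.

```typescript
// Assumed candle shape, mirroring the CSV columns time,open,high,low,close.
interface Candle {
  time: number;
  open: number;
  high: number;
  low: number;
  close: number;
}

interface CandleFeatures {
  bodySize: number;       // absolute distance between open and close
  range: number;          // high minus low
  upperWickRatio: number; // upper wick as a fraction of the range
  lowerWickRatio: number; // lower wick as a fraction of the range
  gap: number;            // open minus previous close (0 when there is no previous candle)
}

function candleFeatures(candle: Candle, previous?: Candle): CandleFeatures {
  const range = candle.high - candle.low;
  const bodyTop = Math.max(candle.open, candle.close);
  const bodyBottom = Math.min(candle.open, candle.close);
  return {
    bodySize: bodyTop - bodyBottom,
    range,
    // Guard against flat candles where high === low.
    upperWickRatio: range > 0 ? (candle.high - bodyTop) / range : 0,
    lowerWickRatio: range > 0 ? (bodyBottom - candle.low) / range : 0,
    gap: previous ? candle.open - previous.close : 0,
  };
}
```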

#### Annotation Ingestion
- **Windowed Classification**: Extract fixed-size windows around each pattern for classification models
- **BIO Sequence Labeling**: Begin-Inside-Outside encoding for sequence models (future LSTM/GRU support)
- **Programmatic Labels**: TA-Lib CDL* pattern functions for auto-labeling (23+ candlestick patterns)
- **Label Merging**: Human-priority, programmatic-priority, or both strategies
- **Dataset Statistics**: Class distribution, label counts, human/programmatic agreement metrics
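
As an illustration of the BIO encoding mentioned above, the sketch below turns span annotations into one tag per candle: `B-<label>` on the first candle of a span, `I-<label>` on the rest, and `O` elsewhere. It assumes spans are already expressed as inclusive candle indices and that later spans overwrite earlier ones on overlap; `IndexSpan` and `bioEncode` are hypothetical names, and the project's Python implementation may differ.

```typescript
// Assumed span shape: inclusive candle indices plus a pattern label.
interface IndexSpan {
  start: number;
  end: number;
  label: string;
}

// Produces one BIO tag per candle.
function bioEncode(candleCount: number, spans: IndexSpan[]): string[] {
  const tags: string[] = new Array<string>(candleCount).fill("O");
  for (const span of spans) {
    // Clip spans to the available candles.
    const start = Math.max(0, span.start);
    const end = Math.min(candleCount - 1, span.end);
    for (let i = start; i <= end; i++) {
      tags[i] = i === start ? `B-${span.label}` : `I-${span.label}`;
    }
  }
  return tags;
}
```

For six candles and a single `Doji` span covering indices 1 to 3, this yields `O, B-Doji, I-Doji, I-Doji, O, O`.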

#### Model Training
- **Model Types**: RandomForest and XGBoost with class balancing
- **Temporal Splitting**: Train/val/test splits that respect time series order (no data leakage)
- **MLflow Integration**: Automatic experiment tracking, hyperparameter logging, artifact storage
- **Model Registry**: Versioned model storage with stage management (Production, Staging, Archived)
- **Evaluation Metrics**: Accuracy, F1 (macro/weighted), per-class precision/recall/F1
- **Visualization**: Confusion matrix, feature importance plots, classification reports
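
Temporal splitting, as described above, means the validation and test sets contain only rows that come after the training rows in time. A minimal sketch of the idea follows, assuming each row carries a numeric `time` field; the function name `temporalSplit` and the default 70/15/15 fractions are illustrative assumptions, not the project's configuration.

```typescript
interface Split<T> {
  train: T[];
  val: T[];
  test: T[];
}

// Sorts by time, then cuts the sequence at two points so that
// train < val < test in chronological order (no shuffling).
function temporalSplit<T extends { time: number }>(
  rows: T[],
  trainFraction = 0.7,
  valFraction = 0.15,
): Split<T> {
  if (trainFraction <= 0 || valFraction < 0 || trainFraction + valFraction >= 1) {
    throw new Error("Fractions must leave room for a non-empty test set");
  }
  const sorted = [...rows].sort((a, b) => a.time - b.time);
  const trainEnd = Math.floor(sorted.length * trainFraction);
  const valEnd = Math.floor(sorted.length * (trainFraction + valFraction));
  return {
    train: sorted.slice(0, trainEnd),
    val: sorted.slice(trainEnd, valEnd),
    test: sorted.slice(valEnd),
  };
}
```

Because the cut points are positions in the sorted sequence, no future candle can end up in the training set, which is the leakage the bullet above warns about.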

#### Inference Service
- **FastAPI REST API**: High-performance inference with automatic OpenAPI docs
- **Preprocessing Parity**: Loads pipeline config from MLflow to ensure training/inference consistency
- **Batch Processing**: Efficient prediction for large time ranges
- **Span Grouping**: Consecutive predictions merged into labeled spans with confidence scores
- **Model Metadata**: Endpoint to query model version, metrics, and label configuration
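
The span grouping step above can be pictured as a single pass over per-candle predictions that merges runs of the same label. The sketch below is an assumption-laden illustration rather than the service's actual (Python) logic: it assumes predictions are sorted by time, that `"O"` marks "no pattern", and that a span's confidence is the mean of its candles' confidences. `CandlePrediction`, `PredictedSpan`, and `groupIntoSpans` are hypothetical names.

```typescript
interface CandlePrediction {
  time: number;
  label: string;
  confidence: number;
}

interface PredictedSpan {
  startTime: number;
  endTime: number;
  label: string;
  confidence: number; // mean confidence over the span (assumed aggregation)
}

function groupIntoSpans(predictions: CandlePrediction[], backgroundLabel = "O"): PredictedSpan[] {
  const spans: PredictedSpan[] = [];
  let current: { span: PredictedSpan; total: number; count: number } | null = null;
  for (const p of predictions) {
    if (current !== null && current.span.label === p.label) {
      // Same label as the running span: extend it and update the mean.
      current.span.endTime = p.time;
      current.total += p.confidence;
      current.count += 1;
      current.span.confidence = current.total / current.count;
      continue;
    }
    // Label changed: close the running span, skip background candles.
    current = null;
    if (p.label === backgroundLabel) continue;
    current = {
      span: { startTime: p.time, endTime: p.time, label: p.label, confidence: p.confidence },
      total: p.confidence,
      count: 1,
    };
    spans.push(current.span);
  }
  return spans;
}
```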

#### Prediction UI
- **Chart Overlay**: Predictions rendered as histogram series with label-specific colors
- **Confidence Filtering**: Slider to hide low-confidence predictions
- **Label Filtering**: Toggle visibility per pattern type with per-class F1 scores
- **Disagreement Detection**: Automatic comparison of human vs model predictions
- **Prediction Summary**: Counts for total predictions, agreements, disagreements
- **Active Learning Feedback**: Click predictions to convert them to annotations (future feature)
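
One plausible way to compute the agreement and disagreement counts above is to compare each model span against the human spans it overlaps in time. The rule used here (same label on any overlapping span counts as agreement, overlap with only different labels counts as disagreement, no overlap counts as neither) is an assumption for illustration; `LabeledSpan` and `compareSpans` are hypothetical names.

```typescript
interface LabeledSpan {
  startTime: number;
  endTime: number;
  label: string;
}

interface SpanComparison {
  agreements: number;
  disagreements: number;
}

// Two closed time intervals overlap when each starts before the other ends.
function spansOverlap(a: LabeledSpan, b: LabeledSpan): boolean {
  return a.startTime <= b.endTime && b.startTime <= a.endTime;
}

function compareSpans(human: LabeledSpan[], model: LabeledSpan[]): SpanComparison {
  let agreements = 0;
  let disagreements = 0;
  for (const predicted of model) {
    const overlapping = human.filter((h) => spansOverlap(h, predicted));
    if (overlapping.length === 0) continue; // model-only span: nothing to compare against
    if (overlapping.some((h) => h.label === predicted.label)) {
      agreements += 1;
    } else {
      disagreements += 1;
    }
  }
  return { agreements, disagreements };
}
```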

## Architecture

### System Components

```
┌─────────────────────────────────────────────────────────────────┐
│ Web Browser                                                     │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Next.js Frontend (React 19, Tailwind, lightweight-charts)   │ │
│ │ - Annotation tools                                          │ │
│ │ - Prediction visualization                                  │ │
│ └──────────────────────────────┬──────────────────────────────┘ │
└────────────────────────────────┼────────────────────────────────┘
                                 │ HTTP
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ Next.js API Routes (TypeScript)                                 │
│ - /api/candles, /api/annotations, /api/span-annotations         │
│ - /api/predict (proxy)                                          │
│ - /api/model/info (proxy)                                       │
├─────────────┬─────────────────────────────────────┬─────────────┤
│             │ PostgreSQL                          │ HTTP        │
│             ▼                                     ▼             │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database (Shared)                                │ │
│ │ - Frontend tables (candles, annotations, span_annotations)  │ │
│ │ - ML tables (training_runs)                                 │ │
│ │ - Accessed by: Next.js (Drizzle ORM)                        │ │
│ │                ML Service (SQLAlchemy)                      │ │
│ └─────────────────────┬───────────────────────────────────────┘ │
│                       │                                         │
│            ┌──────────┴──────────┐                              │
│            │                     │                              │
│ ┌──────────▼──────────┐  ┌───────▼───────────────────┐          │
│ │ ML Inference API    │  │ MLflow Server             │          │
│ │ (FastAPI, Python)   │  │ (Experiments, Registry)   │          │
│ └─────────────────────┘  └───────────────────────────┘          │
└─────────────────────────────────────────────────────────────────┘
```

### ML Pipeline Workflow

```
1. Annotate Data (Web UI)
   ↓
2. Export Annotations (JSON)
   ↓
3. Feature Engineering (TA-Lib)
   ├─ Raw OHLCV → Enriched CSV (with indicators)
   ↓
4. Annotation Ingestion
   ├─ Annotations + Enriched CSV → Labeled Dataset
   ├─ Optional: TA-Lib CDL* auto-labeling
   ↓
5. Model Training
   ├─ Temporal train/val/test split
   ├─ RandomForest or XGBoost training
   ├─ MLflow experiment tracking
   ├─ Model registration
   ↓
6. Inference Service
   ├─ Load model from MLflow registry
   ├─ Serve predictions via FastAPI
   ↓
7. Prediction Visualization (Web UI)
   ├─ Display predictions on chart
   ├─ Detect disagreements
   ├─ Feedback loop: predictions → new annotations → retrain
```

## Tech Stack

### Frontend & Web Service
- **Frontend**: Next.js 16 (App Router), React 19, TypeScript
- **Styling**: Tailwind CSS 3, shadcn/ui components
- **Charting**: lightweight-charts 4.x (TradingView)
- **Icons**: lucide-react
- **Backend**: Next.js API Routes
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV Parsing**: papaparse

### ML Pipeline (Python)
- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC (Data Version Control)
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic

## Getting Started

### Docker Quickstart (Recommended)

The fastest way to get running with Docker:

```bash
docker-compose up --build
```

Then open http://localhost:3000

See [DEPLOYMENT.md](./DEPLOYMENT.md#docker-deployment) for detailed Docker instructions.

### Prerequisites

- Node.js 18.x or higher (for local development)
- npm 9.x or higher (for local development)
- PostgreSQL 16 or higher (for local development)
- Docker & docker-compose (for containerized deployment)

### Local Development Installation

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd candle_annotator
   ```

2. Install dependencies:
   ```bash
   npm install
   ```

3. Setup PostgreSQL database:
   ```bash
   createdb candle_annotator
   createuser -P ml_user
   # Enter password: ml_password
   psql -c "GRANT ALL PRIVILEGES ON DATABASE candle_annotator TO ml_user;"
   ```

4. Create `.env` file:
   ```bash
   cp .env.example .env
   # Edit .env to set DATABASE_URL=postgresql://ml_user:ml_password@localhost:5432/candle_annotator
   ```

5. Start the development server:
   ```bash
   npm run dev
   ```

6. Open http://localhost:3000 in your browser

### Usage

1. **Upload Data**: Click "Choose CSV File" and select a CSV with columns: `time,open,high,low,close`
2. **View Chart**: The candlestick chart renders automatically after upload
3. **Add Annotations**:
   - Click "Label: Break Up" or "Label: Break Down" then click on a candle
   - Click "Draw Line" then click two points to draw a trend line
   - Press Escape to cancel line drawing
4. **Delete Annotations**: Click "Delete" tool, then click on markers or lines to remove them
5. **Export**: Click "Export CSV" to download all annotations

## CSV File Format

### Input Format

Your CSV file should have these columns:

```csv
time,open,high,low,close
1700000000,1.0500,1.0520,1.0490,1.0510
1700000060,1.0510,1.0530,1.0505,1.0525
```

**Time column** accepts:
- Unix timestamps (seconds): `1700000000`
- Date strings: `2024-01-15`, `2024-01-15 10:30:00`
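
A parser that accepts both time formats listed above could look like the following sketch. It is not the project's upload code: `parseCandleTime` is a hypothetical helper, and interpreting date strings as UTC is an assumption, since the README does not state a time zone.

```typescript
// Converts a CSV time cell to Unix seconds.
// Accepts integer Unix timestamps and "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS" strings.
function parseCandleTime(raw: string): number {
  const value = raw.trim();
  if (/^\d+$/.test(value)) {
    return Number(value); // already Unix seconds
  }
  const match = /^(\d{4})-(\d{2})-(\d{2})(?:[ T](\d{2}):(\d{2})(?::(\d{2}))?)?$/.exec(value);
  if (!match) {
    throw new Error(`Unrecognized time value: ${raw}`);
  }
  // Missing time parts default to midnight.
  const [, year, month, day, hour = "0", minute = "0", second = "0"] = match;
  // Assumption: date strings are interpreted as UTC.
  const millis = Date.UTC(
    Number(year),
    Number(month) - 1,
    Number(day),
    Number(hour),
    Number(minute),
    Number(second),
  );
  return millis / 1000;
}
```

Normalizing everything to Unix seconds at upload time keeps the rest of the code free of format checks.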

### Export Format

The exported CSV includes:

```csv
timestamp,label_type,price
1700000000,break_up,1.0510
1700000120,break_down,1.0505
1700000000,line,1.0500
```

- **timestamp**: Unix timestamp of the annotation
- **label_type**: `break_up`, `break_down`, or `line`
- **price**: Close price for markers, start price for lines
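
For reference, rows in this export format could be serialized as in the sketch below. The `ExportRow` type and `toExportCsv` function are illustrative assumptions, and the four-decimal price formatting simply mirrors the sample above rather than a documented guarantee.

```typescript
// Assumed in-memory shape of one exported annotation.
interface ExportRow {
  timestamp: number; // Unix seconds
  labelType: "break_up" | "break_down" | "line";
  price: number;     // close price for markers, start price for lines
}

// Builds the CSV text with the header shown in the README.
function toExportCsv(rows: ExportRow[]): string {
  const header = "timestamp,label_type,price";
  const lines = rows.map((row) => `${row.timestamp},${row.labelType},${row.price.toFixed(4)}`);
  return [header, ...lines].join("\n");
}
```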

## Database Schema

### Candles Table (PostgreSQL)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  time: timestamp (not null, indexed with chart_id),
  open: double precision,
  high: double precision,
  low: double precision,
  close: double precision
}
```

### Annotations Table (Point Annotations)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  timestamp: timestamp (not null),
  label_type: text ('line' | 'rectangle'),
  geometry: jsonb (for line/rectangle coordinates, nullable),
  color: text (default '#3b82f6'),
  created_at: timestamp (default now())
}
```

### Span Annotations Table (Pattern Labels)

```typescript
{
  id: serial (PK, auto-increment),
  chart_id: integer (FK to charts.id),
  start_time: timestamp (not null),
  end_time: timestamp (not null),
  label: text (pattern name, e.g., 'Bullish Engulfing'),
  confidence: integer (nullable),
  outcome: text (nullable),
  notes: text (nullable),
  sub_spans: jsonb (nullable),
  color: text (default '#2196F3'),
  source: text (default 'human'), // 'human' | 'model' | 'hybrid'
  model_prediction: jsonb (nullable),
  created_at: timestamp (default now())
}
```

### Training Runs Table (ML Service)

```typescript
{
  id: serial (PK, auto-increment),
  run_id: text (unique, MLflow run ID),
  model_type: text (e.g., 'RandomForest', 'XGBoost'),
  experiment_name: text,
  pipeline_config_hash: text,
  dataset_version: text,
  metrics_summary: jsonb,
  status: text (e.g., 'running', 'completed', 'failed'),
  created_at: timestamp (default now()),
  completed_at: timestamp (nullable)
}
```

## API Endpoints

### POST /api/upload
Upload CSV file and store candle data

**Behavior**: Deletes all existing candles before inserting new data (replace mode)
**Request**: multipart/form-data with `file` field
**Response**: `{ success: true, count: number }` or `{ error: string }`

### GET /api/candles
Retrieve all candle records

**Response**: Array of candle objects ordered by time

### GET /api/annotations
Retrieve all annotations

**Response**: Array of annotation objects with parsed geometry

### POST /api/annotations
Create a new annotation

**Request**: `{ timestamp: number, label_type: string, geometry?: object }`
**Response**: Created annotation object with ID
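
As a rough illustration of calling this endpoint from client code, the sketch below builds the options object for `fetch`. Only the request body shape comes from the description above; the helper name `buildCreateAnnotationRequest` and the JSON content type are assumptions, and any authentication or session handling is left out.

```typescript
// Request body as documented: geometry is optional.
interface CreateAnnotationBody {
  timestamp: number;
  label_type: string;
  geometry?: object;
}

// Shape of the options passed to fetch (kept explicit so the sketch is self-contained).
interface JsonPostRequest {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: serializes the body and sets a JSON content type (assumed).
function buildCreateAnnotationRequest(body: CreateAnnotationBody): JsonPostRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Usage (not executed here):
//   const response = await fetch("/api/annotations",
//     buildCreateAnnotationRequest({ timestamp: 1700000000, label_type: "break_up" }));
//   const created = await response.json();
```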

### DELETE /api/annotations/[id]
Delete an annotation by ID

**Response**: `{ success: true }` or `{ error: string }`

### GET /api/export
Export annotations as downloadable CSV

**Response**: CSV file download with Content-Disposition header

## Architecture

### Component Structure

- **page.tsx**: Main page composition, manages active tool state
- **Toolbox.tsx**: Sidebar with tool buttons and export functionality
- **FileUpload.tsx**: CSV upload component with status messages
- **CandleChart.tsx**: Core chart wrapper with lightweight-charts integration
  - Initializes chart with dark theme
  - Handles marker annotations (Break Up/Down)
  - Manages click events for annotation creation
  - Exposes `refreshData()` method for parent updates
- **SvgOverlay.tsx**: Transparent SVG layer for line drawing
  - Coordinate transformation between data and pixels
  - Two-click line drawing with preview
  - Line hit detection for deletion
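
Line hit detection of the kind listed above is commonly done by measuring the distance from the click to the line segment in pixel space. The following is a minimal sketch of that technique, not the code in `SvgOverlay.tsx`; the names `distanceToSegment` and `isLineHit` and the 6-pixel tolerance are assumptions.

```typescript
interface Point {
  x: number;
  y: number;
}

// Shortest distance from point p to the segment a-b, all in pixel coordinates.
function distanceToSegment(p: Point, a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lengthSquared = dx * dx + dy * dy;
  if (lengthSquared === 0) {
    return Math.hypot(p.x - a.x, p.y - a.y); // degenerate segment: a single point
  }
  // Project p onto the line and clamp the projection to the segment.
  const t = Math.max(0, Math.min(1, ((p.x - a.x) * dx + (p.y - a.y) * dy) / lengthSquared));
  return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// A click counts as a hit when it lands within the tolerance of the drawn line.
function isLineHit(click: Point, start: Point, end: Point, tolerancePx = 6): boolean {
  return distanceToSegment(click, start, end) <= tolerancePx;
}
```

Clamping the projection matters: without it, clicks far beyond either endpoint but close to the infinite line would still delete the segment.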

### Data Flow

1. User uploads CSV → POST /api/upload → PostgreSQL storage
2. Chart mounts → GET /api/candles + GET /api/annotations → Render
3. User clicks with active tool → POST /api/annotations → Refresh chart
4. User deletes → DELETE /api/annotations/[id] → Refresh chart
5. User exports → GET /api/export → CSV download

## Development

### Project Structure

```
candle_annotator/
├── src/
│   ├── app/
│   │   ├── api/              # API route handlers
│   │   │   ├── upload/
│   │   │   ├── candles/
│   │   │   ├── annotations/
│   │   │   └── export/
│   │   ├── globals.css       # Tailwind styles
│   │   ├── layout.tsx        # Root layout with dark theme
│   │   └── page.tsx          # Main page
│   ├── components/
│   │   ├── ui/               # shadcn/ui components
│   │   ├── CandleChart.tsx
│   │   ├── SvgOverlay.tsx
│   │   ├── Toolbox.tsx
│   │   └── FileUpload.tsx
│   └── lib/
│       ├── db/
│       │   ├── index.ts      # Drizzle client
│       │   ├── schema.ts     # Table definitions
│       │   └── migrate.ts    # Migration runner
│       └── utils.ts          # Utility functions
├── data/                     # SQLite database directory
├── drizzle/                  # Migration files
├── DEPLOYMENT.md             # Deployment instructions
└── README.md                 # This file
```

### Key Technical Decisions

1. **lightweight-charts v4**: Stable API with good candlestick and marker support
2. **PostgreSQL**: Shared database enables ML service to directly query candle/annotation data without CSV exports
3. **SVG Overlay for Lines**: Maintains separate rendering layer from chart, easier coordinate management
4. **Drizzle ORM**: Type-safe queries with minimal overhead, PostgreSQL dialect for production-grade features
5. **Next.js App Router**: Server-side API routes co-located with frontend code

### Known Limitations

- **No Undo**: Can only delete annotations, not undo placement
- **Memory**: Large CSV files (100k+ rows) may cause slow uploads
- **Line Snapping**: Lines don't snap to candles, free-form placement only

## Troubleshooting

See [DEPLOYMENT.md](./DEPLOYMENT.md) for detailed troubleshooting steps.

Common issues:
- **PostgreSQL connection errors**: Check `DATABASE_URL` environment variable and verify PostgreSQL is running
- **Port 3000 in use**: Use `PORT=3001 npm run dev`
- **Migration errors**: Ensure PostgreSQL is accessible before starting the application

## License

ISC

## Contributing

This is a focused tool for a specific use case. For questions or issues, please open a GitHub issue.