candle-annotator/CLAUDE_DESCRIPTION.md
Marko Djordjevic f5a67d9fa4 Update CLAUDE_DESCRIPTION.md with auth system, routing, and schema changes (v3.2.0)
- Document Auth.js v5 multi-user authentication with Credentials and Google OAuth
- Add new routing structure: (public) for /, /login, /register and /app/* protected routes
- Document middleware-based route protection in middleware.ts
- Add new API auth endpoints: /api/auth/register, /profile, /password, /account
- Document user_id foreign keys on all data tables with composite unique constraints
- Add settings page at /app/settings for user profile management
- Update API endpoints section to show auth endpoints and protected data endpoints
- Update file structure to reflect new auth files and route groups
- Update constraints to note authentication requirement and JWT session management
- Update version history to reflect v3.2.0 changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 18:40:52 +01:00


# Candle Annotator - Project Description
## Project Overview
Candle Annotator is a complete machine learning platform for candlestick pattern recognition, combining manual annotation tools with an integrated Python ML pipeline. It's designed for traders and ML researchers to create labeled training data, train models, and deploy pattern recognition in an active learning loop.
**Current Version**: 3.2.0 (Multi-User Authentication - Auth.js v5)
## Recent Changes (v3.2.0)
### Multi-User Authentication System (Auth.js v5)
- **Authentication Framework**: Auth.js v5 with JWT session strategy (30-day max age)
- **Multiple Providers**:
  - **Credentials**: Email/password with bcryptjs hashing
  - **Google OAuth**: Social login with automatic user creation
- **User Management**: Users table with uuid primary key, email uniqueness, provider tracking
- **Middleware Protection**: Next.js middleware in src/middleware.ts protects /app/* routes and redirects unauthenticated users to /login
- **Session Management**: JWT tokens include user ID for secure API access
- **User Isolation**: All data tables (charts, candles, annotations, etc.) linked to user_id foreign key
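
The 30-day JWT session described above corresponds to a session setting along the following lines. This is a sketch, not a copy of `src/auth.ts`: the `strategy` and `maxAge` fields follow the Auth.js `session` option (with `maxAge` in seconds), while `SECONDS_PER_DAY`, `sessionConfig`, and `sessionExpiry` are illustrative names that do not come from the project.

```typescript
// Sketch of the session settings described above; not copied from src/auth.ts.
// Auth.js expects `maxAge` in seconds.
const SECONDS_PER_DAY = 24 * 60 * 60;

const sessionConfig = {
  strategy: "jwt" as const,     // stateless JWT sessions
  maxAge: 30 * SECONDS_PER_DAY, // 30 days = 2,592,000 seconds
};

// Hypothetical helper: the moment a token issued at `issuedAtMs` stops being valid.
function sessionExpiry(issuedAtMs: number): Date {
  return new Date(issuedAtMs + sessionConfig.maxAge * 1000);
}
```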
### New Routing Structure
- **Public Routes** (`(public)` route group; the group name does not appear in the URL):
  - `/` - Landing page
  - `/login` - Credentials/Google login
  - `/register` - User registration with email/password
- **Protected Routes** (/app):
  - `/app` - Main annotation interface (charts, candles, annotations)
  - `/app/settings` - User settings and profile management
- **API Authentication Routes**:
  - `POST /api/auth/register` - Register new user with email/password
  - `POST /api/auth/login` - Credentials sign-in (handled by Auth.js)
  - `POST /api/auth/logout` - Sign out (handled by Auth.js)
  - `GET /api/auth/profile` - Get authenticated user profile
  - `POST /api/auth/password` - Change user password
  - `POST /api/auth/account` - Update account info (name, image)
### Database Schema Updates
- **Users Table**: Stores email, password_hash, name, image, provider (credentials/google), provider_account_id
- **User ID Foreign Keys**: All data tables (charts, candles, annotations, annotation_types, span_label_types, span_annotations) now reference users.id
- **Composite Unique Constraints**:
  - charts: (user_id, name) composite unique
  - candles: (chart_id, time) composite unique
  - annotation_types: (user_id, name) composite unique
  - span_label_types: (user_id, name) composite unique
- **Type Enforcement**: uuid, timestamp, doublePrecision for PostgreSQL compatibility
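
To illustrate what a `(user_id, name)` composite constraint permits, here is a small in-memory model. It is not the Drizzle schema, and `ChartRow` and `ChartTable` are hypothetical names. The point it shows: two users may each own a chart with the same name, but a single user cannot own two charts with that name.

```typescript
// In-memory model of a (user_id, name) composite unique constraint.
// Illustration only; the real constraint is enforced by PostgreSQL.
interface ChartRow {
  userId: string;
  name: string;
}

class ChartTable {
  private readonly keys = new Set<string>();

  // Serializing the tuple as JSON gives an unambiguous composite key.
  private static key(row: ChartRow): string {
    return JSON.stringify([row.userId, row.name]);
  }

  // Returns false (and stores nothing) when the pair already exists,
  // mirroring a unique violation on insert.
  insert(row: ChartRow): boolean {
    const k = ChartTable.key(row);
    if (this.keys.has(k)) return false;
    this.keys.add(k);
    return true;
  }
}
```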
### Authentication & Middleware
- **Next.js Middleware** (src/middleware.ts):
  - Protects /app/* routes (redirects to /login if unauthenticated)
  - Protects /api/* routes except /api/auth/* and /api/health (returns 401 if not authenticated)
  - Redirects authenticated users away from /login and /register to /app
- **Auth Configuration**: JWT-based sessions via Auth.js v5
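
The middleware rules above can be expressed as a pure decision function, which makes them easy to check in isolation. This is a sketch under assumptions: the actual contents of `src/middleware.ts` are not shown in this document, and `RouteDecision` and `decideRoute` are illustrative names.

```typescript
// Pure-function sketch of the routing rules listed above.
// Names are illustrative; this is not the project's middleware code.
type RouteDecision =
  | { action: "next" }
  | { action: "redirect"; to: string }
  | { action: "unauthorized"; status: 401 };

function decideRoute(pathname: string, isAuthenticated: boolean): RouteDecision {
  // True for the prefix itself and anything nested below it.
  const isUnder = (prefix: string): boolean =>
    pathname === prefix || pathname.startsWith(prefix + "/");

  // Signed-in users are sent away from the login and register pages.
  if (isAuthenticated && (pathname === "/login" || pathname === "/register")) {
    return { action: "redirect", to: "/app" };
  }

  // Auth and health endpoints stay open so sign-in and health probes work.
  if (isUnder("/api/auth") || isUnder("/api/health")) {
    return { action: "next" };
  }

  if (!isAuthenticated) {
    if (isUnder("/api")) return { action: "unauthorized", status: 401 };
    if (isUnder("/app")) return { action: "redirect", to: "/login" };
  }

  return { action: "next" };
}
```

Keeping the decision separate from the framework's request and response objects means the rules can be unit-tested without running Next.js.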
## Previous Changes (v3.1.0)
### Database Consolidation to PostgreSQL
- **Unified Database**: Migrated from SQLite to PostgreSQL for all application data (frontend + ML service)
- **Shared Data Access**: ML service can now directly query candle and annotation tables from PostgreSQL
- **No More CSV Exports**: ML service reads training data directly from database, eliminating export/import steps
- **Schema Updates**: All tables converted to PostgreSQL types (serial, timestamp, double precision, jsonb)
- **Type Safety**: Fixed numpy type conversion issues in ML service data access layer
- **Migration Script**: One-time migration script (`scripts/migrate-sqlite-to-postgres.ts`) for upgrading from SQLite
- **Rollback Support**: Documented rollback procedure if migration fails
### Backend Infrastructure Updates
- **Drizzle ORM**: Updated to PostgreSQL dialect with pg driver
- **Database Connection**: Connection pooling (max: 10 connections) for Next.js API routes
- **ML Service ORM**: SQLAlchemy table reflections for read-only access to frontend tables
- **Environment Variables**: `DATABASE_URL` replaces `DATABASE_PATH`
- **Docker Compose**: Updated with shared PostgreSQL instance, removed `candle-data` volume
- **Health Checks**: Enhanced health endpoints verify PostgreSQL connectivity
## Previous Changes (v3.0.0)
### ML Pipeline Integration
- **Python ML Service**: FastAPI-based inference service for real-time pattern prediction
- **Feature Engineering**: TA-Lib integration for computing 150+ technical indicators (RSI, MACD, Bollinger Bands, etc.)
- **Model Training**: RandomForest and XGBoost models with MLflow experiment tracking
- **Prediction UI**: Chart overlay showing model predictions with confidence filtering
- **Disagreement Detection**: Automatic comparison of human annotations vs model predictions
- **Active Learning Loop**: Convert predictions to annotations for continuous model improvement
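
As a rough illustration of confidence filtering and disagreement detection, the sketch below keeps predictions above a threshold and then reports candles where the human label and the model label differ. The data shapes are hypothetical (the real types live in `src/types/predictions.ts`, which is not shown here), and matching by candle timestamp is an assumption rather than the documented method.

```typescript
// Hypothetical shapes; not taken from src/types/predictions.ts.
interface Prediction {
  time: number;       // candle timestamp
  label: string;      // predicted pattern label
  confidence: number; // 0..1
}

interface HumanAnnotation {
  time: number;
  label: string;
}

interface Disagreement {
  time: number;
  human: string;
  model: string;
  confidence: number;
}

// Keep only predictions at or above the chosen confidence threshold.
function filterByConfidence(predictions: Prediction[], minConfidence: number): Prediction[] {
  return predictions.filter((p) => p.confidence >= minConfidence);
}

// A disagreement is a candle that has both a human label and a model label, and they differ.
function findDisagreements(
  annotations: HumanAnnotation[],
  predictions: Prediction[],
): Disagreement[] {
  const humanByTime = new Map<number, string>();
  for (const a of annotations) humanByTime.set(a.time, a.label);

  const result: Disagreement[] = [];
  for (const p of predictions) {
    const human = humanByTime.get(p.time);
    if (human !== undefined && human !== p.label) {
      result.push({ time: p.time, human, model: p.label, confidence: p.confidence });
    }
  }
  return result;
}
```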
### Backend Infrastructure
- **PostgreSQL**: Dedicated database for ML service metadata and training runs
- **MLflow Server**: Experiment tracking, model registry, and artifact storage
- **DVC**: Data versioning for datasets
- **Docker Compose**: Multi-service orchestration (Next.js app, ML service, MLflow, PostgreSQL)
- **Health Checks**: Service health monitoring across all containers
### API Additions
- **POST /api/predict**: Inference proxy to ML service for candle predictions
- **POST /api/predict/batch**: Batch prediction for time ranges
- **GET /api/model/info**: Model metadata, metrics, and label configuration
- **GET /api/span-annotations/export**: Export annotations in ML pipeline format
- **Span Source Tracking**: `source` and `model_prediction` fields for feedback loop
## Previous Changes (v2.0.0)
### Label Management System
- **Label Sidebar**: Comprehensive collapsible sidebar showing all Break Up and Break Down annotations
- **Click Selection**: Click markers on chart or list items to select/highlight them
- **Keyboard Delete**: Press Delete or Backspace to remove selected label
- **Search & Filter**: Search by timestamp, filter by type, count display
- **Individual Delete**: Per-item trash button for quick removal
- **Visual Feedback**: Selected markers display with increased size and bright glow
### Hacker Theme
- **Color Scheme**: Matrix green (#00ff41) on dark background (#0a0e0a)
- **Typography**: Monospace font (JetBrains Mono with fallbacks)
- **Effects**: Glow effects on button hover, pulsing animations for active tools
- **Details**: Custom scrollbars, selection highlighting, terminal aesthetic
- **Accessibility**: High contrast meeting WCAG AA standards
### Docker Deployment
- **Multi-stage Build**: Optimized Dockerfile with ~100MB final image
- **Persistence**: Named volume for SQLite database across restarts
- **Health Check**: Built-in /api/health endpoint for container health
- **Security**: Non-root user (nextjs:1001) for safe execution
- **Convenience**: docker-compose.yml for one-command deployment
### API Enhancements
- **Bulk Delete**: DELETE /api/annotations?type=break_up,break_down for batch operations
- **Health Endpoint**: GET /api/health with optional database check (?check=db)
- **Existing DELETE /api/annotations/[id]**: Handles individual label deletion
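
The bulk delete endpoint takes a comma-separated `type` query value. A minimal sketch of how such a value could be parsed is shown below. The allowed set and the choice to silently drop unknown values are assumptions; the real route may validate differently, and `parseTypeFilter` is a hypothetical helper.

```typescript
// Sketch: turn the raw `type` query value (e.g. "break_up,break_down") into a clean list.
// The allowed set is an assumption based on the two label types named in this document.
const ALLOWED_TYPES = ["break_up", "break_down"] as const;
type LabelType = (typeof ALLOWED_TYPES)[number];

function parseTypeFilter(raw: string | null): LabelType[] {
  if (!raw) return [];
  const requested = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const seen = new Set<string>();
  const result: LabelType[] = [];
  for (const t of requested) {
    // Unknown values are ignored; duplicates are collapsed.
    const match = ALLOWED_TYPES.find((a) => a === t);
    if (match !== undefined && !seen.has(match)) {
      seen.add(match);
      result.push(match);
    }
  }
  return result;
}
```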
## Technical Stack
### Frontend
- **Framework**: Next.js 16 (App Router)
- **UI Library**: React 19
- **Language**: TypeScript 5
- **Styling**: Tailwind CSS 3 + shadcn/ui components
- **Charting**: lightweight-charts v4 (by TradingView)
- **Icons**: lucide-react
### Backend (Next.js)
- **Runtime**: Node.js 18.x
- **API**: Next.js API Routes
- **Database**: PostgreSQL 16 with pg driver
- **ORM**: Drizzle ORM (PostgreSQL dialect)
- **CSV**: papaparse
### ML Service (Python)
- **API Framework**: FastAPI with uvicorn
- **ML Libraries**: scikit-learn (RandomForest), XGBoost
- **Feature Engineering**: TA-Lib (Technical Analysis Library)
- **Data Processing**: pandas, numpy
- **Experiment Tracking**: MLflow (model registry, artifact storage)
- **Data Versioning**: DVC
- **Database**: PostgreSQL 16 (shared with frontend - reads candles/annotations, writes training runs)
- **ORM**: SQLAlchemy (for training runs) + table reflection (for frontend data)
- **Model Persistence**: joblib
- **Validation**: Pydantic
### DevOps
- **Containerization**: Docker with multi-stage builds
- **Orchestration**: docker-compose (4 services: app, ml-service, mlflow, postgres)
- **Build**: Next.js standalone output mode
- **Monitoring**: Health checks on all services
## Core Features
### Annotation Tools
1. **Break Up Labels**: Click marker on candle to add upward breakout label
2. **Break Down Labels**: Click marker on candle to add downward breakout label
3. **Trend Lines**: Two-click drawing with SVG overlay for custom trend lines
4. **Delete Tool**: Remove any annotation by clicking it
### Data Management
- **CSV Upload**: Import OHLC data (time, open, high, low, close)
- **Persistent Storage**: PostgreSQL database stores all annotations and candles
- **Shared Database**: ML service directly queries candle/annotation data
- **CSV Export**: Download labeled data (optional, no longer required for ML training)
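
For reference, the expected CSV shape (time, open, high, low, close) can be parsed roughly as follows. The project lists papaparse for CSV handling; this hand-rolled version is a stand-in that only illustrates the row shape and validation, and it does not handle quoted fields. `Candle` and `parseOhlcCsv` are illustrative names.

```typescript
// Minimal OHLC CSV parser for illustration; not the project's upload code.
interface Candle {
  time: string;
  open: number;
  high: number;
  low: number;
  close: number;
}

const REQUIRED_COLUMNS = ["time", "open", "high", "low", "close"] as const;

function parseOhlcCsv(csv: string): Candle[] {
  const lines = csv.split(/\r?\n/).filter((l) => l.trim().length > 0);
  if (lines.length === 0) return [];

  // Locate each required column by header name so column order does not matter.
  const header = lines[0].split(",").map((h) => h.trim().toLowerCase());
  const indexOf = (col: string): number => {
    const i = header.indexOf(col);
    if (i === -1) throw new Error(`Missing column: ${col}`);
    return i;
  };
  const [t, o, h, l, c] = REQUIRED_COLUMNS.map((col) => indexOf(col));

  return lines.slice(1).map((line, n) => {
    const cells = line.split(",").map((x) => x.trim());
    // Reject missing, empty, or non-numeric cells instead of storing NaN or 0.
    const num = (i: number): number => {
      const raw = cells[i];
      const v = raw === undefined || raw === "" ? NaN : Number(raw);
      if (!Number.isFinite(v)) throw new Error(`Row ${n + 2}: invalid number`);
      return v;
    };
    return { time: cells[t], open: num(o), high: num(h), low: num(l), close: num(c) };
  });
}
```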
### UI Components
- **Toolbox**: Left sidebar with tool buttons, color picker, label management
- **CandleChart**: Main chart area with interactive candlestick visualization
- **FileUpload**: Drag-and-drop CSV file upload
- **Label Sidebar**: Collapsible section with annotation list, search, filter
## File Structure & Key Files
```
candle_annotator/
├── src/ # Next.js Frontend & API
│ ├── app/
│ │ ├── (public)/ # Public route group (unauthenticated)
│ │ │ ├── login/page.tsx # Login page (Credentials + Google OAuth)
│ │ │ ├── register/page.tsx # Registration page
│ │ │ ├── layout.tsx # Public layout with navbar
│ │ │ ├── navbar.tsx # Public navbar component
│ │ │ └── session-provider.tsx # NextAuth session provider
│ │ ├── app/ # Protected /app route segment (authenticated)
│ │ │ ├── page.tsx # Main annotation interface
│ │ │ ├── layout.tsx # Protected app layout
│ │ │ └── settings/
│ │ │ └── page.tsx # User settings & profile page
│ │ ├── api/
│ │ │ ├── auth/
│ │ │ │ ├── [...nextauth]/route.ts # Auth.js handlers
│ │ │ │ ├── register/route.ts # POST user registration
│ │ │ │ ├── profile/route.ts # GET user profile
│ │ │ │ ├── password/route.ts # POST change password
│ │ │ │ └── account/route.ts # POST update account info
│ │ │ ├── annotations/[id]/route.ts # GET label by ID, PATCH update, DELETE remove
│ │ │ ├── annotations/route.ts # GET all, POST create, DELETE bulk
│ │ │ ├── candles/route.ts # GET all candles for user
│ │ │ ├── charts/route.ts # GET/POST user charts
│ │ │ ├── export/route.ts # GET CSV export
│ │ │ ├── health/route.ts # GET health check (no auth required)
│ │ │ ├── upload/route.ts # POST CSV file upload
│ │ │ ├── predict/route.ts # POST prediction proxy
│ │ │ ├── predict/batch/route.ts # POST batch prediction proxy
│ │ │ ├── model/info/route.ts # GET model info proxy
│ │ │ └── span-annotations/
│ │ │ ├── route.ts # GET/POST span annotations
│ │ │ └── export/route.ts # GET export for ML pipeline
│ │ ├── globals.css # Hacker theme CSS variables
│ │ ├── layout.tsx # Root layout with font loading
│ │ └── page.tsx # Public landing page
│ ├── components/
│ │ ├── CandleChart.tsx # Chart core with prediction overlay
│ │ ├── PredictionPanel.tsx # Prediction controls & summary
│ │ ├── SvgOverlay.tsx # Line drawing layer
│ │ ├── Toolbox.tsx # Sidebar with tools & label list
│ │ ├── FileUpload.tsx # CSV upload
│ │ └── ui/ # shadcn/ui components
│ ├── types/
│ │ └── predictions.ts # Prediction types
│ ├── lib/
│ │ ├── db/
│ │ │ ├── index.ts # Drizzle client
│ │ │ ├── schema.ts # Table definitions (incl. users, with FK constraints)
│ │ │ ├── migrate.ts # Migration runner
│ │ │ └── seed-user-defaults.ts # Seed default annotation types for new users
│ │ └── utils.ts # Utility functions
│ ├── auth.ts # Auth.js v5 configuration (providers, callbacks)
│ └── middleware.ts # Next.js middleware for route protection
├── services/ml/ # Python ML Service
│ ├── app/
│ │ ├── main.py # FastAPI app & inference endpoints
│ │ ├── config.py # Pydantic config models
│ │ ├── db.py # PostgreSQL connection
│ │ └── annotation_ingestion.py # Annotation processing
│ ├── features/
│ │ ├── talib_features.py # TA-Lib indicator computation
│ │ ├── candle_features.py # Candle pattern features
│ │ └── custom_loader.py # Custom feature plugins
│ ├── training/
│ │ ├── train.py # Main training entry point
│ │ ├── evaluation.py # Metrics & visualization
│ │ └── models/
│ │ ├── random_forest.py # RandomForest wrapper
│ │ └── xgboost_model.py # XGBoost wrapper
│ ├── config/
│ │ └── pipeline.yaml # Pipeline configuration
│ ├── data/
│ │ ├── raw/ # Raw OHLCV CSV files
│ │ ├── enriched/ # Feature-engineered data
│ │ ├── labeled/ # Labeled training datasets
│ │ └── annotations/ # Exported annotations
│ ├── pipeline.py # Main pipeline orchestrator
│ ├── Dockerfile # ML service container
│ └── requirements.txt # Python dependencies
├── Docker files:
├── Dockerfile # Next.js multi-stage build
├── docker-compose.yml # Multi-service orchestration
├── .dockerignore # Exclude from image
└── .env.example # Environment template
```
## State Management
### Page Component (src/app/app/page.tsx)
- `activeTool`: Current selected tool (break_up, break_down, line, delete)
- `selectedColor`: Color for trend lines
- `selectedLabelId`: Currently selected label marker (null if none)
- `annotations`: Array of all annotations for sidebar
### CandleChart Component
- Fetches candles and annotations from API
- Renders markers with highlight for selectedLabelId
- Handles click events for annotation creation and selection
- Exposes refreshData() method for parent updates
### Toolbox Component
- `labelsExpanded`: Sidebar collapsed/expanded state
- `searchText`: Search query for timestamp filtering
- `filterType`: Label type filter (all, break_up, break_down)
- Renders label list with click selection and delete buttons
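
The search and type filter in the Toolbox could be combined in one pure function, sketched below. The `Label` shape (numeric `id`, string `time`) is an assumption made for illustration, and `filterLabels` is a hypothetical helper rather than code from `Toolbox.tsx`.

```typescript
// Hypothetical label shape; the real annotation type may differ.
interface Label {
  id: number;
  type: "break_up" | "break_down";
  time: string; // timestamp as displayed in the sidebar
}

type FilterType = "all" | "break_up" | "break_down";

// Apply the type filter and the timestamp search together.
function filterLabels(labels: Label[], searchText: string, filterType: FilterType): Label[] {
  const query = searchText.trim().toLowerCase();
  return labels.filter(
    (label) =>
      (filterType === "all" || label.type === filterType) &&
      (query === "" || label.time.toLowerCase().includes(query)),
  );
}
```

The length of the returned array can double as the count shown next to the filter controls.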
## Data Flow
1. **Initialization**:
   - Page mounts → fetch /api/candles + /api/annotations
   - CandleChart renders with data
2. **Add Label**:
   - User clicks Break Up/Break Down tool
   - Clicks on chart candle
   - POST /api/annotations
   - Chart refreshes automatically
3. **Select Label**:
   - User clicks marker on chart OR list item in sidebar
   - Updates selectedLabelId state
   - CandleChart re-renders marker with highlight
   - Sidebar highlights matching list item
4. **Delete Label**:
   - Option A: Select label + press Delete key
   - Option B: Click trash icon in sidebar list
   - DELETE /api/annotations/[id]
   - Annotations array updated
   - Chart refreshes
5. **Export**:
   - User clicks Export CSV button
   - GET /api/export downloads CSV file
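
Steps 3 and 4 above (select, then delete) can be sketched as pure state updates that would run after the corresponding API call succeeds. The state shape mirrors the `annotations` and `selectedLabelId` fields named in this document, but `PageState`, `selectLabel`, `removeLabel`, and `isDeleteKey` are illustrative names, not the project's code.

```typescript
// Sketch of the selection and deletion state updates; names are hypothetical.
interface AnnotationItem {
  id: number;
}

interface PageState {
  annotations: AnnotationItem[];
  selectedLabelId: number | null;
}

// Step 3: clicking a marker or list item selects that label.
function selectLabel(state: PageState, id: number): PageState {
  return { ...state, selectedLabelId: id };
}

// Step 4: after DELETE /api/annotations/[id] succeeds, drop the label locally
// and clear the selection if it pointed at the deleted label.
function removeLabel(state: PageState, id: number): PageState {
  return {
    annotations: state.annotations.filter((a) => a.id !== id),
    selectedLabelId: state.selectedLabelId === id ? null : state.selectedLabelId,
  };
}

// Both Delete and Backspace trigger removal of the selected label.
function isDeleteKey(key: string): boolean {
  return key === "Delete" || key === "Backspace";
}
```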
## API Endpoints
### Authentication
- `POST /api/auth/register` - Register new user with email and password
- `GET/POST /api/auth/[...nextauth]` - Auth.js handlers (sign-in, sign-out, OAuth callback)
- `GET /api/auth/profile` - Get authenticated user profile
- `POST /api/auth/password` - Change user password
- `POST /api/auth/account` - Update account info (name, image)
### Data Operations (Protected - requires authentication)
- `POST /api/upload` - Upload CSV with candle data
- `GET /api/candles` - Fetch all candles for authenticated user
- `GET /api/annotations` - Fetch all annotations for authenticated user
- `POST /api/annotations` - Create label
- `DELETE /api/annotations/[id]` - Delete label
- `DELETE /api/annotations?type=break_up,break_down` - Bulk delete by type
- `GET /api/export` - Download CSV export
- `GET /api/charts` - Fetch user's charts
- `POST /api/charts` - Create new chart
### Monitoring
- `GET /api/health` - Health check (no authentication required)
- `GET /api/health?check=db` - Health check with database verification (no authentication required)
## Development Workflow
### Adding Features
1. Update CandleChart/Toolbox components for UI
2. Add API route in src/app/api/ if needed
3. Update state management in src/app/app/page.tsx
4. Test in browser with `npm run dev`
5. Commit with clear message
### Fixing Bugs
1. Locate issue in component or API route
2. Add console.logs or debugger for investigation
3. Write minimal fix
4. Test across all annotation types
5. Commit with bug fix message
### Deployment
1. Test locally: `npm run dev`
2. Build production: `npm run build`
3. Commit changes
4. Docker deploy: `docker-compose up --build -d`
## Known Constraints
- **Authentication Required**: The landing page is public, but all /app/* routes and data API endpoints require a signed-in user
- **No Undo**: Annotations can only be deleted, not undone
- **PostgreSQL Required**: Application requires PostgreSQL server to be running
- **JWT Sessions**: Sessions expire after 30 days (JWT max age); users must sign in again afterwards
- **Memory**: Large CSV files (100k+ rows) degrade performance
- **Lines**: Free-form drawing, no snap-to-candle
- **OAuth Setup**: Google OAuth requires AUTH_GOOGLE_ID and AUTH_GOOGLE_SECRET environment variables
## Customization Points
### Theme Colors
Edit CSS variables in src/app/globals.css (matrix green to desired color)
### Chart Appearance
- Candlestick colors: CandleChart.tsx lines 119-120
- Grid colors: CandleChart.tsx lines 96-98
- Font: globals.css body element
### API Routes
- Validation rules: src/app/api/*/route.ts
- Database operations: Use Drizzle ORM syntax
### UI Components
- Button styles: src/components/ui/button.tsx
- Sidebar layout: src/components/Toolbox.tsx
## Performance Notes
- **Chart rendering**: lightweight-charts optimized, 60fps target
- **SVG lines**: Only visible SVG redrawn, minimal overhead
- **Database**: PostgreSQL queries run in server routes only, through a connection pool (max 10 connections)
- **Frontend**: React memoization for chart component
## Security Considerations
- **Input validation**: File upload checks MIME type
- **SQL injection**: Drizzle ORM parameterized queries
- **CORS**: API routes served same-origin only
- **Docker**: Non-root user (nextjs:1001) for container execution
## Testing Checklist
Before marking features complete:
- [ ] All existing features still work (lines, colors, delete, export)
- [ ] New feature functions as designed
- [ ] Error states handled gracefully
- [ ] Browser console shows no errors
- [ ] Docker build and run successful
- [ ] Data persists across container restart
## Version History
### v3.2.0 (Current)
- Multi-user authentication system with Auth.js v5
- Credentials (email/password) and Google OAuth providers
- JWT session strategy with middleware route protection
- User isolation: all data tables linked to user_id
- New routing structure: (public) for /, /login, /register and /app/* for protected routes
- Settings page for user profile management
- New auth API endpoints: /api/auth/register, /api/auth/profile, /api/auth/password, /api/auth/account
- Composite unique constraints on charts, annotation_types, span_label_types
- User registration and account management
### v3.1.0
- Database consolidation to PostgreSQL
- Shared database between frontend and ML service
- Direct database access for ML training (no CSV exports)
- Migration script for SQLite to PostgreSQL
- Type safety improvements in ML service
### v3.0.0
- ML pipeline integration with FastAPI inference service
- Feature engineering with TA-Lib
- Model training with MLflow tracking
- Prediction UI with disagreement detection
- Active learning loop
### v2.0.0
- Label management sidebar with search/filter
- Hacker theme with matrix colors and monospace font
- Docker deployment with compose
- Keyboard delete for labels
- Label selection on chart
### v1.0.0
- Basic annotation tools (Break Up, Break Down, Line, Delete)
- Candlestick chart visualization
- CSV import/export
- SQLite persistence