candle-annotator

Author	SHA1	Message	Date
Marko Djordjevic	67dd7aa2f0	security: validate run_id format and add path containment check in ML service - Add `import re` to services/ml/app/main.py - In POST /model/load: validate run_id matches ^[a-zA-Z0-9_-]+$ before DB lookup; use Path.resolve() + directory containment check before loading model artifact - In DELETE /training/runs/{run_id}: validate run_id matches ^[a-zA-Z0-9_-]+$ before any processing; use Path.resolve() + directory containment check before deleting model artifact - Both endpoints return HTTP 400 with {"detail": "Invalid run_id format"} on invalid input - Mark task 2.2 as completed in openspec/changes/code-review-fix/tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:00:19 +01:00
Marko Djordjevic	9bc82b822c	security: remove credential SQL comments and add DATABASE_URL fail-fast check - Remove hardcoded SQL comments containing 'ml_user' and 'ml_password' - Remove fallback default credentials in DATABASE_URL construction - Add fail-fast validation: raise RuntimeError if DATABASE_URL env var is missing or empty - Mark task 1.4 as complete in code-review-fix/tasks.md	2026-02-18 10:56:49 +01:00
Marko Djordjevic	d3dcfcea7d	feat: auto-build training dataset from DB annotations before training - Add build_dataset_from_db() that exports candles from DB, runs feature engineering, and ingests span annotations into labeled CSV - Call it automatically in _run_training_background before training starts - Add POST /training/build-dataset endpoint for standalone use - Add Next.js proxy route /api/training/build-dataset - Update TrainingPanel: remove dataset-missing block on Start Training, show informational message that dataset builds automatically	2026-02-18 00:24:39 +01:00
Marko Djordjevic	b4956f3fb9	fix: call init_db() on ML service startup to create training_runs table	2026-02-18 00:08:23 +01:00
Marko Djordjevic	6ef102cf21	fix: training panel stuck button, stale runs on startup, add delete model	2026-02-17 22:23:50 +01:00
Marko Djordjevic	2a02669222	feat: add FastAPI model/load endpoint and all Next.js proxy routes (tasks 2-4)	2026-02-17 18:47:04 +01:00
Marko Djordjevic	b8e649e333	feat: add FastAPI pattern detection endpoints (Section 1) - Extract CDL pattern detection logic into services/ml/app/patterns.py with TALIB_PATTERNS dict, get_available_patterns(), validate_pattern_names(), and detect_patterns(candles, pattern_names) functions - Add GET /patterns/available endpoint returning all 54 supported CDL pattern names with display names - Add POST /patterns/detect endpoint accepting {candles, patterns}, running selected CDL functions, returning span annotations with source "talib" - Add input validation: reject invalid pattern names with HTTP 400, treat empty patterns list as "run all" Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 18:34:14 +01:00
Marko Djordjevic	d1557a3846	fix: resolve numpy type conversion issues in ML service data access - Convert numpy.int64 to Python int before passing to SQLAlchemy queries - Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64' - Applied to get_candles(), get_span_annotations(), and get_point_annotations() - All ML service database access tests now passing successfully	2026-02-17 14:10:21 +01:00
Marko Djordjevic	bfe437857b	feat: add Python migration script and successfully test SQLite to PostgreSQL data migration - Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version - Handles all type conversions: timestamps, booleans, and JSONB fields - Successfully migrated all 2,836 rows from SQLite to PostgreSQL - Verified data integrity: all 6 tables migrated correctly - Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223	2026-02-17 14:01:21 +01:00
Marko Djordjevic	a18c6d110a	Remove __pycache__ from git tracking	2026-02-15 22:29:35 +01:00
Marko Djordjevic	f7b154f866	fix(ml): extract class labels from model when metadata is missing The model info returned empty labels array because the pkl file has no metadata dict. Now extracts labels from model.classes_ or model.model.classes_ as fallback.	2026-02-15 22:18:16 +01:00
Marko Djordjevic	be09cef098	fix(ml): add missing numpy import in main.py	2026-02-15 22:08:41 +01:00
Marko Djordjevic	40d6d1739e	fix(ml): add windowed feature flattening for inference parity The model was trained on 94-candle sliding windows flattened to 2820 features (94 candles x 30 features). Inference was sending raw per-candle features (27 columns). Changes: - Rewrite preprocessing to return (X, window_times) tuple - Add sliding window creation with correct feature ordering - Fill missing columns (average, barCount) with 0 for feature parity - Fill NaN from indicator warmup with 0 instead of dropping rows - Always compute all indicators (including MFI) for feature parity - Update predict and batch predict endpoints for new signature	2026-02-15 22:07:06 +01:00
Marko Djordjevic	4c7b3f2676	fix(ml): detect all-NaN volume column (not just missing column) The Pydantic model sets volume=None when absent, creating an all-NaN column rather than a missing column. Check isna().all() in addition to column existence.	2026-02-15 21:58:16 +01:00
Marko Djordjevic	317f925c43	fix(ml): handle missing volume data and skip volume-dependent indicators - Fill volume with 0 when column is absent from candle data - Skip MFI/OBV/AD/ADOSC indicators when no real volume data available - Fix pandas FutureWarning for inplace fillna in candle_features - Remove temporary debug NaN logging	2026-02-15 21:56:14 +01:00
Marko Djordjevic	f850728d44	fix(api): add GET /api/charts/[id] and fix batch prediction - Add GET handler to /api/charts/[id] route to fetch chart metadata - Fix batch prediction to use regular /predict endpoint with database candles - Remove /predict/batch usage (was designed for file-based predictions) - Make volume field optional in CandleData model (database candles don't have volume) - Convert timestamps to ISO dates for batch requests Known issue: TA-Lib indicators failing with 'input array type is not double' - May need to ensure candle data is float64/double type before processing	2026-02-15 21:49:22 +01:00
Marko Djordjevic	6d0d67e39b	fix(ml): make pair and timeframe optional in PredictRequest - Change pair and timeframe fields from required to optional - Frontend only sends candles array, not pair/timeframe metadata - These fields are only used for logging, not prediction logic - Update logging to handle None values with 'unknown' fallback - Fixes 422 validation error on /predict endpoint	2026-02-15 21:43:14 +01:00
Marko Djordjevic	aa81d4f3d0	fix(ml): complete ML pipeline fixes and setup - Fix CCI indicator to use HLC prices instead of close only - Parse datetime column when loading enriched CSV - Strip timezone from annotation timestamps - Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS) - Exclude programmatic label columns from training features - Fix classification report to handle missing classes - Update MLflow tracking to use localhost:5000 - Grant PostgreSQL permissions to ml_user Pipeline now runs successfully end-to-end: - Feature engineering: 2543 rows, 31 columns - Annotation ingestion: 286 samples - Training: 89.47% test accuracy with Random Forest	2026-02-15 21:29:54 +01:00
Marko Djordjevic	ceb4103ec4	fix(ml): parse datetime column and fix TA-Lib pattern names - Add parse_dates parameter when loading enriched CSV - Strip timezone from annotation timestamps to match data - Fix pattern names: CDLTHREEWHITESOLDIERS -> CDL3WHITESOLDIERS - Fix pattern names: CDLTHREEBLACKCROWS -> CDL3BLACKCROWS	2026-02-15 21:13:20 +01:00
Marko Djordjevic	2b86524436	fix(ml): correct CCI indicator signature to use HLC prices	2026-02-15 21:09:47 +01:00
Marko Djordjevic	847ff67986	feat(ml): add TA-Lib annotation generation and import workflow Add complete workflow for using TA-Lib to bootstrap training data: - generate_talib_annotations.py: Python script to run TA-Lib CDL* functions and output span annotations in UI-compatible format - import_talib_annotations.ts: TypeScript script to import generated annotations into the UI database with auto-label-type creation - npm script 'import-annotations' for easy execution - TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle: * Generate patterns with TA-Lib * Import into UI * Review and edit in browser * Export and train model * Compare predictions with TA-Lib detections * Iterate for improvement This enables the intended workflow: use TA-Lib for initial annotations, manually refine them, then train a model that learns from corrections.	2026-02-15 19:18:28 +01:00
Marko Djordjevic	3a83fd38e9	feat(ml): implement FastAPI inference service with model loading, preprocessing, and prediction endpoints	2026-02-15 14:29:07 +01:00
Marko Djordjevic	16763b967e	feat(ml): implement annotation ingestion with windowed/BIO encoding and TA-Lib patterns	2026-02-15 12:28:58 +01:00
Marko Djordjevic	ea339a54a7	feat(ml): add database schema, config parser, and DVC setup - Initialize DVC with local storage backend (task 1.6) - Create PostgreSQL schema for training_runs table (task 1.7) - Add SQLAlchemy database connection setup (task 1.8) - Create Pydantic config models for pipeline.yaml (task 2.1) - Add migration runner for database setup - Fix pyproject.toml package discovery config	2026-02-15 12:08:53 +01:00

24 commits
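Commit 67dd7aa2f0 adds two guards to the ML service: an allow-list check on run_id (^[a-zA-Z0-9_-]+$) and a Path.resolve() containment check before an artifact is loaded or deleted. The sketch below shows one way to write both checks using only the standard library. The function names, the models directory, and raising ValueError are assumptions and not the service's actual code (the endpoints themselves return HTTP 400). The sketch uses re.fullmatch instead of re.match with ^...$. In Python, $ also matches just before a trailing newline, so re.match would accept "abc\n".

```python
import re
from pathlib import Path

# Allow-list for run IDs: letters, digits, underscore, hyphen.
# fullmatch() is used instead of match() with "^...$" because "$" also
# matches just before a trailing newline ("abc\n" would slip through).
RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id is non-empty and uses only allowed characters."""
    return RUN_ID_PATTERN.fullmatch(run_id) is not None


def contained_artifact_path(models_dir: Path, artifact_path: str) -> Path:
    """Resolve artifact_path and make sure it stays inside models_dir.

    Hypothetical helper. Raises ValueError if the resolved path escapes
    the directory, e.g. via "../" segments, an absolute path, or a symlink
    that points elsewhere.
    """
    base = models_dir.resolve()
    # Joining an absolute artifact_path discards base, so the check below
    # also rejects absolute paths outside the directory.
    candidate = (base / artifact_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"artifact path escapes {base}: {artifact_path!r}") from None
    return candidate
```

In a FastAPI handler, a failed is_valid_run_id check would raise HTTPException(status_code=400, detail="Invalid run_id format"). That call produces the {"detail": "Invalid run_id format"} response body the commit describes.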
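Commit 9bc82b822c replaces the fallback default credentials with a fail-fast check on DATABASE_URL. Below is a minimal sketch of that check. The helper name require_database_url is hypothetical. Treating a whitespace-only value as empty is also an assumption, since the commit only says "missing or empty".

```python
import os
from typing import Mapping, Optional


def require_database_url(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return DATABASE_URL or raise RuntimeError if it is missing or blank.

    Hypothetical helper. No default credentials are substituted.
    environ defaults to os.environ and can be overridden for testing.
    """
    env = os.environ if environ is None else environ
    url = env.get("DATABASE_URL", "").strip()
    if not url:
        raise RuntimeError("DATABASE_URL environment variable is not set or is empty")
    return url
```

If this runs at module import or application startup, a misconfigured deployment stops immediately with a clear error. It never attempts a connection with guessed credentials.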
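Commit b8e649e333 says that POST /patterns/detect rejects unknown pattern names with HTTP 400 and treats an empty list as "run all". The sketch below implements those two rules. PATTERN_SUBSET is a hypothetical five-entry stand-in for the 54-entry TALIB_PATTERNS dict. resolve_pattern_selection is an illustrative name, not validate_pattern_names() itself. Dropping duplicates while keeping request order is an added assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical subset for illustration: TA-Lib function name -> display name.
PATTERN_SUBSET: Dict[str, str] = {
    "CDLDOJI": "Doji",
    "CDLHAMMER": "Hammer",
    "CDLENGULFING": "Engulfing Pattern",
    "CDL3WHITESOLDIERS": "Three Advancing White Soldiers",
    "CDL3BLACKCROWS": "Three Black Crows",
}


def resolve_pattern_selection(requested: Sequence[str]) -> List[str]:
    """Return the pattern names to run.

    An empty request means "run all". Any unknown name raises ValueError,
    which an endpoint would translate into an HTTP 400 response.
    """
    if not requested:
        return sorted(PATTERN_SUBSET)
    unknown = [name for name in requested if name not in PATTERN_SUBSET]
    if unknown:
        raise ValueError(f"Unknown pattern names: {', '.join(unknown)}")
    # Drop duplicates but keep the caller's order.
    return list(dict.fromkeys(requested))
```

Under this check, a misspelled name is rejected up front. One example is CDLTHREEWHITESOLDIERS, which commit ceb4103ec4 corrected to CDL3WHITESOLDIERS.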
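Commit f7b154f866 falls back to model.classes_ or model.model.classes_ when the pickled artifact has no metadata dict. Below is a minimal sketch of that lookup order. It assumes that metadata, when present, stores labels under a hypothetical "labels" key. Scikit-learn classifiers set the classes_ attribute after fit(). The .model wrapper attribute comes from the commit message.

```python
from typing import Any, List, Optional


def extract_class_labels(model: Any, metadata: Optional[dict] = None) -> List[str]:
    """Return class labels, preferring metadata, then classes_ attributes."""
    # 1. Labels stored alongside the artifact (hypothetical "labels" key).
    labels = (metadata or {}).get("labels")
    if labels is not None and len(labels) > 0:
        return [str(label) for label in labels]
    # 2. The estimator's own classes_, then a wrapped estimator's classes_.
    for candidate in (model, getattr(model, "model", None)):
        classes = getattr(candidate, "classes_", None)
        if classes is not None:
            return [str(c) for c in classes]
    # 3. Nothing found: return an empty list, the behaviour the fix addressed.
    return []
```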
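Commit 40d6d1739e explains the inference mismatch. The model was trained on 94-candle sliding windows flattened to 94 x 30 = 2820 features, but inference sent 27 per-candle columns. The plain-Python sketch below flattens windows in candle-major order and fills NaN values with 0. make_windows is a hypothetical name. Keying each window by the timestamp of its last candle is an assumption about what window_times holds.

```python
import math
from typing import List, Sequence, Tuple


def make_windows(
    rows: Sequence[Sequence[float]],
    times: Sequence[str],
    window: int,
) -> Tuple[List[List[float]], List[str]]:
    """Flatten each run of `window` consecutive candles into one vector.

    Ordering is candle-major: all features of the first candle, then all
    features of the next, and so on. Returns (X, window_times), where
    window_times[i] is the timestamp of the last candle in window i
    (an assumption made for this sketch).
    """
    if len(rows) != len(times):
        raise ValueError("rows and times must have the same length")
    X: List[List[float]] = []
    window_times: List[str] = []
    for end in range(window, len(rows) + 1):
        flat: List[float] = []
        for row in rows[end - window:end]:
            # Indicator warm-up NaNs become 0 instead of dropping the row.
            flat.extend(0.0 if math.isnan(v) else float(v) for v in row)
        X.append(flat)
        window_times.append(times[end - 1])
    return X, window_times
```

Training and inference must flatten features in the same order. If they differ, feature i means one thing to the model and another at prediction time, even when the vector length matches.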