candle-annotator

Author	SHA1	Message	Date
Marko Djordjevic	61f2471d48	Replace MLflow inline pip hack with proper Dockerfile Add services/mlflow/Dockerfile that extends the official image with psycopg2-binary, and point docker-compose to build from it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 09:57:24 +01:00
Marko Djordjevic	187243bfdb	Fix MLflow localhost references to use Docker service name Replace hardcoded localhost:5000 with mlflow:5000 in pipeline.yaml and main.py health check fallback so containers can reach MLflow over Docker's internal network. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 09:34:16 +01:00
Marko Djordjevic	60e3cb755d	remove __pycache__ from git	2026-02-21 09:11:47 +01:00
Marko Djordjevic	76cb49e908	Migrate MLflow backend to PostgreSQL and fix .env loading - Add python-dotenv loading in main.py so DATABASE_URL is read from .env before db.py module initializes - Add MLFLOW_TRACKING_URI to .env.example pointing to PostgreSQL - Add python-dotenv>=1.0.0 to pyproject.toml dependencies - Initialize MLflow schema in candle_annotator PostgreSQL database MLflow server now starts without filesystem deprecation warnings and with full job execution support. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 21:34:31 +01:00
Marko Djordjevic	688e75e6be	Scope training run queries in FastAPI to filter by user ID (Task 14.3) - Add user_id column to TrainingRun model in db.py - Store user_id on TrainingRun insert in /training/start - Filter GET /training/runs by user_id (returns empty list if no user context) - Enforce user ownership on GET /training/runs/{run_id} (404 on mismatch) - Enforce user ownership on DELETE /training/runs/{run_id} (404 on mismatch) - Add migration 002 to add user_id column and index to training_runs table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 18:38:18 +01:00
Marko Djordjevic	cbb921b4a7	Scope MLflow experiment names to include user ID (Task 14.2) - Updated FastAPI /training/start endpoint to extract X-User-ID header via get_user_id() dependency - Modified _run_training_background to accept and use user_id parameter - Added MLflow experiment setup with user scoping: experiments are named user_{user_id}_training when user_id is provided, falling back to default experiment name otherwise - Updated database record insertion to store scoped experiment name - Updated training/train.py train() function to accept user_id parameter and use it for experiment naming - Mark task 14.2 as complete in tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 13:46:00 +01:00
Marko Djordjevic	399e760fa5	Add X-User-ID header extraction to FastAPI service. Create a get_user_id() dependency that extracts the X-User-ID header from incoming requests, making it available to route handlers. The dependency is optional (not enforced) — callers decide whether to use it or require it on specific routes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 13:44:02 +01:00
Marko Djordjevic	26f2c44509	Fix XGBoost label encoding and single-class guard	2026-02-18 23:58:24 +01:00
Marko Djordjevic	73c10a4156	Fix inference feature mismatch with training metadata	2026-02-18 23:53:38 +01:00
Marko Djordjevic	07064fbf40	fix(training): use selected chart and include TA-Lib span sources	2026-02-18 23:21:23 +01:00
Marko Djordjevic	e446e3e563	Fix ml-service volume permissions	2026-02-18 22:50:36 +01:00
Marko Djordjevic	86a8cc66b6	fix deployment folder permissions	2026-02-18 22:34:23 +01:00
Marko Djordjevic	1b5f278685	fix: replace TA-Lib source build with prebuilt .deb v0.6.4 in ML Dockerfile	2026-02-18 21:30:23 +01:00
Marko Djordjevic	49daaae36a	code-review-fix task 15.4: add missing fallback return for volume indicators in talib_features.py	2026-02-18 20:58:47 +01:00
Marko Djordjevic	059c436717	code-review-fix task 15.3: replace datetime.utcnow() with datetime.now(timezone.utc) in main.py	2026-02-18 20:58:11 +01:00
Marko Djordjevic	a03c9bc17b	code-review-fix task 15.2: replace declarative_base() with class Base(DeclarativeBase) in db.py	2026-02-18 20:57:37 +01:00
Marko Djordjevic	41287b20c6	code-review-fix task 15.1: replace @app.on_event startup with FastAPI lifespan pattern in main.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 20:57:06 +01:00
Marko Djordjevic	f529ada877	code-review-fix task 14.5: remove duplicate TALIB_PATTERNS dict and import from single source	2026-02-18 20:55:14 +01:00
Marko Djordjevic	346954bcee	code-review-fix task 14.4: fix inconsistent uuid_lib import to standard import uuid in main.py	2026-02-18 20:48:38 +01:00
Marko Djordjevic	d42b4827ca	code-review-fix task 14.3: remove dead inference package reference from pyproject.toml	2026-02-18 20:48:10 +01:00
Marko Djordjevic	8d9a86efb4	code-review-fix task 14.2: remove dead get_db_session function from db.py	2026-02-18 20:47:44 +01:00
Marko Djordjevic	5896e56faa	feat: add sha256 pinning TODO comments to both Dockerfiles Add TODO comments above each FROM instruction in Dockerfile and services/ml/Dockerfile instructing how to pin base images to sha256 digests for reproducible builds. Marks task 6.7 as complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:37:17 +01:00
Marko Djordjevic	e146de2e05	security: add SHA256 checksum verification for TA-Lib download in ML Dockerfile Splits the monolithic TA-Lib build RUN command to insert an ARG for the expected SHA256 hash and a sha256sum -c verification step immediately after the wget download, before extraction and build. Marks task 6.5 complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:36:06 +01:00
Marko Djordjevic	a6e0697ab5	fix: Change TA-Lib download URL to HTTPS in Dockerfile Updated the wget command in services/ml/Dockerfile line 10 to use HTTPS instead of HTTP for downloading TA-Lib source. This improves security by ensuring encrypted transport. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:35:15 +01:00
Marko Djordjevic	1438e474e8	security: add non-root appuser to services/ml/Dockerfile Create system user appuser with useradd, set ownership of /app, and switch to non-root user before CMD to reduce container attack surface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:34:26 +01:00
Marko Djordjevic	3d8672121e	feat: add run_id format validation to GET /training/runs/{run_id} endpoint - Add new GET endpoint for retrieving a specific training run by run_id - Validate run_id format with regex pattern ^[a-zA-Z0-9_-]+$ before DB access - Return HTTP 400 for invalid run_id format, HTTP 404 for non-existent runs - Ensure DELETE endpoint validation is correctly placed before any DB access - Both endpoints now provide consistent security validation - Mark task 5.8 as completed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:33:12 +01:00
Marko Djordjevic	3dc0014328	feat: add training resource limits (500MB size check + 30-min timeout) - Import concurrent.futures for timeout support - In _run_training_background: check df.memory_usage(deep=True).sum() after loading the labeled dataset; raise ValueError if > 500MB - Wrap model.fit() in a ThreadPoolExecutor with a 1800s timeout; on TimeoutError update DB status to "failed" with message "Training timed out after 30 minutes" and return early - Mark task 5.7 as done in openspec/changes/code-review-fix/tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:31:33 +01:00
Marko Djordjevic	f94d16c6ab	fix: implement real health checks in ML service /health endpoint - Execute SELECT 1 against PostgreSQL via SQLAlchemy session to verify DB connectivity - HTTP GET to ${MLFLOW_TRACKING_URI}/health (default: http://localhost:5000/health) to verify MLflow - Return HTTP 503 if any component is unhealthy, HTTP 200 only when both are healthy - Values: "healthy" / "unhealthy" for database and mlflow fields - Add `import requests as http_requests` and `sa_text` imports Closes task 5.6 in openspec/changes/code-review-fix/tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:30:12 +01:00
Marko Djordjevic	d75b05b585	feat: add candle time-sort and duplicate timestamp validation to POST /predict Sort incoming candles by their time field in ascending chronological order before processing to ensure correct feature engineering regardless of input order. Also validate that no duplicate timestamps are present, returning HTTP 400 with details if duplicates are found. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:28:02 +01:00
Marko Djordjevic	ff952aa083	feat(ml): add date range validation to POST /predict/batch - Parse start_date and end_date as datetime objects - Return HTTP 400 if end_date is before start_date - Return HTTP 400 if date range exceeds 365 days (1 year) Closes task 5.4 in code-review-fix tasks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:27:12 +01:00
Marko Djordjevic	b9beea1574	fix(ml): add _model_swap_lock to prediction reads for thread-safe model access In /predict and /predict/batch endpoints, grab the model reference under _model_swap_lock before running inference. Inference itself runs outside the lock (using a local variable) to avoid blocking model swaps during potentially slow computation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:26:33 +01:00
Marko Djordjevic	ff15adc847	feat: add SHA256 model integrity check before joblib.load() Add verify_model_checksum() that validates model files against a models/checksums.sha256 manifest before loading. Fails open when manifest is missing or file not listed (backward compat), raises HTTP 500 on hash mismatch. Created empty manifest placeholder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:25:14 +01:00
Marko Djordjevic	b7f9b2e04d	security: replace exception details with generic error in ML service HTTP responses Replace all instances of `detail=str(e)`, `detail=f"...{exc}"`, and similar patterns that exposed internal exception messages to HTTP clients in services/ml/app/main.py. All exception details are now logged server-side only via logger.error(), while clients receive a generic "Internal server error" message. Fixes 9 handlers across predict, batch predict, pattern detection, training start, training runs fetch, training run delete, dataset info, build dataset, and model load endpoints. Mark task 5.1 as completed in tasks.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:23:41 +01:00
Marko Djordjevic	5f569d9134	feat(ml): add API key authentication via FastAPI Depends() on all endpoints except /health - Import Header, Depends, Security from fastapi - Add verify_api_key dependency: reads API_KEY env var, checks X-API-Key header, raises HTTP 401 if key mismatch; fail-open if env var not set - Apply Depends(verify_api_key) to all 14 non-health endpoints - /health endpoint remains unauthenticated for liveness probes - Mark task 3.2 as complete in tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:04:41 +01:00
Marko Djordjevic	26ff80a682	fix: implement CORS origins configuration via environment variable - Replace hardcoded allow_origins=['*'] with dynamic configuration - Read CORS_ORIGINS environment variable (comma-separated list) - Default to 'http://localhost:3000' if CORS_ORIGINS is not set - Support multiple origins by splitting and stripping whitespace from env var	2026-02-18 11:02:00 +01:00
Marko Djordjevic	67dd7aa2f0	security: validate run_id format and add path containment check in ML service - Add `import re` to services/ml/app/main.py - In POST /model/load: validate run_id matches ^[a-zA-Z0-9_-]+$ before DB lookup; use Path.resolve() + directory containment check before loading model artifact - In DELETE /training/runs/{run_id}: validate run_id matches ^[a-zA-Z0-9_-]+$ before any processing; use Path.resolve() + directory containment check before deleting model artifact - Both endpoints return HTTP 400 with {"detail": "Invalid run_id format"} on invalid input - Mark task 2.2 as completed in openspec/changes/code-review-fix/tasks.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 11:00:19 +01:00
Marko Djordjevic	9bc82b822c	security: remove credential SQL comments and add DATABASE_URL fail-fast check - Remove hardcoded SQL comments containing 'ml_user' and 'ml_password' - Remove fallback default credentials in DATABASE_URL construction - Add fail-fast validation: raise RuntimeError if DATABASE_URL env var is missing or empty - Mark task 1.4 as complete in code-review-fix/tasks.md	2026-02-18 10:56:49 +01:00
Marko Djordjevic	ea89543c0f	feat: add models/ and *.pkl to .gitignore, remove tracked model files from git history	2026-02-18 10:55:48 +01:00
Marko Djordjevic	9a549b8c7a	fix: make volume column optional in feature engineering, skip MFI when absent	2026-02-18 01:01:32 +01:00
Marko Djordjevic	d3dcfcea7d	feat: auto-build training dataset from DB annotations before training - Add build_dataset_from_db() that exports candles from DB, runs feature engineering, and ingests span annotations into labeled CSV - Call it automatically in _run_training_background before training starts - Add POST /training/build-dataset endpoint for standalone use - Add Next.js proxy route /api/training/build-dataset - Update TrainingPanel: remove dataset-missing block on Start Training, show informational message that dataset builds automatically	2026-02-18 00:24:39 +01:00
Marko Djordjevic	b4956f3fb9	fix: call init_db() on ML service startup to create training_runs table	2026-02-18 00:08:23 +01:00
Marko Djordjevic	69634909d1	fix: correct timestamp/boolean types for PostgreSQL schema (Date not int, bool not 0/1)	2026-02-17 22:50:31 +01:00
Marko Djordjevic	6ef102cf21	fix: training panel stuck button, stale runs on startup, add delete model	2026-02-17 22:23:50 +01:00
Marko Djordjevic	2a02669222	feat: add FastAPI model/load endpoint and all Next.js proxy routes (tasks 2-4)	2026-02-17 18:47:04 +01:00
Marko Djordjevic	b8e649e333	feat: add FastAPI pattern detection endpoints (Section 1) - Extract CDL pattern detection logic into services/ml/app/patterns.py with TALIB_PATTERNS dict, get_available_patterns(), validate_pattern_names(), and detect_patterns(candles, pattern_names) functions - Add GET /patterns/available endpoint returning all 54 supported CDL pattern names with display names - Add POST /patterns/detect endpoint accepting {candles, patterns}, running selected CDL functions, returning span annotations with source "talib" - Add input validation: reject invalid pattern names with HTTP 400, treat empty patterns list as "run all" Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 18:34:14 +01:00
Marko Djordjevic	d1557a3846	fix: resolve numpy type conversion issues in ML service data access - Convert numpy.int64 to Python int before passing to SQLAlchemy queries - Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64' - Applied to get_candles(), get_span_annotations(), and get_point_annotations() - All ML service database access tests now passing successfully	2026-02-17 14:10:21 +01:00
Marko Djordjevic	bfe437857b	feat: add Python migration script and successfully test SQLite to PostgreSQL data migration - Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version - Handles all type conversions: timestamps, booleans, and JSONB fields - Successfully migrated all 2,836 rows from SQLite to PostgreSQL - Verified data integrity: all 6 tables migrated correctly - Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223	2026-02-17 14:01:21 +01:00
Marko Djordjevic	ecb23855b5	fix: add curl to ml-service for healthcheck	2026-02-16 17:31:08 +01:00
Marko Djordjevic	08bd9625ae	fix: build TA-Lib from source in ML Dockerfile	2026-02-16 14:58:07 +01:00
Marko Djordjevic	a18c6d110a	Remove __pycache__ from git tracking	2026-02-15 22:29:35 +01:00
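
Note on 67dd7aa2f0 and 3d8672121e: both commits describe validating run_id against ^[a-zA-Z0-9_-]+$ before any DB access, and 67dd7aa2f0 adds a Path.resolve() plus directory containment check before a model artifact is loaded or deleted. The following is a minimal sketch of that idea, not the code in services/ml/app/main.py. The helper names (is_valid_run_id, resolve_model_path), the "<run_id>.pkl" file naming, and the use of ValueError in place of an HTTP 400 response are assumptions made for illustration.

```python
import re
from pathlib import Path

# Pattern quoted in the commit messages.
RUN_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_run_id(run_id: str) -> bool:
    """Return True if run_id contains only letters, digits, '_' and '-'."""
    # fullmatch (rather than match) also rejects a trailing newline,
    # which '$' alone would tolerate.
    return RUN_ID_PATTERN.fullmatch(run_id) is not None


def resolve_model_path(models_dir: Path, run_id: str) -> Path:
    """Hypothetical helper: map a run_id to an artifact path inside models_dir."""
    if not is_valid_run_id(run_id):
        raise ValueError("Invalid run_id format")
    base = models_dir.resolve()
    # Assumed naming scheme: one '<run_id>.pkl' file per run.
    candidate = (base / f"{run_id}.pkl").resolve()
    # Containment check: resolve() follows symlinks, so this also catches
    # an artifact that is a symlink pointing outside the models directory.
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("Model path escapes the models directory")
    return candidate
```

The regex alone already blocks "../" style input; the containment check is kept as a second layer because it also covers symlinked files, which the regex cannot see.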

73 commits
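
Note on ff15adc847: the commit describes a verify_model_checksum() that checks model files against a models/checksums.sha256 manifest before joblib.load(), fails open when the manifest is missing or the file is not listed, and raises HTTP 500 on a hash mismatch. A rough sketch of that behaviour follows. It assumes the manifest uses the `sha256sum` output format ("<hex digest>  <filename>" per line) and raises a hypothetical ChecksumMismatch exception rather than an HTTP error, so that it stays free of FastAPI; the actual signature and manifest layout in the repository may differ.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(Exception):
    """Hypothetical exception; the service is described as returning HTTP 500."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_checksum(model_path: Path, manifest_path: Path) -> bool:
    """Return True if verified, False if skipped (fail-open); raise on mismatch."""
    if not manifest_path.exists():
        return False  # fail open: no manifest present
    expected = None
    for line in manifest_path.read_text().splitlines():
        parts = line.split()
        # Assumed sha256sum format; a leading '*' marks binary mode.
        if len(parts) == 2 and parts[1].lstrip("*") == model_path.name:
            expected = parts[0].lower()
            break
    if expected is None:
        return False  # fail open: file not listed in the manifest
    if sha256_of(model_path) != expected:
        raise ChecksumMismatch(f"SHA256 mismatch for {model_path.name}")
    return True
```

Failing open keeps older, unlisted models loadable, as the commit notes, but it also means that deleting the manifest disables the check; a stricter deployment might prefer to fail closed.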