Commit graph

31 commits

Author SHA1 Message Date
Marko Djordjevic
76cb49e908 Migrate MLflow backend to PostgreSQL and fix .env loading
- Add python-dotenv loading in main.py so DATABASE_URL is read from .env
  before db.py module initializes
- Add MLFLOW_TRACKING_URI to .env.example pointing to PostgreSQL
- Add python-dotenv>=1.0.0 to pyproject.toml dependencies
- Initialize MLflow schema in candle_annotator PostgreSQL database

MLflow server now starts without filesystem deprecation warnings and
with full job execution support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 21:34:31 +01:00
Marko Djordjevic
688e75e6be Scope training run queries in FastAPI to filter by user ID (Task 14.3)
- Add user_id column to TrainingRun model in db.py
- Store user_id on TrainingRun insert in /training/start
- Filter GET /training/runs by user_id (returns empty list if no user context)
- Enforce user ownership on GET /training/runs/{run_id} (404 on mismatch)
- Enforce user ownership on DELETE /training/runs/{run_id} (404 on mismatch)
- Add migration 002 to add user_id column and index to training_runs table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 18:38:18 +01:00
Marko Djordjevic
cbb921b4a7 Scope MLflow experiment names to include user ID (Task 14.2)
- Updated FastAPI /training/start endpoint to extract X-User-ID header via get_user_id() dependency
- Modified _run_training_background to accept and use user_id parameter
- Added MLflow experiment setup with user scoping: experiments are named user_{user_id}_training when user_id is provided, falling back to default experiment name otherwise
- Updated database record insertion to store scoped experiment name
- Updated training/train.py train() function to accept user_id parameter and use it for experiment naming
- Mark task 14.2 as complete in tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 13:46:00 +01:00
Marko Djordjevic
399e760fa5 Add X-User-ID header extraction to FastAPI service.
Create a get_user_id() dependency that extracts the X-User-ID header
from incoming requests, making it available to route handlers.
The dependency is optional (not enforced) — callers decide whether
to use it or require it on specific routes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 13:44:02 +01:00
Marko Djordjevic
73c10a4156 Fix inference feature mismatch with training metadata 2026-02-18 23:53:38 +01:00
Marko Djordjevic
07064fbf40 fix(training): use selected chart and include TA-Lib span sources 2026-02-18 23:21:23 +01:00
Marko Djordjevic
059c436717 code-review-fix task 15.3: replace datetime.utcnow() with datetime.now(timezone.utc) in main.py 2026-02-18 20:58:11 +01:00
Marko Djordjevic
41287b20c6 code-review-fix task 15.1: replace @app.on_event startup with FastAPI lifespan pattern in main.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 20:57:06 +01:00
Marko Djordjevic
346954bcee code-review-fix task 14.4: fix inconsistent uuid_lib import to standard import uuid in main.py 2026-02-18 20:48:38 +01:00
Marko Djordjevic
3d8672121e feat: add run_id format validation to GET /training/runs/{run_id} endpoint
- Add new GET endpoint for retrieving a specific training run by run_id
- Validate run_id format with regex pattern ^[a-zA-Z0-9_-]+$ before DB access
- Return HTTP 400 for invalid run_id format, HTTP 404 for non-existent runs
- Ensure DELETE endpoint validation is correctly placed before any DB access
- Both endpoints now provide consistent security validation
- Mark task 5.8 as completed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:33:12 +01:00
Marko Djordjevic
3dc0014328 feat: add training resource limits (500MB size check + 30-min timeout)
- Import concurrent.futures for timeout support
- In _run_training_background: check df.memory_usage(deep=True).sum()
  after loading the labeled dataset; raise ValueError if > 500MB
- Wrap model.fit() in a ThreadPoolExecutor with a 1800s timeout;
  on TimeoutError update DB status to "failed" with message
  "Training timed out after 30 minutes" and return early
- Mark task 5.7 as done in openspec/changes/code-review-fix/tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:31:33 +01:00
Marko Djordjevic
f94d16c6ab fix: implement real health checks in ML service /health endpoint
- Execute SELECT 1 against PostgreSQL via SQLAlchemy session to verify DB connectivity
- HTTP GET to ${MLFLOW_TRACKING_URI}/health (default: http://localhost:5000/health) to verify MLflow
- Return HTTP 503 if any component is unhealthy, HTTP 200 only when both are healthy
- Values: "healthy" / "unhealthy" for database and mlflow fields
- Add `import requests as http_requests` and `sa_text` imports

Closes task 5.6 in openspec/changes/code-review-fix/tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:30:12 +01:00
Marko Djordjevic
d75b05b585 feat: add candle time-sort and duplicate timestamp validation to POST /predict
Sort incoming candles by their time field in ascending chronological order
before processing to ensure correct feature engineering regardless of input
order. Also validate that no duplicate timestamps are present, returning
HTTP 400 with details if duplicates are found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:28:02 +01:00
Marko Djordjevic
ff952aa083 feat(ml): add date range validation to POST /predict/batch
- Parse start_date and end_date as datetime objects
- Return HTTP 400 if end_date is before start_date
- Return HTTP 400 if date range exceeds 365 days (1 year)

Closes task 5.4 in code-review-fix tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:27:12 +01:00
Marko Djordjevic
b9beea1574 fix(ml): add _model_swap_lock to prediction reads for thread-safe model access
In /predict and /predict/batch endpoints, grab the model reference under
_model_swap_lock before running inference. Inference itself runs outside
the lock (using a local variable) to avoid blocking model swaps during
potentially slow computation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:26:33 +01:00
Marko Djordjevic
ff15adc847 feat: add SHA256 model integrity check before joblib.load()
Add verify_model_checksum() that validates model files against a
models/checksums.sha256 manifest before loading. Fails open when
manifest is missing or file not listed (backward compat), raises
HTTP 500 on hash mismatch. Created empty manifest placeholder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:25:14 +01:00
Marko Djordjevic
b7f9b2e04d security: replace exception details with generic error in ML service HTTP responses
Replace all instances of `detail=str(e)`, `detail=f"...{exc}"`, and similar
patterns that exposed internal exception messages to HTTP clients in
services/ml/app/main.py. All exception details are now logged server-side
only via logger.error(), while clients receive a generic "Internal server error"
message. Fixes 9 handlers across predict, batch predict, pattern detection,
training start, training runs fetch, training run delete, dataset info,
build dataset, and model load endpoints.

Mark task 5.1 as completed in tasks.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:23:41 +01:00
Marko Djordjevic
5f569d9134 feat(ml): add API key authentication via FastAPI Depends() on all endpoints except /health
- Import Header, Depends, Security from fastapi
- Add verify_api_key dependency: reads API_KEY env var, checks X-API-Key
  header, raises HTTP 401 if key mismatch; fail-open if env var not set
- Apply Depends(verify_api_key) to all 14 non-health endpoints
- /health endpoint remains unauthenticated for liveness probes
- Mark task 3.2 as complete in tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:04:41 +01:00
Marko Djordjevic
26ff80a682 fix: implement CORS origins configuration via environment variable
- Replace hardcoded allow_origins=['*'] with dynamic configuration
- Read CORS_ORIGINS environment variable (comma-separated list)
- Default to 'http://localhost:3000' if CORS_ORIGINS is not set
- Support multiple origins by splitting and stripping whitespace from env var
2026-02-18 11:02:00 +01:00
Marko Djordjevic
67dd7aa2f0 security: validate run_id format and add path containment check in ML service
- Add `import re` to services/ml/app/main.py
- In POST /model/load: validate run_id matches ^[a-zA-Z0-9_-]+$ before DB lookup; use Path.resolve() + directory containment check before loading model artifact
- In DELETE /training/runs/{run_id}: validate run_id matches ^[a-zA-Z0-9_-]+$ before any processing; use Path.resolve() + directory containment check before deleting model artifact
- Both endpoints return HTTP 400 with {"detail": "Invalid run_id format"} on invalid input
- Mark task 2.2 as completed in openspec/changes/code-review-fix/tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 11:00:19 +01:00
Marko Djordjevic
d3dcfcea7d feat: auto-build training dataset from DB annotations before training
- Add build_dataset_from_db() that exports candles from DB, runs feature
  engineering, and ingests span annotations into labeled CSV
- Call it automatically in _run_training_background before training starts
- Add POST /training/build-dataset endpoint for standalone use
- Add Next.js proxy route /api/training/build-dataset
- Update TrainingPanel: remove dataset-missing block on Start Training,
  show informational message that dataset builds automatically
2026-02-18 00:24:39 +01:00
Marko Djordjevic
b4956f3fb9 fix: call init_db() on ML service startup to create training_runs table 2026-02-18 00:08:23 +01:00
Marko Djordjevic
6ef102cf21 fix: training panel stuck button, stale runs on startup, add delete model 2026-02-17 22:23:50 +01:00
Marko Djordjevic
2a02669222 feat: add FastAPI model/load endpoint and all Next.js proxy routes (tasks 2-4) 2026-02-17 18:47:04 +01:00
Marko Djordjevic
b8e649e333 feat: add FastAPI pattern detection endpoints (Section 1)
- Extract CDL pattern detection logic into services/ml/app/patterns.py
  with TALIB_PATTERNS dict, get_available_patterns(), validate_pattern_names(),
  and detect_patterns(candles, pattern_names) functions
- Add GET /patterns/available endpoint returning all 54 supported CDL pattern
  names with display names
- Add POST /patterns/detect endpoint accepting {candles, patterns}, running
  selected CDL functions, returning span annotations with source "talib"
- Add input validation: reject invalid pattern names with HTTP 400,
  treat empty patterns list as "run all"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 18:34:14 +01:00
Marko Djordjevic
f7b154f866 fix(ml): extract class labels from model when metadata is missing
The model info returned empty labels array because the pkl file has
no metadata dict. Now extracts labels from model.classes_ or
model.model.classes_ as fallback.
2026-02-15 22:18:16 +01:00
Marko Djordjevic
be09cef098 fix(ml): add missing numpy import in main.py 2026-02-15 22:08:41 +01:00
Marko Djordjevic
40d6d1739e fix(ml): add windowed feature flattening for inference parity
The model was trained on 94-candle sliding windows flattened to 2820
features (94 candles x 30 features). Inference was sending raw per-candle
features (27 columns).

Changes:
- Rewrite preprocessing to return (X, window_times) tuple
- Add sliding window creation with correct feature ordering
- Fill missing columns (average, barCount) with 0 for feature parity
- Fill NaN from indicator warmup with 0 instead of dropping rows
- Always compute all indicators (including MFI) for feature parity
- Update predict and batch predict endpoints for new signature
2026-02-15 22:07:06 +01:00
Marko Djordjevic
f850728d44 fix(api): add GET /api/charts/[id] and fix batch prediction
- Add GET handler to /api/charts/[id] route to fetch chart metadata
- Fix batch prediction to use regular /predict endpoint with database candles
- Remove /predict/batch usage (was designed for file-based predictions)
- Make volume field optional in CandleData model (database candles don't have volume)
- Convert timestamps to ISO dates for batch requests

Known issue: TA-Lib indicators failing with 'input array type is not double'
- May need to ensure candle data is float64/double type before processing
2026-02-15 21:49:22 +01:00
Marko Djordjevic
6d0d67e39b fix(ml): make pair and timeframe optional in PredictRequest
- Change pair and timeframe fields from required to optional
- Frontend only sends candles array, not pair/timeframe metadata
- These fields are only used for logging, not prediction logic
- Update logging to handle None values with 'unknown' fallback
- Fixes 422 validation error on /predict endpoint
2026-02-15 21:43:14 +01:00
Marko Djordjevic
3a83fd38e9 feat(ml): implement FastAPI inference service with model loading, preprocessing, and prediction endpoints 2026-02-15 14:29:07 +01:00