candle-annotator/openspec/changes/archive/2026-02-20-code-review-fix/specs/ml-inference/spec.md at 925e7284e33f1af4d058fe210335c44a2f50a316

Marko Djordjevic 925e7284e3 Archive code-review-fix change and sync specs to main

- Synced 14 capability delta specs to main specs
- Created 6 new main specs: api-authentication, error-boundary, input-validation, security-headers, shared-types
- Updated 8 existing specs with security, validation, and performance requirements
- Archived change to openspec/changes/archive/2026-02-20-code-review-fix/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 08:54:59 +01:00


ADDED Requirements

Requirement: CORS restricted to explicit origins

The FastAPI ML service SHALL configure CORS with explicit allowed origins instead of wildcard *. The default allowed origins SHALL be ["http://localhost:3000"]. Additional origins MAY be configured via the CORS_ORIGINS environment variable (comma-separated).

Scenario: Request from allowed origin accepted

WHEN a request from http://localhost:3000 hits the ML service
THEN the CORS headers allow the request

Scenario: Unknown origin blocked

WHEN a request from http://evil.com hits the ML service
THEN the CORS headers do not include Access-Control-Allow-Origin for that origin
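A minimal sketch of how the allowed-origin list might be assembled, assuming extra origins are appended to the default. The helper name `allowed_origins` and the rejection of a literal `*` entry are illustrative assumptions, not part of the spec.

```python
import os

# Default from the requirement above.
DEFAULT_ORIGINS = ["http://localhost:3000"]


def allowed_origins(environ=None):
    """Return the default origins plus any listed in CORS_ORIGINS.

    `environ` is a hypothetical injection point for testing; it falls
    back to os.environ when omitted.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CORS_ORIGINS", "")
    origins = list(DEFAULT_ORIGINS)
    for origin in (part.strip() for part in raw.split(",")):
        if not origin:
            continue  # tolerate empty segments such as trailing commas
        if origin == "*":
            # Assumption: a wildcard would defeat the requirement, so reject it.
            raise ValueError("wildcard origin is not allowed")
        if origin not in origins:
            origins.append(origin)
    return origins


# With FastAPI, the result would typically be passed to the CORS middleware:
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(CORSMiddleware, allow_origins=allowed_origins())
```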

Requirement: Generic error responses from ML service

All FastAPI endpoints SHALL return a generic error message for unexpected exceptions. The response SHALL be { "detail": "Internal server error" } with HTTP 500. Full exception details, including the traceback, SHALL be logged server-side via logging.error and SHALL NOT appear in the response body.

Scenario: Internal error generic response

WHEN a predict endpoint throws an unexpected exception
THEN the client receives { "detail": "Internal server error" } and the traceback is logged server-side
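The behaviour above could be sketched as a small framework-independent helper. The names `to_error_response` and `ml_service` are hypothetical. In a FastAPI app this logic would usually sit in a handler registered with `@app.exception_handler(Exception)` and return a `JSONResponse`.

```python
import logging

logger = logging.getLogger("ml_service")  # hypothetical logger name

GENERIC_ERROR = {"detail": "Internal server error"}


def to_error_response(exc):
    """Log the full exception server-side and return a generic payload.

    Returns a (status_code, body) pair; the body never contains
    exception details.
    """
    # Passing the exception instance as exc_info attaches its traceback.
    logger.error("Unhandled exception in ML service", exc_info=exc)
    return 500, dict(GENERIC_ERROR)
```

Returning a copy of the payload keeps callers from accidentally mutating the shared constant.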

Requirement: Model file integrity check

Before deserializing a model with joblib.load(), the system SHALL verify the model file's SHA256 hash against the manifest file models/checksums.sha256. If the hash does not match or the manifest has no entry for the file, the load SHALL fail with an error before the file is deserialized.

Scenario: Valid model file

WHEN joblib.load("models/abc123.pkl") is called and the file's SHA256 matches the manifest
THEN the model loads successfully

Scenario: Tampered model file

WHEN the model file's SHA256 does not match the manifest entry
THEN the system refuses to load the model and returns HTTP 500 with { "detail": "Model integrity check failed" }
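One way the integrity check might look, as a sketch. It assumes the manifest uses the `sha256sum` line format (`<hex digest>  <file name>`) keyed by file name without spaces; the spec does not state the manifest format. `ModelIntegrityError`, `read_manifest`, `sha256_of`, and `verify_model` are hypothetical names.

```python
import hashlib
from pathlib import Path


class ModelIntegrityError(Exception):
    """Raised when a model file fails the checksum check."""


def read_manifest(manifest_path):
    """Parse lines of the assumed form '<hex digest>  <file name>'."""
    entries = {}
    for line in Path(manifest_path).read_text().splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary mode with a leading '*' on the name.
            entries[name.lstrip("*")] = digest.lower()
    return entries


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large models are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(model_path, manifest_path):
    """Raise ModelIntegrityError unless the file matches its manifest entry."""
    model_path = Path(model_path)
    expected = read_manifest(manifest_path).get(model_path.name)
    if expected is None:
        raise ModelIntegrityError(f"no manifest entry for {model_path.name}")
    if sha256_of(model_path) != expected:
        raise ModelIntegrityError(f"checksum mismatch for {model_path.name}")


# Intended call order (joblib is not imported in this sketch):
#   verify_model(path, "models/checksums.sha256")
#   model = joblib.load(path)  # only reached if verification passed
```

Running the check before `joblib.load()` matters because unpickling can execute code embedded in the file.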

Requirement: Thread-safe model reads

The ML service SHALL use a lock for model reads during prediction, not just model writes. All code that reads the current model reference SHALL acquire the _model_swap_lock or use an atomic reference swap pattern.

Scenario: Concurrent prediction and model swap

WHEN a prediction request is in progress and a model load request arrives
THEN the prediction completes with the old model (or waits), and the new model is loaded atomically
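A sketch of the atomic reference swap variant, under the assumption that a loaded model is immutable once published. `ModelHolder` and `predict` are hypothetical names; only `_model_swap_lock` comes from the spec. The model is represented as a plain callable for illustration.

```python
import threading


class ModelHolder:
    """Holds the current model; reads and writes both take the lock."""

    def __init__(self, model=None):
        self._model_swap_lock = threading.Lock()
        self._model = model

    def get(self):
        # The lock is held only long enough to copy the reference.
        with self._model_swap_lock:
            return self._model

    def swap(self, new_model):
        # Publish a fully loaded model in one step and return the old one.
        with self._model_swap_lock:
            old, self._model = self._model, new_model
        return old


def predict(holder, features):
    # Take a snapshot once; a concurrent swap does not affect this call.
    model = holder.get()
    if model is None:
        raise RuntimeError("no model loaded")
    return model(features)
```

Because the prediction works on its own snapshot, an in-flight request finishes with the old model while later requests see the new one.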

Requirement: Path traversal prevention on model operations

The FastAPI endpoints that accept run_id for model load and delete SHALL validate the run_id format and verify that the resolved file path is within the expected models/ directory using Path.resolve().

Scenario: Valid run_id resolves within models directory

WHEN POST /model/load receives { run_id: "abc123" }
THEN the path resolves to models/abc123.pkl within the models directory

Scenario: Path traversal attempt blocked

WHEN POST /model/load receives { run_id: "../../etc/passwd" }
THEN the endpoint returns HTTP 400 before attempting any file operation
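The two checks could be combined roughly as follows. The `run_id` pattern (letters, digits, underscore, hyphen, up to 64 characters) is an assumption, since the spec does not define the format. `InvalidRunId` and `resolve_model_path` are hypothetical names, and the endpoint would map `InvalidRunId` to HTTP 400.

```python
import re
from pathlib import Path

MODELS_DIR = Path("models")
# Assumed run_id format; the spec only says the format is validated.
RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


class InvalidRunId(ValueError):
    """Raised for a run_id that is malformed or escapes the models directory."""


def resolve_model_path(run_id, models_dir=MODELS_DIR):
    """Return the resolved model path, or raise before touching any file."""
    if not isinstance(run_id, str) or not RUN_ID_PATTERN.fullmatch(run_id):
        raise InvalidRunId("run_id has an invalid format")
    base = Path(models_dir).resolve()
    candidate = (base / f"{run_id}.pkl").resolve()
    # Defense in depth: resolve() follows symlinks, so re-check containment.
    if base not in candidate.parents:
        raise InvalidRunId("run_id resolves outside the models directory")
    return candidate
```

The format check alone rejects `../../etc/passwd`; the containment check covers cases the pattern cannot see, such as a symlink inside `models/`.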

Requirement: Real health check

The GET /health endpoint SHALL perform actual connectivity checks instead of returning hardcoded status. It SHALL execute SELECT 1 against PostgreSQL and attempt an MLflow API call.

Scenario: All services healthy

WHEN PostgreSQL responds to SELECT 1 and MLflow API is reachable
THEN the health endpoint returns { "status": "healthy", "database": "connected", "mlflow": "connected" }

Scenario: Database unreachable

WHEN PostgreSQL does not respond to SELECT 1
THEN the health endpoint returns { "status": "degraded", "database": "disconnected" } with HTTP 200
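A sketch of how the status payload might be aggregated. The two checks are passed in as callables (for example, one that runs SELECT 1 and one that calls the MLflow API) so the example stays independent of any driver. `run_check` and `health_status` are hypothetical names. Reporting `"mlflow": "disconnected"` and always including both keys are assumptions; the spec only shows the values in the scenarios above.

```python
def run_check(check):
    """Return True if the check callable completes without raising."""
    try:
        check()
        return True
    except Exception:
        return False


def health_status(check_database, check_mlflow):
    """Build the /health payload from two connectivity checks."""
    db_ok = run_check(check_database)
    mlflow_ok = run_check(check_mlflow)
    return {
        "status": "healthy" if db_ok and mlflow_ok else "degraded",
        "database": "connected" if db_ok else "disconnected",
        # Assumed wording for the MLflow failure case.
        "mlflow": "connected" if mlflow_ok else "disconnected",
    }
```

The endpoint would return this dictionary with HTTP 200 in both the healthy and degraded cases, as the scenarios describe.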

Requirement: Candle time-sort validation

The POST /predict endpoint SHALL check whether the input candles are sorted by time in ascending order. If they are not, the endpoint SHALL sort them by time before processing rather than rejecting the request.

Scenario: Unsorted candles auto-sorted

WHEN candles are provided in random time order
THEN the endpoint sorts them by time before feature engineering and prediction
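The check-then-sort step might be sketched like this, assuming each candle is a dictionary with a comparable `time` field. The candle schema is not given in this spec, and `ensure_sorted_by_time` is a hypothetical name.

```python
def ensure_sorted_by_time(candles):
    """Return the candles in ascending time order without mutating the input."""
    already_sorted = all(
        earlier["time"] <= later["time"]
        for earlier, later in zip(candles, candles[1:])
    )
    if already_sorted:
        return list(candles)
    # sorted() is stable, so candles with equal timestamps keep their order.
    return sorted(candles, key=lambda candle: candle["time"])
```

For example, candles with times 3, 1, 2 come back ordered 1, 2, 3, and the caller's original list is left untouched.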
