Archive code-review-fix change and sync specs to main

- Synced 14 capability delta specs to main specs
- Created 6 new main specs: api-authentication, error-boundary, input-validation, security-headers, shared-types
- Updated 8 existing specs with security, validation, and performance requirements
- Archived change to openspec/changes/archive/2026-02-20-code-review-fix/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 08:54:59 +01:00
commit 925e7284e3
parent adb93a2d2e
32 changed files with 691 additions and 4 deletions
--- a/openspec/specs/ml-inference/spec.md
+++ b/openspec/specs/ml-inference/spec.md
@@ -105,3 +105,68 @@ The system SHALL group consecutive candle predictions with the same non-"O" labe
 #### Scenario: Break on label change
 - **WHEN** candle T1 is "bull_flag" and candle T2 is "bear_flag"
 - **THEN** the system creates two separate spans
+
+### Requirement: CORS restricted to explicit origins
+The FastAPI ML service SHALL configure CORS with explicit allowed origins instead of wildcard `*`. The default allowed origins SHALL be `["http://localhost:3000"]`. Additional origins MAY be configured via the `CORS_ORIGINS` environment variable (comma-separated).
+
+#### Scenario: Same-origin request allowed
+- **WHEN** a request from `http://localhost:3000` hits the ML service
+- **THEN** the CORS headers allow the request
+
+#### Scenario: Unknown origin blocked
+- **WHEN** a request from `http://evil.com` hits the ML service
+- **THEN** the CORS headers do not include `Access-Control-Allow-Origin` for that origin
+
+### Requirement: Generic error responses from ML service
+All FastAPI endpoints SHALL return generic error messages for unexpected exceptions. The response SHALL be `{ "detail": "Internal server error" }` with HTTP 500. Full exception details SHALL be logged via `logging.error` with traceback.
+
+#### Scenario: Internal error generic response
+- **WHEN** a predict endpoint throws an unexpected exception
+- **THEN** the client receives `{ "detail": "Internal server error" }` and the traceback is logged server-side
+
+### Requirement: Model file integrity check
+When loading a model via `joblib.load()`, the system SHALL verify the model file's SHA256 hash against a manifest file (`models/checksums.sha256`). If the hash does not match or the manifest entry is missing, the load SHALL fail with an error.
+
+#### Scenario: Valid model file
+- **WHEN** `joblib.load("models/abc123.pkl")` is called and the file's SHA256 matches the manifest
+- **THEN** the model loads successfully
+
+#### Scenario: Tampered model file
+- **WHEN** the model file's SHA256 does not match the manifest entry
+- **THEN** the system refuses to load the model and returns HTTP 500 with `{ "detail": "Model integrity check failed" }`
+
+### Requirement: Thread-safe model reads
+The ML service SHALL use a lock for model reads during prediction, not just model writes. All code that reads the current model reference SHALL acquire the `_model_swap_lock` or use an atomic reference swap pattern.
+
+#### Scenario: Concurrent prediction and model swap
+- **WHEN** a prediction request is in progress and a model load request arrives
+- **THEN** the prediction completes with the old model (or waits), and the new model is loaded atomically
+
+### Requirement: Path traversal prevention on model operations
+The FastAPI endpoints that accept `run_id` for model load and delete SHALL validate the `run_id` format and verify that the resolved file path is within the expected `models/` directory using `Path.resolve()`.
+
+#### Scenario: Valid run_id resolves within models directory
+- **WHEN** POST `/model/load` receives `{ run_id: "abc123" }`
+- **THEN** the path resolves to `models/abc123.pkl` within the models directory
+
+#### Scenario: Path traversal attempt blocked
+- **WHEN** POST `/model/load` receives `{ run_id: "../../etc/passwd" }`
+- **THEN** the endpoint returns HTTP 400 before attempting any file operation
+
+### Requirement: Real health check
+The `GET /health` endpoint SHALL perform actual connectivity checks instead of returning hardcoded status. It SHALL execute `SELECT 1` against PostgreSQL and attempt an MLflow API call.
+
+#### Scenario: All services healthy
+- **WHEN** PostgreSQL responds to `SELECT 1` and MLflow API is reachable
+- **THEN** the health endpoint returns `{ "status": "healthy", "database": "connected", "mlflow": "connected" }`
+
+#### Scenario: Database unreachable
+- **WHEN** PostgreSQL does not respond to `SELECT 1`
+- **THEN** the health endpoint returns `{ "status": "degraded", "database": "disconnected" }` with HTTP 200
+
+### Requirement: Candle time-sort validation
+The `POST /predict` endpoint SHALL validate that input candles are sorted by time in ascending order. If candles are not sorted, the endpoint SHALL sort them before processing.
+
+#### Scenario: Unsorted candles auto-sorted
+- **WHEN** candles are provided in random time order
+- **THEN** the endpoint sorts them by time before feature engineering and prediction
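
Review note: the "Path traversal prevention on model operations" and "Model file integrity check" requirements added in this hunk could be combined into a single pre-load check. The following is a minimal sketch, not the service's actual code. The helper names (`resolve_model_path`, `verify_model_file`), the `run_id` pattern, and the manifest layout (one `<sha256-hex>  <filename>` pair per line, as written by `sha256sum`) are assumptions; the spec only names `Path.resolve()`, the `models/` directory, and `models/checksums.sha256`.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict

# Assumption: the spec says run_id format is validated but gives no pattern.
RUN_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def resolve_model_path(models_dir: Path, run_id: str) -> Path:
    """Return the model path for run_id; raise ValueError on bad input.

    A caller would map ValueError to HTTP 400 before touching any file.
    """
    if not RUN_ID_PATTERN.fullmatch(run_id):
        raise ValueError("invalid run_id format")
    base = models_dir.resolve()
    candidate = (base / f"{run_id}.pkl").resolve()
    # Defence in depth: the resolved path must still sit inside models_dir.
    if base not in candidate.parents:
        raise ValueError("path escapes models directory")
    return candidate


def read_manifest(manifest_path: Path) -> Dict[str, str]:
    """Parse lines of the form '<hex digest>  <filename>' into a dict."""
    entries: Dict[str, str] = {}
    for line in manifest_path.read_text().splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # sha256sum marks binary-mode entries with a leading '*'.
            entries[name.lstrip("*")] = digest.lower()
    return entries


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large models are not read at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_model_file(model_path: Path, manifest_path: Path) -> None:
    """Raise RuntimeError if the manifest entry is missing or mismatched."""
    expected = read_manifest(manifest_path).get(model_path.name)
    if expected is None or expected != sha256_of(model_path):
        raise RuntimeError("Model integrity check failed")
```

A load handler would presumably call `resolve_model_path` first, then `verify_model_file`, and only then `joblib.load()`. That way a traversal attempt is rejected before any file access, and a tampered file is rejected before it is unpickled.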