Create system user appuser with useradd, set ownership of /app,
and switch to non-root user before CMD to reduce container attack surface.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add new GET endpoint for retrieving a specific training run by run_id
- Validate run_id format with regex pattern ^[a-zA-Z0-9_-]+$ before DB access
- Return HTTP 400 for invalid run_id format, HTTP 404 for non-existent runs
- Ensure DELETE endpoint validation is correctly placed before any DB access
- Both endpoints now provide consistent security validation
- Mark task 5.8 as completed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import concurrent.futures for timeout support
- In _run_training_background: check df.memory_usage(deep=True).sum()
after loading the labeled dataset; raise ValueError if > 500MB
- Wrap model.fit() in a ThreadPoolExecutor with a 1800s timeout;
on TimeoutError update DB status to "failed" with message
"Training timed out after 30 minutes" and return early
- Mark task 5.7 as done in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Execute SELECT 1 against PostgreSQL via SQLAlchemy session to verify DB connectivity
- HTTP GET to ${MLFLOW_TRACKING_URI}/health (default: http://localhost:5000/health) to verify MLflow
- Return HTTP 503 if any component is unhealthy, HTTP 200 only when both are healthy
- Values: "healthy" / "unhealthy" for database and mlflow fields
- Add `import requests as http_requests` and `sa_text` imports
Closes task 5.6 in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sort incoming candles by their time field in ascending chronological order
before processing to ensure correct feature engineering regardless of input
order. Also validate that no duplicate timestamps are present, returning
HTTP 400 with details if duplicates are found.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Parse start_date and end_date as datetime objects
- Return HTTP 400 if end_date is before start_date
- Return HTTP 400 if date range exceeds 365 days (1 year)
Closes task 5.4 in code-review-fix tasks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In /predict and /predict/batch endpoints, grab the model reference under
_model_swap_lock before running inference. Inference itself runs outside
the lock (using a local variable) to avoid blocking model swaps during
potentially slow computation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add verify_model_checksum() that validates model files against a
models/checksums.sha256 manifest before loading. Fails open when
manifest is missing or file not listed (backward compat), raises
HTTP 500 on hash mismatch. Created empty manifest placeholder.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all instances of `detail=str(e)`, `detail=f"...{exc}"`, and similar
patterns that exposed internal exception messages to HTTP clients in
services/ml/app/main.py. All exception details are now logged server-side
only via logger.error(), while clients receive a generic "Internal server error"
message. Fixes 9 handlers across predict, batch predict, pattern detection,
training start, training runs fetch, training run delete, dataset info,
build dataset, and model load endpoints.
Mark task 5.1 as completed in tasks.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import Header, Depends, Security from fastapi
- Add verify_api_key dependency: reads API_KEY env var, checks X-API-Key
header, raises HTTP 401 if key mismatch; fail-open if env var not set
- Apply Depends(verify_api_key) to all 14 non-health endpoints
- /health endpoint remains unauthenticated for liveness probes
- Mark task 3.2 as complete in tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace hardcoded allow_origins=['*'] with dynamic configuration
- Read CORS_ORIGINS environment variable (comma-separated list)
- Default to 'http://localhost:3000' if CORS_ORIGINS is not set
- Support multiple origins by splitting and stripping whitespace from env var
- Add `import re` to services/ml/app/main.py
- In POST /model/load: validate run_id matches ^[a-zA-Z0-9_-]+$ before DB lookup; use Path.resolve() + directory containment check before loading model artifact
- In DELETE /training/runs/{run_id}: validate run_id matches ^[a-zA-Z0-9_-]+$ before any processing; use Path.resolve() + directory containment check before deleting model artifact
- Both endpoints return HTTP 400 with {"detail": "Invalid run_id format"} on invalid input
- Mark task 2.2 as completed in openspec/changes/code-review-fix/tasks.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove hardcoded SQL comments containing 'ml_user' and 'ml_password'
- Remove fallback default credentials in DATABASE_URL construction
- Add fail-fast validation: raise RuntimeError if DATABASE_URL env var is missing or empty
- Mark task 1.4 as complete in code-review-fix/tasks.md
- Add build_dataset_from_db() that exports candles from DB, runs feature
engineering, and ingests span annotations into labeled CSV
- Call it automatically in _run_training_background before training starts
- Add POST /training/build-dataset endpoint for standalone use
- Add Next.js proxy route /api/training/build-dataset
- Update TrainingPanel: remove dataset-missing block on Start Training,
show informational message that dataset builds automatically
- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now passing successfully
- Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version
- Handles all type conversions: timestamps, booleans, and JSONB fields
- Successfully migrated all 2,836 rows from SQLite to PostgreSQL
- Verified data integrity: all 6 tables migrated correctly
- Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223
The model info returned empty labels array because the pkl file has
no metadata dict. Now extracts labels from model.classes_ or
model.model.classes_ as fallback.
The model was trained on 94-candle sliding windows flattened to 2820
features (94 candles x 30 features). Inference was sending raw per-candle
features (27 columns).
Changes:
- Rewrite preprocessing to return (X, window_times) tuple
- Add sliding window creation with correct feature ordering
- Fill missing columns (average, barCount) with 0 for feature parity
- Fill NaN from indicator warmup with 0 instead of dropping rows
- Always compute all indicators (including MFI) for feature parity
- Update predict and batch predict endpoints for new signature
The Pydantic model sets volume=None when absent, creating an all-NaN
column rather than a missing column. Check isna().all() in addition
to column existence.
- Fill volume with 0 when column is absent from candle data
- Skip MFI/OBV/AD/ADOSC indicators when no real volume data available
- Fix pandas FutureWarning for inplace fillna in candle_features
- Remove temporary debug NaN logging
- Add GET handler to /api/charts/[id] route to fetch chart metadata
- Fix batch prediction to use regular /predict endpoint with database candles
- Remove /predict/batch usage (was designed for file-based predictions)
- Make volume field optional in CandleData model (database candles don't have volume)
- Convert timestamps to ISO dates for batch requests
Known issue: TA-Lib indicators failing with 'input array type is not double'
- May need to ensure candle data is float64/double type before processing
- Change pair and timeframe fields from required to optional
- Frontend only sends candles array, not pair/timeframe metadata
- These fields are only used for logging, not prediction logic
- Update logging to handle None values with 'unknown' fallback
- Fixes 422 validation error on /predict endpoint
- Update TypeScript types to match flat backend response structure
- Remove nested model_info and metrics objects
- Remove label_config, use labels array and per_class_metrics array
- Update all component references to use new structure
- Generate default colors for prediction labels in CandleChart
- Fix TypeScript type errors for nullable model_version
- Remove accuracy/F1 metrics display (not in new response)
- Fix CCI indicator to use HLC prices instead of close only
- Parse datetime column when loading enriched CSV
- Strip timezone from annotation timestamps
- Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS)
- Exclude programmatic label columns from training features
- Fix classification report to handle missing classes
- Update MLflow tracking to use localhost:5000
- Grant PostgreSQL permissions to ml_user
Pipeline now runs successfully end-to-end:
- Feature engineering: 2543 rows, 31 columns
- Annotation ingestion: 286 samples
- Training: 89.47% test accuracy with Random Forest
- Convert date strings to Unix timestamps in load_ohlcv()
- Fix duplicate pattern names (CDL3WHITESOLDIERS/CDL3BLACKCROWS)
- Ensure time column is always integer type
Add complete workflow for using TA-Lib to bootstrap training data:
- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
* Generate patterns with TA-Lib
* Import into UI
* Review and edit in browser
* Export and train model
* Compare predictions with TA-Lib detections
* Iterate for improvement
This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.
- Create RandomForestModel and XGBoostModel wrappers with class weight support
- Implement temporal and random train/val/test splitting
- Add MLflow experiment tracking with full parameter and metric logging
- Create evaluation module for confusion matrix, feature importance, and classification reports
- Implement model training with sklearn/xgboost flavor logging and optional registry registration
- Store training run metadata in PostgreSQL
- Wire training stage into pipeline.py orchestrator
- Support both RandomForest and XGBoost models with configurable hyperparameters