Commit graph

22 commits

Author SHA1 Message Date
Marko Djordjevic
d3dcfcea7d feat: auto-build training dataset from DB annotations before training
- Add build_dataset_from_db() that exports candles from DB, runs feature
  engineering, and ingests span annotations into labeled CSV
- Call it automatically in _run_training_background before training starts
- Add POST /training/build-dataset endpoint for standalone use
- Add Next.js proxy route /api/training/build-dataset
- Update TrainingPanel: remove dataset-missing block on Start Training,
  show informational message that dataset builds automatically
2026-02-18 00:24:39 +01:00
Marko Djordjevic
b4956f3fb9 fix: call init_db() on ML service startup to create training_runs table 2026-02-18 00:08:23 +01:00
Marko Djordjevic
6ef102cf21 fix: training panel stuck button, stale runs on startup, add delete model 2026-02-17 22:23:50 +01:00
Marko Djordjevic
2a02669222 feat: add FastAPI model/load endpoint and all Next.js proxy routes (tasks 2-4) 2026-02-17 18:47:04 +01:00
Marko Djordjevic
b8e649e333 feat: add FastAPI pattern detection endpoints (Section 1)
- Extract CDL pattern detection logic into services/ml/app/patterns.py
  with TALIB_PATTERNS dict, get_available_patterns(), validate_pattern_names(),
  and detect_patterns(candles, pattern_names) functions
- Add GET /patterns/available endpoint returning all 54 supported CDL pattern
  names with display names
- Add POST /patterns/detect endpoint accepting {candles, patterns}, running
  selected CDL functions, returning span annotations with source "talib"
- Add input validation: reject invalid pattern names with HTTP 400,
  treat empty patterns list as "run all"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 18:34:14 +01:00
Marko Djordjevic
d1557a3846 fix: resolve numpy type conversion issues in ML service data access
- Convert numpy.int64 to Python int before passing to SQLAlchemy queries
- Prevents psycopg2.ProgrammingError: can't adapt type 'numpy.int64'
- Applied to get_candles(), get_span_annotations(), and get_point_annotations()
- All ML service database access tests now passing successfully
2026-02-17 14:10:21 +01:00
Marko Djordjevic
bfe437857b feat: add Python migration script and successfully test SQLite to PostgreSQL data migration
- Created scripts/migrate-sqlite-to-postgres.py as alternative to TypeScript version
- Handles all type conversions: timestamps, booleans, and JSONB fields
- Successfully migrated all 2,836 rows from SQLite to PostgreSQL
- Verified data integrity: all 6 tables migrated correctly
- Charts: 1, Candles: 2,592, Annotations: 4, Span annotations: 223
2026-02-17 14:01:21 +01:00
Marko Djordjevic
a18c6d110a Remove __pycache__ from git tracking 2026-02-15 22:29:35 +01:00
Marko Djordjevic
f7b154f866 fix(ml): extract class labels from model when metadata is missing
The model info returned empty labels array because the pkl file has
no metadata dict. Now extracts labels from model.classes_ or
model.model.classes_ as fallback.
2026-02-15 22:18:16 +01:00
Marko Djordjevic
be09cef098 fix(ml): add missing numpy import in main.py 2026-02-15 22:08:41 +01:00
Marko Djordjevic
40d6d1739e fix(ml): add windowed feature flattening for inference parity
The model was trained on 94-candle sliding windows flattened to 2820
features (94 candles x 30 features). Inference was sending raw per-candle
features (27 columns).

Changes:
- Rewrite preprocessing to return (X, window_times) tuple
- Add sliding window creation with correct feature ordering
- Fill missing columns (average, barCount) with 0 for feature parity
- Fill NaN from indicator warmup with 0 instead of dropping rows
- Always compute all indicators (including MFI) for feature parity
- Update predict and batch predict endpoints for new signature
2026-02-15 22:07:06 +01:00
Marko Djordjevic
4c7b3f2676 fix(ml): detect all-NaN volume column (not just missing column)
The Pydantic model sets volume=None when absent, creating an all-NaN
column rather than a missing column. Check isna().all() in addition
to column existence.
2026-02-15 21:58:16 +01:00
Marko Djordjevic
317f925c43 fix(ml): handle missing volume data and skip volume-dependent indicators
- Fill volume with 0 when column is absent from candle data
- Skip MFI/OBV/AD/ADOSC indicators when no real volume data available
- Fix pandas FutureWarning for inplace fillna in candle_features
- Remove temporary debug NaN logging
2026-02-15 21:56:14 +01:00
Marko Djordjevic
f850728d44 fix(api): add GET /api/charts/[id] and fix batch prediction
- Add GET handler to /api/charts/[id] route to fetch chart metadata
- Fix batch prediction to use regular /predict endpoint with database candles
- Remove /predict/batch usage (was designed for file-based predictions)
- Make volume field optional in CandleData model (database candles don't have volume)
- Convert timestamps to ISO dates for batch requests

Known issue: TA-Lib indicators failing with 'input array type is not double'
- May need to ensure candle data is float64/double type before processing
2026-02-15 21:49:22 +01:00
Marko Djordjevic
6d0d67e39b fix(ml): make pair and timeframe optional in PredictRequest
- Change pair and timeframe fields from required to optional
- Frontend only sends candles array, not pair/timeframe metadata
- These fields are only used for logging, not prediction logic
- Update logging to handle None values with 'unknown' fallback
- Fixes 422 validation error on /predict endpoint
2026-02-15 21:43:14 +01:00
Marko Djordjevic
aa81d4f3d0 fix(ml): complete ML pipeline fixes and setup
- Fix CCI indicator to use HLC prices instead of close only
- Parse datetime column when loading enriched CSV
- Strip timezone from annotation timestamps
- Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS)
- Exclude programmatic label columns from training features
- Fix classification report to handle missing classes
- Update MLflow tracking to use localhost:5000
- Grant PostgreSQL permissions to ml_user

Pipeline now runs successfully end-to-end:
- Feature engineering: 2543 rows, 31 columns
- Annotation ingestion: 286 samples
- Training: 89.47% test accuracy with Random Forest
2026-02-15 21:29:54 +01:00
Marko Djordjevic
ceb4103ec4 fix(ml): parse datetime column and fix TA-Lib pattern names
- Add parse_dates parameter when loading enriched CSV
- Strip timezone from annotation timestamps to match data
- Fix pattern names: CDLTHREEWHITESOLDIERS -> CDL3WHITESOLDIERS
- Fix pattern names: CDLTHREEBLACKCROWS -> CDL3BLACKCROWS
2026-02-15 21:13:20 +01:00
Marko Djordjevic
2b86524436 fix(ml): correct CCI indicator signature to use HLC prices 2026-02-15 21:09:47 +01:00
Marko Djordjevic
847ff67986 feat(ml): add TA-Lib annotation generation and import workflow
Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
  and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
  annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.
2026-02-15 19:18:28 +01:00
Marko Djordjevic
3a83fd38e9 feat(ml): implement FastAPI inference service with model loading, preprocessing, and prediction endpoints 2026-02-15 14:29:07 +01:00
Marko Djordjevic
16763b967e feat(ml): implement annotation ingestion with windowed/BIO encoding and TA-Lib patterns 2026-02-15 12:28:58 +01:00
Marko Djordjevic
ea339a54a7 feat(ml): add database schema, config parser, and DVC setup
- Initialize DVC with local storage backend (task 1.6)
- Create PostgreSQL schema for training_runs table (task 1.7)
- Add SQLAlchemy database connection setup (task 1.8)
- Create Pydantic config models for pipeline.yaml (task 2.1)
- Add migration runner for database setup
- Fix pyproject.toml package discovery config
2026-02-15 12:08:53 +01:00