Commit graph

73 commits

Author SHA1 Message Date
Marko Djordjevic
e3184ea86c save mlruns dir to git 2026-02-15 22:26:01 +01:00
Marko Djordjevic
f7b154f866 fix(ml): extract class labels from model when metadata is missing
The model info response returned an empty labels array because the pkl
file has no metadata dict. Labels are now extracted from model.classes_,
falling back to model.model.classes_.
2026-02-15 22:18:16 +01:00
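A minimal sketch of the label fallback described in f7b154f866. It assumes the unpickled object is either a fitted scikit-learn style estimator exposing `classes_`, or a wrapper that keeps the fitted estimator in a `.model` attribute. `extract_labels` and the `"labels"` metadata key are hypothetical names used for illustration.

```python
from typing import Any, Optional


def extract_labels(model: Any, metadata: Optional[dict] = None) -> list:
    """Return class labels as strings, trying several sources in order."""
    # 1. Prefer labels stored in the pickle's metadata dict, if there is one.
    #    The "labels" key is an assumption for illustration.
    if metadata and metadata.get("labels"):
        return [str(label) for label in metadata["labels"]]
    # 2. A fitted scikit-learn estimator exposes classes_ directly.
    classes = getattr(model, "classes_", None)
    # 3. A wrapper object may keep the fitted estimator in a .model attribute.
    if classes is None:
        classes = getattr(getattr(model, "model", None), "classes_", None)
    # 4. Nothing found: return an empty list (the pre-fix behavior).
    if classes is None:
        return []
    return [str(label) for label in classes]
```

Labels are stringified here so integer class ids serialize the same way as string labels in a JSON response; that is a choice of this sketch.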
Marko Djordjevic
be09cef098 fix(ml): add missing numpy import in main.py 2026-02-15 22:08:41 +01:00
Marko Djordjevic
40d6d1739e fix(ml): add windowed feature flattening for inference parity
The model was trained on 94-candle sliding windows flattened to 2820
features (94 candles x 30 features). Inference was sending raw per-candle
features (27 columns).

Changes:
- Rewrite preprocessing to return (X, window_times) tuple
- Add sliding window creation with correct feature ordering
- Fill missing columns (average, barCount) with 0 for feature parity
- Fill NaN from indicator warmup with 0 instead of dropping rows
- Always compute all indicators (including MFI) for feature parity
- Update predict and batch predict endpoints for new signature
2026-02-15 22:07:06 +01:00
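The window arithmetic in 40d6d1739e (94 candles x 30 features = 2820 inputs) can be sketched in plain Python. Two details are assumptions, since the commit only says "correct feature ordering": the flattening is candle-major (all features of the oldest candle first), and each window is stamped with the time of its last candle. `make_windows` is a hypothetical name; missing columns such as average and barCount are assumed to have been zero-filled before this step.

```python
import math
from typing import List, Sequence, Tuple

WINDOW = 94      # candles per window (from the commit message)
N_FEATURES = 30  # features per candle after the parity fixes


def make_windows(
    rows: Sequence[Sequence[float]],
    times: Sequence[int],
    window: int = WINDOW,
) -> Tuple[List[List[float]], List[int]]:
    """Build overlapping windows and flatten each to window * n_features values.

    Assumed ordering: candle-major (all features of the oldest candle first).
    Assumed window time: the timestamp of the window's last candle.
    """
    if len(rows) != len(times):
        raise ValueError("rows and times must have the same length")
    X: List[List[float]] = []
    window_times: List[int] = []
    # Stride 1: one window ending at every candle from index window-1 onward.
    for end in range(window, len(rows) + 1):
        flat: List[float] = []
        for row in rows[end - window:end]:
            # Indicator warmup NaNs become 0 instead of dropping the row.
            flat.extend(0.0 if math.isnan(v) else float(v) for v in row)
        X.append(flat)
        window_times.append(times[end - 1])
    return X, window_times
```

With 100 candles this yields 100 - 94 + 1 = 7 rows of 2820 values each; with fewer than 94 candles it yields no rows at all, so a request needs at least 94 candles to produce a prediction.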
Marko Djordjevic
4c7b3f2676 fix(ml): detect all-NaN volume column (not just missing column)
The Pydantic model sets volume=None when absent, creating an all-NaN
column rather than a missing column. Check isna().all() in addition
to column existence.
2026-02-15 21:58:16 +01:00
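The volume check from 4c7b3f2676, together with the indicator skip from 317f925c43, can be illustrated without pandas on a list of candle dicts. In pandas the equivalent test is `"volume" in df and not df["volume"].isna().all()`. The helper names below are hypothetical. Note that 40d6d1739e later switched to always computing all indicators (including MFI) for feature parity, so the skip step reflects the state at 317f925c43.

```python
import math
from typing import Any, List, Mapping, Sequence

# Volume-dependent TA-Lib indicators named in 317f925c43.
VOLUME_INDICATORS = {"MFI", "OBV", "AD", "ADOSC"}


def _is_missing(value: Any) -> bool:
    """None (Pydantic default) and NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def has_real_volume(candles: Sequence[Mapping[str, Any]]) -> bool:
    """True only if at least one candle carries a non-null volume.

    An absent "volume" key and a key that is always None/NaN are treated
    the same way, which is the point of the all-NaN fix.
    """
    return any(not _is_missing(c.get("volume")) for c in candles)


def indicators_to_compute(
    requested: Sequence[str], candles: Sequence[Mapping[str, Any]]
) -> List[str]:
    """Drop volume-dependent indicators when there is no real volume."""
    if has_real_volume(candles):
        return list(requested)
    return [name for name in requested if name not in VOLUME_INDICATORS]
```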
Marko Djordjevic
317f925c43 fix(ml): handle missing volume data and skip volume-dependent indicators
- Fill volume with 0 when the column is absent from candle data
- Skip MFI/OBV/AD/ADOSC indicators when no real volume data available
- Fix pandas FutureWarning for inplace fillna in candle_features
- Remove temporary debug NaN logging
2026-02-15 21:56:14 +01:00
Marko Djordjevic
b6b37160a7 fix(ml): cast OHLCV arrays to float64 for TA-Lib compatibility 2026-02-15 21:52:45 +01:00
Marko Djordjevic
f850728d44 fix(api): add GET /api/charts/[id] and fix batch prediction
- Add GET handler to /api/charts/[id] route to fetch chart metadata
- Fix batch prediction to use regular /predict endpoint with database candles
- Remove /predict/batch usage (was designed for file-based predictions)
- Make volume field optional in CandleData model (database candles don't have volume)
- Convert timestamps to ISO dates for batch requests

Known issue: TA-Lib indicators failing with 'input array type is not double'
- May need to ensure candle data is float64/double type before processing
2026-02-15 21:49:22 +01:00
Marko Djordjevic
6d0d67e39b fix(ml): make pair and timeframe optional in PredictRequest
- Change pair and timeframe fields from required to optional
- Frontend only sends candles array, not pair/timeframe metadata
- These fields are only used for logging, not prediction logic
- Update logging to handle None values with 'unknown' fallback
- Fixes 422 validation error on /predict endpoint
2026-02-15 21:43:14 +01:00
Marko Djordjevic
5a7c901980 fix(frontend): update ModelInfoResponse types to match backend structure
- Update TypeScript types to match flat backend response structure
- Remove nested model_info and metrics objects
- Remove label_config, use labels array and per_class_metrics array
- Update all component references to use new structure
- Generate default colors for prediction labels in CandleChart
- Fix TypeScript type errors for nullable model_version
- Remove accuracy/F1 metrics display (not in new response)
2026-02-15 21:39:38 +01:00
Marko Djordjevic
aa81d4f3d0 fix(ml): complete ML pipeline fixes and setup
- Fix CCI indicator to use HLC prices instead of close only
- Parse datetime column when loading enriched CSV
- Strip timezone from annotation timestamps
- Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS)
- Exclude programmatic label columns from training features
- Fix classification report to handle missing classes
- Update MLflow tracking to use localhost:5000
- Grant PostgreSQL permissions to ml_user

Pipeline now runs successfully end-to-end:
- Feature engineering: 2543 rows, 31 columns
- Annotation ingestion: 286 samples
- Training: 89.47% test accuracy with Random Forest
2026-02-15 21:29:54 +01:00
Marko Djordjevic
ceb4103ec4 fix(ml): parse datetime column and fix TA-Lib pattern names
- Add parse_dates parameter when loading enriched CSV
- Strip timezone from annotation timestamps to match data
- Fix pattern names: CDLTHREEWHITESOLDIERS -> CDL3WHITESOLDIERS
- Fix pattern names: CDLTHREEBLACKCROWS -> CDL3BLACKCROWS
2026-02-15 21:13:20 +01:00
Marko Djordjevic
2b86524436 fix(ml): correct CCI indicator signature to use HLC prices 2026-02-15 21:09:47 +01:00
Marko Djordjevic
63486bc7b5 fix(ml): add CCI to hlc_indicators list
CCI (Commodity Channel Index) requires high, low, and close prices
2026-02-15 21:08:20 +01:00
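CCI is computed on the typical price (high + low + close) / 3, which is why passing close alone was wrong; TA-Lib exposes it as `talib.CCI(high, low, close, timeperiod=14)`. The pure-Python version below is a reference sketch for sanity-checking values, not the pipeline's code. Returning 0.0 when the mean deviation is zero is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def cci(
    high: Sequence[float],
    low: Sequence[float],
    close: Sequence[float],
    period: int = 14,
) -> List[Optional[float]]:
    """Commodity Channel Index over the typical price (H + L + C) / 3."""
    if not len(high) == len(low) == len(close):
        raise ValueError("high, low and close must have the same length")
    typical = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    out: List[Optional[float]] = [None] * len(typical)  # None during warmup
    for i in range(period - 1, len(typical)):
        window = typical[i - period + 1:i + 1]
        sma = sum(window) / period
        mean_dev = sum(abs(x - sma) for x in window) / period
        # 0.015 is Lambert's scaling constant; zero deviation returns 0.0 here.
        out[i] = 0.0 if mean_dev == 0 else (typical[i] - sma) / (0.015 * mean_dev)
    return out
```

For a 3-period window with typical prices 1, 2, 3 the mean is 2 and the mean deviation is 2/3, so the last value is (3 - 2) / (0.015 * 2/3) = 100.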
Marko Djordjevic
57240d4eea fix(scripts): add created_at timestamps to annotation import
Set the created_at field for both span_label_types and span_annotations
to satisfy the NOT NULL constraint
2026-02-15 19:36:55 +01:00
Marko Djordjevic
a68a681c9b fix(ml): handle date strings in TA-Lib annotation generator
- Convert date strings to Unix timestamps in load_ohlcv()
- Fix duplicate pattern names (CDL3WHITESOLDIERS/CDL3BLACKCROWS)
- Ensure time column is always integer type
2026-02-15 19:30:38 +01:00
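The conversion in a68a681c9b (date strings to integer Unix timestamps) can be sketched with the standard library. `to_unix_seconds` is a hypothetical name, and treating timezone-naive strings as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Union


def to_unix_seconds(value: Union[int, float, str]) -> int:
    """Normalize a candle time to integer Unix seconds.

    Accepts numbers, digit strings, and ISO 8601 strings such as "2026-02-15"
    or "2026-02-15T21:29:54+01:00". Naive strings are assumed to be UTC.
    """
    if isinstance(value, (int, float)):
        return int(value)  # already a timestamp; force integer type
    text = value.strip()
    if text.isdigit():
        return int(text)
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return int(parsed.timestamp())
```

Returning `int` in every branch is what keeps the time column integer-typed regardless of the input format.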
Marko Djordjevic
847ff67986 feat(ml): add TA-Lib annotation generation and import workflow
Add complete workflow for using TA-Lib to bootstrap training data:

- generate_talib_annotations.py: Python script to run TA-Lib CDL* functions
  and output span annotations in UI-compatible format
- import_talib_annotations.ts: TypeScript script to import generated
  annotations into the UI database with auto-label-type creation
- npm script 'import-annotations' for easy execution
- TALIB_WORKFLOW.md: Comprehensive guide covering the full cycle:
  * Generate patterns with TA-Lib
  * Import into UI
  * Review and edit in browser
  * Export and train model
  * Compare predictions with TA-Lib detections
  * Iterate for improvement

This enables the intended workflow: use TA-Lib for initial annotations,
manually refine them, then train a model that learns from corrections.
2026-02-15 19:18:28 +01:00
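TA-Lib's CDL* functions (e.g. `talib.CDL3WHITESOLDIERS(open, high, low, close)`) return one integer per candle: 0 where no pattern is found, positive for bullish and negative for bearish. A sketch of turning that array into span annotations follows. It assumes the nonzero value marks the last candle of the pattern and that the caller knows the pattern's length; `signals_to_spans` and the output keys are hypothetical and not the UI's actual annotation format.

```python
from typing import Dict, List, Sequence


def signals_to_spans(
    times: Sequence[int],
    signals: Sequence[int],
    pattern: str,
    n_candles: int,
) -> List[Dict[str, object]]:
    """Turn a TA-Lib CDL* output array into span annotations.

    Assumes a nonzero value marks the LAST candle of an n_candles-long pattern;
    positive values are bullish, negative bearish. Output keys are hypothetical.
    """
    if len(times) != len(signals):
        raise ValueError("times and signals must have the same length")
    spans: List[Dict[str, object]] = []
    for i, signal in enumerate(signals):
        if signal == 0:
            continue
        start = max(0, i - n_candles + 1)  # clamp at the start of the series
        spans.append({
            "start_time": times[start],
            "end_time": times[i],
            "label": pattern,
            "direction": "bullish" if signal > 0 else "bearish",
        })
    return spans
```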
Marko Djordjevic
3a83fd38e9 feat(ml): implement FastAPI inference service with model loading, preprocessing, and prediction endpoints 2026-02-15 14:29:07 +01:00
Marko Djordjevic
f4c0f9a836 feat(ml): implement training stage with MLflow tracking and model wrappers
- Create RandomForestModel and XGBoostModel wrappers with class weight support
- Implement temporal and random train/val/test splitting
- Add MLflow experiment tracking with full parameter and metric logging
- Create evaluation module for confusion matrix, feature importance, and classification reports
- Implement model training with sklearn/xgboost flavor logging and optional registry registration
- Store training run metadata in PostgreSQL
- Wire training stage into pipeline.py orchestrator
- Support both RandomForest and XGBoost models with configurable hyperparameters
2026-02-15 14:22:19 +01:00
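A temporal split, one of the two splitting modes listed in f4c0f9a836, orders samples by time and keeps the newest ones for validation and test. The sketch below is illustrative: `temporal_split` and its default fractions are assumptions, not the pipeline's configured values.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    samples: Sequence[T],
    times: Sequence[int],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
) -> Tuple[List[T], List[T], List[T]]:
    """Chronological train/val/test split: oldest samples train, newest test.

    Fractions are illustrative defaults, not the pipeline's configured values.
    """
    if len(samples) != len(times):
        raise ValueError("samples and times must have the same length")
    # Order samples by timestamp so no future data leaks into training.
    order = sorted(range(len(samples)), key=lambda i: times[i])
    ordered = [samples[i] for i in order]
    n = len(ordered)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )
```

With overlapping sliding windows, a random split can put windows that share candles into both train and test, which tends to inflate test scores; a temporal split keeps such overlap to the boundaries between the three sets.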
Marko Djordjevic
16763b967e feat(ml): implement annotation ingestion with windowed/BIO encoding and TA-Lib patterns 2026-02-15 12:28:58 +01:00
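BIO encoding, one of the two annotation encodings named in 16763b967e, tags every candle as the Beginning of a span, Inside a span, or Outside any span. A minimal sketch, assuming spans arrive as (start_index, end_index, label) with inclusive end indices; `bio_tags` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def bio_tags(n_candles: int, spans: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Per-candle BIO tags from (start_index, end_index, label) spans.

    End indices are inclusive (an assumption); later spans overwrite earlier
    ones where they overlap.
    """
    tags = ["O"] * n_candles  # O = outside any annotated span
    for start, end, label in sorted(spans):
        if not 0 <= start <= end < n_candles:
            raise ValueError(f"span {(start, end, label)} out of range")
        tags[start] = f"B-{label}"  # B = first candle of the span
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"  # I = remaining candles of the span
    return tags
```

For six candles with a hammer over candles 1 to 3 and a doji at candle 5, this gives `O, B-hammer, I-hammer, I-hammer, O, B-doji`.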
Marko Djordjevic
fd29ab91e0 feat(ml): implement feature engineering pipeline
- Create pipeline.py with CLI argument parsing for running stages
- Implement TA-Lib indicator computation with multi-output support
- Add candle feature extraction (body_size, wicks, ratios, etc.)
- Create custom feature loader with dynamic module import
- Wire all feature engineering stages with NaN handling
- Tasks completed: 2.2, 2.3, 3.1, 3.2, 3.3, 3.4, 3.5
2026-02-15 12:22:59 +01:00
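The candle shape features mentioned in fd29ab91e0 (body size, wicks, ratios) can be sketched for a single OHLC candle. Only `body_size` appears in the commit message; the other feature names and the zero-range guard are assumptions of this sketch.

```python
from typing import Dict


def candle_features(open_: float, high: float, low: float, close: float) -> Dict[str, float]:
    """Shape features for one OHLC candle (names beyond body_size are illustrative)."""
    body = abs(close - open_)
    full_range = high - low
    return {
        "body_size": body,
        "upper_wick": high - max(open_, close),  # distance from body top to high
        "lower_wick": min(open_, close) - low,   # distance from body bottom to low
        # Guard: a zero-range candle (high == low) gets 0 instead of dividing by zero.
        "body_ratio": body / full_range if full_range > 0 else 0.0,
        "is_bullish": 1.0 if close > open_ else 0.0,
    }
```

For open 10, high 15, low 8, close 12: body 2, upper wick 3, lower wick 2, body ratio 2/7.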
Marko Djordjevic
ea339a54a7 feat(ml): add database schema, config parser, and DVC setup
- Initialize DVC with local storage backend (task 1.6)
- Create PostgreSQL schema for training_runs table (task 1.7)
- Add SQLAlchemy database connection setup (task 1.8)
- Create Pydantic config models for pipeline.yaml (task 2.1)
- Add migration runner for database setup
- Fix pyproject.toml package discovery config
2026-02-15 12:08:53 +01:00
Marko Djordjevic
1a653c5866 feat: add ML service scaffolding with Python FastAPI, Docker, and MLflow setup 2026-02-15 11:58:31 +01:00