candle-annotator/openspec/changes/candle-backend/specs/feature-engineering/spec.md at 1a653c58663693c748fccf61f45b776b7ad83561

Marko Djordjevic 1a653c5866 feat: add ML service scaffolding with Python FastAPI, Docker, and MLflow setup

2026-02-15 11:58:31 +01:00


ADDED Requirements

Requirement: TA-Lib indicator computation

The system SHALL compute technical indicators from raw OHLCV data using TA-Lib. The pipeline config's stages.feature_engineering.talib_indicators list defines which indicators to compute. Each indicator entry specifies a name (the TA-Lib function name) and params (a dictionary of function parameters). Computed indicators SHALL be appended as new columns to the output CSV using lowercase names. Single-output indicators use the form {indicator}_{param} (e.g., rsi_14, ema_20). Multi-output indicators produce one column per output (e.g., macd, macd_signal, macd_hist, bbands_upper, bbands_middle, bbands_lower).

Scenario: Compute RSI indicator

WHEN the config includes { name: "RSI", params: { timeperiod: 14 } } and feature engineering is enabled
THEN the system computes RSI with period 14 and appends an rsi_14 column to the enriched CSV

Scenario: Compute multi-output indicator

WHEN the config includes { name: "MACD", params: { fastperiod: 12, slowperiod: 26, signalperiod: 9 } }
THEN the system appends macd, macd_signal, and macd_hist columns to the enriched CSV
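
The naming rule for single-output and multi-output indicators can be sketched as a small lookup. This is a minimal illustration, not the pipeline's implementation: the helper name output_columns, the MULTI_OUTPUT_COLUMNS table, and the choice of timeperiod as the suffix are assumptions inferred from the examples in this requirement.

```python
from typing import Dict, List

# Assumed mapping for multi-output indicators, copied from the column
# names listed in this spec. Other multi-output functions would need entries.
MULTI_OUTPUT_COLUMNS: Dict[str, List[str]] = {
    "MACD": ["macd", "macd_signal", "macd_hist"],
    "BBANDS": ["bbands_upper", "bbands_middle", "bbands_lower"],
}


def output_columns(name: str, params: dict) -> List[str]:
    """Return the CSV column names for one talib_indicators entry."""
    key = name.upper()
    if key in MULTI_OUTPUT_COLUMNS:
        return list(MULTI_OUTPUT_COLUMNS[key])
    base = name.lower()
    # Assumption: the suffix is the timeperiod when one is configured,
    # and there is no suffix otherwise.
    if "timeperiod" in params:
        return [f"{base}_{params['timeperiod']}"]
    return [base]
```

For example, output_columns("RSI", {"timeperiod": 14}) yields ["rsi_14"], matching the RSI scenario above.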

Scenario: TA-Lib not installed

WHEN feature engineering is enabled but the TA-Lib C library is not installed on the system
THEN the system SHALL fail with a clear error message including installation instructions for the user's platform, and SHALL NOT silently skip the stage
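
One way to satisfy the fail-fast behaviour is to attempt the import up front and convert an ImportError into an actionable message. The sketch below is hedged: require_talib and INSTALL_HINTS are hypothetical names, and the hint texts are illustrative rather than the wording the service must use.

```python
import importlib
import sys

# Illustrative per-platform hints keyed by sys.platform (assumed wording).
INSTALL_HINTS = {
    "linux": "install the TA-Lib C library (for example by building it "
             "from source), then run `pip install TA-Lib`",
    "darwin": "run `brew install ta-lib`, then `pip install TA-Lib`",
}
DEFAULT_HINT = "see the TA-Lib installation documentation for your platform"


def require_talib(importer=importlib.import_module, platform=sys.platform):
    """Import and return the talib module, or raise with install guidance."""
    try:
        return importer("talib")
    except ImportError as exc:
        hint = INSTALL_HINTS.get(platform, DEFAULT_HINT)
        # Raise instead of returning None so the stage is never skipped silently.
        raise RuntimeError(
            "Feature engineering is enabled but TA-Lib could not be imported. "
            f"To install it: {hint}."
        ) from exc
```

The importer and platform parameters exist only so the failure path can be exercised in tests without uninstalling anything.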

Scenario: Feature engineering disabled

WHEN stages.feature_engineering.enabled is false
THEN the system SHALL skip indicator computation entirely and pass raw OHLCV data to the next stage

Requirement: Candle feature extraction

When stages.feature_engineering.candle_features is true, the system SHALL compute derived candle features for each row: body_size (abs(close - open)), body_direction (1 if close >= open, else -1), upper_wick (high - max(open, close)), lower_wick (min(open, close) - low), wick_ratio (upper_wick / lower_wick), body_to_range (body_size / (high - low)), gap (open - previous close), and range (high - low).

Scenario: Compute candle features

WHEN candle_features is true and feature engineering is enabled
THEN the system appends columns body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range to the enriched CSV

Scenario: Division by zero handling

WHEN a candle has lower_wick equal to 0 (for wick_ratio) or high equal to low (for body_to_range)
THEN the system SHALL set the result to 0.0 instead of raising an error

Scenario: Gap for first candle

WHEN computing gap for the first candle in the dataset (no previous close)
THEN the system SHALL set gap to 0.0
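
The candle feature definitions and their edge cases can be expressed with vectorised pandas operations. This is a sketch under assumptions: the function name add_candle_features is hypothetical, and the input is assumed to have lowercase open, high, low, and close columns with no missing values.

```python
import numpy as np
import pandas as pd


def add_candle_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the eight candle feature columns appended."""
    out = df.copy()
    body_top = out[["open", "close"]].max(axis=1)
    body_bottom = out[["open", "close"]].min(axis=1)
    candle_range = out["high"] - out["low"]

    out["body_size"] = (out["close"] - out["open"]).abs()
    out["body_direction"] = np.where(out["close"] >= out["open"], 1, -1)
    out["upper_wick"] = out["high"] - body_top
    out["lower_wick"] = body_bottom - out["low"]
    # Mask zero denominators to NaN, divide, then map the NaN results to 0.0.
    out["wick_ratio"] = (
        out["upper_wick"] / out["lower_wick"].where(out["lower_wick"] != 0)
    ).fillna(0.0)
    out["body_to_range"] = (
        out["body_size"] / candle_range.where(candle_range != 0)
    ).fillna(0.0)
    # The first row has no previous close, so its gap becomes 0.0.
    out["gap"] = (out["open"] - out["close"].shift(1)).fillna(0.0)
    out["range"] = candle_range
    return out
```

Masking the denominator before dividing avoids producing inf values that would otherwise need a second cleanup pass.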

Requirement: Custom feature functions

When stages.feature_engineering.custom_features is configured, the system SHALL dynamically import each listed dotted Python path (e.g., features.custom.trend_slope) and call the resolved object as a function. Each custom feature function SHALL accept a pandas DataFrame (the full OHLCV data plus the features computed so far) and return a pandas Series. The returned Series SHALL be appended as a new column named after the function.

Scenario: Load custom feature

WHEN the config includes custom_features: ["features.custom.trend_slope"]
THEN the system imports features.custom.trend_slope, calls it with the DataFrame, and appends the result as a trend_slope column

Scenario: Custom feature import error

WHEN a custom feature module path cannot be imported
THEN the system SHALL fail with an error message naming the unresolvable module path
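
Dynamic loading of custom features can be sketched with importlib. The example assumes that the last segment of each dotted path is the function name and everything before it is the module; the spec does not state this split explicitly. The names load_feature and apply_custom_features are hypothetical.

```python
import importlib
from typing import Callable, Iterable


def load_feature(path: str) -> Callable:
    """Resolve a dotted path such as 'features.custom.trend_slope'."""
    # Assumption: 'pkg.module.func' means function `func` in module `pkg.module`.
    module_path, _, func_name = path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        return getattr(module, func_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Name the offending path so the config error is easy to locate.
        raise ImportError(f"Cannot resolve custom feature '{path}'") from exc


def apply_custom_features(df, paths: Iterable[str]):
    """Call each feature with df (a pandas DataFrame) and append its result."""
    for path in paths:
        func = load_feature(path)
        # The column is named after the function, the last path segment.
        df[path.rpartition(".")[2]] = func(df)
    return df
```

Because features are applied in list order, a later custom feature can read columns produced by an earlier one.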

Requirement: NaN handling for warmup periods

After computing all indicators, the system SHALL handle NaN values introduced by indicator warmup periods. Rows with NaN values in indicator columns SHALL be dropped from the output. The system SHALL log how many rows were dropped.

Scenario: Drop warmup rows

WHEN RSI with period 14 produces NaN for the first 14 rows
THEN those rows are dropped from the enriched CSV and a log message reports "Dropped 14 rows due to indicator warmup"
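
Dropping warmup rows and logging the count can be sketched as follows. The function name drop_warmup_rows and the indicator_columns argument are assumptions; the pipeline may track indicator columns differently.

```python
import logging
from typing import List

import pandas as pd

logger = logging.getLogger(__name__)


def drop_warmup_rows(df: pd.DataFrame, indicator_columns: List[str]) -> pd.DataFrame:
    """Drop rows that have NaN in any of the given indicator columns."""
    before = len(df)
    # subset= limits the NaN check to indicator columns only.
    cleaned = df.dropna(subset=indicator_columns)
    dropped = before - len(cleaned)
    logger.info("Dropped %d rows due to indicator warmup", dropped)
    return cleaned
```

When several indicators are configured, the number of dropped rows is driven by the indicator with the longest warmup period.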

Requirement: Enriched CSV output

The system SHALL write the enriched dataset (original OHLCV columns + all computed feature columns) to the path specified by data.enriched_path in CSV format. The output SHALL preserve the original column order with new feature columns appended.

Scenario: Write enriched CSV

WHEN feature engineering completes successfully
THEN the enriched CSV is written to data.enriched_path with all original and computed columns
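
The column-order guarantee can be made explicit when writing the file. In this sketch, write_enriched_csv is a hypothetical helper, and omitting the DataFrame index from the output is an assumption the spec does not address.

```python
import pandas as pd


def write_enriched_csv(raw: pd.DataFrame, enriched: pd.DataFrame, enriched_path) -> None:
    """Write enriched data with original columns first, features after."""
    original = list(raw.columns)
    # Feature columns keep the order in which they were appended.
    features = [c for c in enriched.columns if c not in original]
    # index=False is an assumption: the row index is not written as a column.
    enriched[original + features].to_csv(enriched_path, index=False)
```

Here enriched_path stands for the value of data.enriched_path from the pipeline config; pandas accepts either a file path or a writable buffer.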
