candle-annotator/openspec/specs/feature-engineering/spec.md
Marko Djordjevic 7e0579f65d sync: migrate delta specs to main openspec/specs
- Added 5 new capabilities: feature-engineering, annotation-ingestion, ml-training, ml-inference, prediction-ui
- Updated 2 existing capabilities: backend-api, span-annotation
- All specs synced from openspec/changes/candle-backend/specs/
2026-02-16 11:44:34 +01:00

ADDED Requirements

Requirement: TA-Lib indicator computation

The system SHALL compute technical indicators from raw OHLCV data using TA-Lib. The pipeline config's stages.feature_engineering.talib_indicators list defines which indicators to compute. Each entry specifies a name (the TA-Lib function name) and params (a dictionary of that function's parameters). Computed indicators SHALL be appended as new columns to the output CSV under lowercase names. Single-output indicators use {indicator}_{param} (e.g., rsi_14, ema_20); multi-output indicators get one column per output (e.g., macd, macd_signal, macd_hist, bbands_upper, bbands_middle, bbands_lower).

Scenario: Compute RSI indicator

  • WHEN the config includes { name: "RSI", params: { timeperiod: 14 } } and feature engineering is enabled
  • THEN the system computes RSI with period 14 and appends a rsi_14 column to the enriched CSV

Scenario: Compute multi-output indicator

  • WHEN the config includes { name: "MACD", params: { fastperiod: 12, slowperiod: 26, signalperiod: 9 } }
  • THEN the system appends macd, macd_signal, and macd_hist columns to the enriched CSV
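
The naming rule in the two scenarios above can be sketched in a few lines of Python. This is only an illustration: the function name output_columns, the MULTI_OUTPUT_COLUMNS table, and the choice of timeperiod as the suffix parameter are assumptions, not part of the spec, which only fixes the example column names.

```python
# Hypothetical lookup for multi-output indicators. The spec names only
# the MACD and BBANDS output columns; anything else would be an extension.
MULTI_OUTPUT_COLUMNS = {
    "MACD": ["macd", "macd_signal", "macd_hist"],
    "BBANDS": ["bbands_upper", "bbands_middle", "bbands_lower"],
}


def output_columns(entry: dict) -> list:
    """Return the CSV column names one talib_indicators entry would produce."""
    name = entry["name"]
    params = entry.get("params", {})
    if name in MULTI_OUTPUT_COLUMNS:
        # One column per output, independent of the parameters.
        return list(MULTI_OUTPUT_COLUMNS[name])
    base = name.lower()
    # Assumption: single-output indicators are suffixed with timeperiod.
    if "timeperiod" in params:
        return [f"{base}_{params['timeperiod']}"]
    return [base]


# Entries shaped like stages.feature_engineering.talib_indicators.
entries = [
    {"name": "RSI", "params": {"timeperiod": 14}},
    {"name": "EMA", "params": {"timeperiod": 20}},
    {"name": "MACD", "params": {"fastperiod": 12, "slowperiod": 26, "signalperiod": 9}},
]
for entry in entries:
    print(entry["name"], "->", output_columns(entry))
```

Keeping the name mapping separate from the TA-Lib call makes the column contract checkable without the C library installed.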

Scenario: TA-Lib not installed

  • WHEN feature engineering is enabled but the TA-Lib C library is not installed on the system
  • THEN the system SHALL fail with a clear error message including installation instructions for the user's platform, and SHALL NOT silently skip the stage
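
One way to satisfy the fail-loudly scenario above is to guard the import and raise a descriptive error. The sketch below is an assumption-laden illustration: FeatureEngineeringError, require_talib, and the INSTALL_HINTS table are hypothetical names, and the hint strings are examples rather than the project's official instructions.

```python
import importlib
import platform

# Illustrative hints only; a real project would maintain its own text.
INSTALL_HINTS = {
    "Darwin": "brew install ta-lib, then pip install TA-Lib",
    "Linux": "build the TA-Lib C library from source, then pip install TA-Lib",
    "Windows": "install a prebuilt TA-Lib wheel matching your Python version",
}


class FeatureEngineeringError(RuntimeError):
    """Raised when the feature engineering stage cannot run."""


def require_talib(import_module=importlib.import_module, system=None):
    """Import talib or raise with platform-specific install guidance.

    import_module and system are injectable so the failure path can be
    exercised without uninstalling anything.
    """
    system = system or platform.system()
    try:
        return import_module("talib")
    except ImportError as exc:
        hint = INSTALL_HINTS.get(system, "see the TA-Lib installation documentation")
        raise FeatureEngineeringError(
            "TA-Lib is required for feature engineering but could not be "
            f"imported. On {system}: {hint}."
        ) from exc
```

Calling require_talib() once at the start of the stage, before any indicator is computed, prevents the stage from being skipped silently.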

Scenario: Feature engineering disabled

  • WHEN stages.feature_engineering.enabled is false
  • THEN the system SHALL skip indicator computation entirely and pass raw OHLCV data to the next stage

Requirement: Candle feature extraction

When stages.feature_engineering.candle_features is true, the system SHALL compute the following derived candle features for each row:

  • body_size: abs(close - open)
  • body_direction: 1 if close >= open, else -1
  • upper_wick: high - max(open, close)
  • lower_wick: min(open, close) - low
  • wick_ratio: upper_wick / lower_wick
  • body_to_range: body_size / (high - low)
  • gap: open - previous close
  • range: high - low

Scenario: Compute candle features

  • WHEN candle_features is true and feature engineering is enabled
  • THEN the system appends columns body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range to the enriched CSV

Scenario: Division by zero handling

  • WHEN a candle has lower_wick equal to 0 (for wick_ratio) or high equal to low (for body_to_range)
  • THEN the system SHALL set the result to 0.0 instead of raising an error

Scenario: Gap for first candle

  • WHEN computing gap for the first candle in the dataset (no previous close)
  • THEN the system SHALL set gap to 0.0
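
The candle feature formulas and the two edge cases above (zero denominators and the first candle's gap) could be implemented with pandas roughly as follows. The function names add_candle_features and safe_div are hypothetical, and the input is assumed to have lowercase open, high, low, and close columns.

```python
import numpy as np
import pandas as pd


def safe_div(num: pd.Series, den: pd.Series) -> pd.Series:
    """Divide element-wise, yielding 0.0 wherever the denominator is 0."""
    den_nonzero = den.where(den != 0)  # zeros become NaN
    # Note: fillna also maps NaN inputs to 0.0, which is fine for a sketch.
    return (num / den_nonzero).fillna(0.0)


def add_candle_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the eight candle feature columns appended."""
    out = df.copy()
    body_top = out[["open", "close"]].max(axis=1)
    body_bottom = out[["open", "close"]].min(axis=1)
    candle_range = out["high"] - out["low"]

    out["body_size"] = (out["close"] - out["open"]).abs()
    out["body_direction"] = np.where(out["close"] >= out["open"], 1, -1)
    out["upper_wick"] = out["high"] - body_top
    out["lower_wick"] = body_bottom - out["low"]
    out["wick_ratio"] = safe_div(out["upper_wick"], out["lower_wick"])
    out["body_to_range"] = safe_div(out["body_size"], candle_range)
    # shift(1) leaves NaN on the first row; the spec sets that gap to 0.0.
    out["gap"] = (out["open"] - out["close"].shift(1)).fillna(0.0)
    out["range"] = candle_range
    return out


candles = pd.DataFrame(
    {
        "open": [10.0, 11.5, 12.0],
        "high": [12.0, 11.5, 12.5],
        "low": [9.0, 11.5, 10.0],
        "close": [11.0, 11.5, 10.5],
    }
)
print(add_candle_features(candles))
```

The second row is a flat candle (open, high, low, and close all equal), so it exercises both zero-denominator cases at once.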

Requirement: Custom feature functions

When stages.feature_engineering.custom_features is configured, the system SHALL dynamically import the callable at each listed dotted Python path and call it. Each custom feature function SHALL accept a pandas DataFrame (the full OHLCV data plus all features computed so far) and return a pandas Series. The returned Series SHALL be appended as a new column named after the function.

Scenario: Load custom feature

  • WHEN the config includes custom_features: ["features.custom.trend_slope"]
  • THEN the system imports features.custom.trend_slope, calls it with the DataFrame, and appends the result as a trend_slope column

Scenario: Custom feature import error

  • WHEN a custom feature module path cannot be imported
  • THEN the system SHALL fail with an error message naming the unresolvable module path
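
A minimal sketch of the loading behaviour described above, using only importlib. It assumes the last component of each dotted path is the function name and everything before it is the module; the spec does not state this split explicitly. resolve_feature and apply_custom_features are hypothetical names.

```python
import importlib


def resolve_feature(dotted_path: str):
    """Resolve 'package.module.function' to (function_name, callable).

    Assumption: the final path component is the function and the rest
    is the module that defines it.
    """
    module_path, _, func_name = dotted_path.rpartition(".")
    try:
        if not module_path:
            raise ImportError("expected a path like 'package.module.function'")
        module = importlib.import_module(module_path)
        func = getattr(module, func_name)
    except (ImportError, AttributeError) as exc:
        # Name the unresolvable path so the config error is easy to find.
        raise ImportError(
            f"Cannot resolve custom feature '{dotted_path}': {exc}"
        ) from exc
    if not callable(func):
        raise TypeError(f"Custom feature '{dotted_path}' is not callable")
    return func_name, func


def apply_custom_features(df, dotted_paths):
    """Call each feature with the frame so far and append its result.

    Mutates df in place, so later features can see earlier ones.
    """
    for path in dotted_paths:
        name, func = resolve_feature(path)
        df[name] = func(df)
    return df
```

With the config from the scenario, apply_custom_features(df, ["features.custom.trend_slope"]) would add a trend_slope column, provided a features.custom module defining trend_slope is importable.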

Requirement: NaN handling for warmup periods

After computing all indicators, the system SHALL handle NaN values introduced by indicator warmup periods. Rows with NaN values in indicator columns SHALL be dropped from the output. The system SHALL log how many rows were dropped.

Scenario: Drop warmup rows

  • WHEN RSI with period 14 produces NaN for the first 14 rows
  • THEN those rows are dropped from the enriched CSV and a log message reports "Dropped 14 rows due to indicator warmup"
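
The warmup handling above maps naturally onto pandas' dropna restricted to the indicator columns. The sketch below assumes the caller knows which columns are indicators; drop_warmup_rows and the logger name are hypothetical.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger("feature_engineering")


def drop_warmup_rows(df: pd.DataFrame, indicator_columns: list):
    """Drop rows with NaN in any indicator column and log the count."""
    before = len(df)
    cleaned = df.dropna(subset=indicator_columns)
    dropped = before - len(cleaned)
    logger.info("Dropped %d rows due to indicator warmup", dropped)
    return cleaned, dropped


# 20 candles where a period-14 RSI is undefined for the first 14 rows.
frame = pd.DataFrame(
    {
        "close": np.arange(20, dtype=float),
        "rsi_14": [np.nan] * 14 + [50.0] * 6,
    }
)
cleaned, dropped = drop_warmup_rows(frame, ["rsi_14"])
print(dropped, len(cleaned))
```

Restricting dropna to the indicator columns avoids discarding rows that have NaN only in unrelated columns.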

Requirement: Enriched CSV output

The system SHALL write the enriched dataset (original OHLCV columns + all computed feature columns) to the path specified by data.enriched_path in CSV format. The output SHALL preserve the original column order with new feature columns appended.

Scenario: Write enriched CSV

  • WHEN feature engineering completes successfully
  • THEN the enriched CSV is written to data.enriched_path with all original and computed columns
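
The column-order guarantee above can be made explicit when writing the file. In this sketch write_enriched_csv is a hypothetical helper, enriched_path stands in for the value of data.enriched_path, and creating missing parent directories is an assumption the spec does not mention.

```python
from pathlib import Path

import pandas as pd


def write_enriched_csv(raw: pd.DataFrame, enriched: pd.DataFrame, enriched_path) -> Path:
    """Write enriched data with original columns first, features after."""
    original = list(raw.columns)
    added = [col for col in enriched.columns if col not in original]
    path = Path(enriched_path)
    # Assumption: missing parent directories are created rather than rejected.
    path.parent.mkdir(parents=True, exist_ok=True)
    enriched[original + added].to_csv(path, index=False)
    return path
```

Selecting original + added before writing keeps the output order stable even if an earlier step inserted a feature column in the middle of the frame.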