candle-annotator/openspec/changes/candle-backend/specs/feature-engineering/spec.md


## ADDED Requirements
### Requirement: TA-Lib indicator computation
The system SHALL compute technical indicators from raw OHLCV data using TA-Lib. The pipeline config's `stages.feature_engineering.talib_indicators` list defines which indicators to compute. Each indicator entry specifies a `name` (TA-Lib function name) and `params` (dictionary of function parameters). Computed indicators SHALL be appended as new columns to the output CSV using lowercase names. Single-output indicators use `{indicator}_{param}` (e.g., `rsi_14`, `ema_20`). Multi-output indicators produce one column per output, named `{indicator}` or `{indicator}_{output}` (e.g., `macd`, `macd_signal`, `macd_hist`, `bbands_upper`, `bbands_middle`, `bbands_lower`).
#### Scenario: Compute RSI indicator
- **WHEN** the config includes `{ name: "RSI", params: { timeperiod: 14 } }` and feature engineering is enabled
- **THEN** the system computes RSI with period 14 and appends a `rsi_14` column to the enriched CSV
#### Scenario: Compute multi-output indicator
- **WHEN** the config includes `{ name: "MACD", params: { fastperiod: 12, slowperiod: 26, signalperiod: 9 } }`
- **THEN** the system appends `macd`, `macd_signal`, and `macd_hist` columns to the enriched CSV
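
The naming rule can be illustrated with a small dependency-free sketch. The helper name `output_columns` and the lookup table are hypothetical. The multi-output names are copied from the examples in this spec, and using `timeperiod` as the suffix for single-output indicators is an assumption based on `rsi_14` and `ema_20`.

```python
# Column names for multi-output indicators, taken from the examples in this spec.
MULTI_OUTPUT_COLUMNS = {
    "MACD": ["macd", "macd_signal", "macd_hist"],
    "BBANDS": ["bbands_upper", "bbands_middle", "bbands_lower"],
}


def output_columns(name, params):
    """Return the lowercase column names for one `talib_indicators` entry."""
    if name in MULTI_OUTPUT_COLUMNS:
        return list(MULTI_OUTPUT_COLUMNS[name])
    base = name.lower()
    # Assumption: single-output indicators are suffixed with `timeperiod`.
    period = params.get("timeperiod")
    return [base if period is None else f"{base}_{period}"]


print(output_columns("RSI", {"timeperiod": 14}))
print(output_columns("MACD", {"fastperiod": 12, "slowperiod": 26, "signalperiod": 9}))
```

A table-driven approach keeps the multi-output names explicit, which avoids guessing suffixes for indicators whose parameters do not map cleanly onto a single number.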
#### Scenario: TA-Lib not installed
- **WHEN** feature engineering is enabled but the TA-Lib C library is not installed on the system
- **THEN** the system SHALL fail with a clear error message including installation instructions for the user's platform, and SHALL NOT silently skip the stage
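
One way to satisfy the fail-loudly behaviour is to wrap the import and re-raise with a platform-specific hint. The sketch below is illustrative only: `load_indicator_backend` and `INSTALL_HINTS` are hypothetical names, and the hint strings are placeholders that should be checked against the TA-Lib installation documentation.

```python
import importlib
import platform

# Placeholder hints keyed by platform.system(); verify against the TA-Lib docs.
INSTALL_HINTS = {
    "Darwin": "brew install ta-lib, then pip install TA-Lib",
    "Linux": "build and install the TA-Lib C library, then pip install TA-Lib",
    "Windows": "install a prebuilt TA-Lib wheel matching your Python version",
}


def load_indicator_backend(module_name="talib"):
    """Import the indicator backend or fail with installation guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = INSTALL_HINTS.get(platform.system(), "see the TA-Lib installation guide")
        # Raise instead of returning None so the stage is never skipped silently.
        raise RuntimeError(
            f"Feature engineering requires {module_name!r}, which could not be "
            f"imported. To install: {hint}"
        ) from exc
```

Calling this once at the start of the stage, before any data is read, surfaces the problem early rather than partway through a run.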
#### Scenario: Feature engineering disabled
- **WHEN** `stages.feature_engineering.enabled` is false
- **THEN** the system SHALL skip indicator computation entirely and pass raw OHLCV data to the next stage
### Requirement: Candle feature extraction
When `stages.feature_engineering.candle_features` is true, the system SHALL compute the following derived candle features for each row:
- `body_size`: `abs(close - open)`
- `body_direction`: 1 if `close >= open`, else -1
- `upper_wick`: `high - max(open, close)`
- `lower_wick`: `min(open, close) - low`
- `wick_ratio`: `upper_wick / lower_wick`
- `body_to_range`: `body_size / (high - low)`
- `gap`: `open - previous close`
- `range`: `high - low`
#### Scenario: Compute candle features
- **WHEN** `candle_features` is true and feature engineering is enabled
- **THEN** the system appends columns `body_size`, `body_direction`, `upper_wick`, `lower_wick`, `wick_ratio`, `body_to_range`, `gap`, `range` to the enriched CSV
#### Scenario: Division by zero handling
- **WHEN** a candle has `lower_wick` equal to 0 (for `wick_ratio`) or `high` equal to `low` (for `body_to_range`)
- **THEN** the system SHALL set the result to 0.0 instead of raising an error
#### Scenario: Gap for first candle
- **WHEN** computing `gap` for the first candle in the dataset (no previous close)
- **THEN** the system SHALL set gap to 0.0
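
The formulas and edge cases above can be sketched in plain Python. This is a minimal, dependency-free illustration operating on a list of dicts. A real pipeline would more likely vectorise the same arithmetic over a DataFrame. The function name `candle_features` and the input shape are assumptions.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 instead of raising when the denominator is 0."""
    return numerator / denominator if denominator != 0 else 0.0


def candle_features(candles):
    """Compute derived features for a list of OHLC dicts ordered oldest first."""
    features = []
    prev_close = None
    for c in candles:
        o, h, l, cl = c["open"], c["high"], c["low"], c["close"]
        body_size = abs(cl - o)
        upper_wick = h - max(o, cl)
        lower_wick = min(o, cl) - l
        candle_range = h - l
        features.append({
            "body_size": body_size,
            "body_direction": 1 if cl >= o else -1,
            "upper_wick": upper_wick,
            "lower_wick": lower_wick,
            "wick_ratio": safe_div(upper_wick, lower_wick),
            "body_to_range": safe_div(body_size, candle_range),
            # The first candle has no previous close, so its gap is 0.0.
            "gap": 0.0 if prev_close is None else o - prev_close,
            "range": candle_range,
        })
        prev_close = cl
    return features
```

Routing both ratios through one `safe_div` helper keeps the zero-denominator rule in a single place.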
### Requirement: Custom feature functions
When `stages.feature_engineering.custom_features` is configured, the system SHALL dynamically import each listed dotted Python path and call the object it resolves to as a function. Each custom feature function SHALL accept a pandas DataFrame (the full OHLCV data plus all features computed so far) and return a pandas Series. The returned Series SHALL be appended as a new column named after the function, that is, the last component of the path.
#### Scenario: Load custom feature
- **WHEN** the config includes `custom_features: ["features.custom.trend_slope"]`
- **THEN** the system imports `features.custom.trend_slope`, calls it with the DataFrame, and appends the result as a `trend_slope` column
#### Scenario: Custom feature import error
- **WHEN** a custom feature module path cannot be imported
- **THEN** the system SHALL fail with an error message naming the unresolvable module path
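
A possible resolution step, using only `importlib`, is sketched below. It assumes the last dotted component is the function name and everything before it is the module, which matches the `trend_slope` example but is one interpretation of the requirement. The helper name `resolve_feature` is hypothetical.

```python
import importlib


def resolve_feature(dotted_path):
    """Resolve 'package.module.func' to a (column_name, callable) pair."""
    module_path, _, func_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        func = getattr(module, func_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Name the offending path so the config entry is easy to locate.
        raise ImportError(f"Cannot resolve custom feature {dotted_path!r}") from exc
    if not callable(func):
        raise TypeError(f"Custom feature {dotted_path!r} is not callable")
    return func_name, func
```

The caller would then append the result with something like `df[name] = func(df)`, where `df` is the DataFrame of OHLCV data and the features computed so far.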
### Requirement: NaN handling for warmup periods
After computing all indicators, the system SHALL handle NaN values introduced by indicator warmup periods. Rows with NaN values in indicator columns SHALL be dropped from the output. The system SHALL log how many rows were dropped.
#### Scenario: Drop warmup rows
- **WHEN** RSI with period 14 produces NaN for the first 14 rows
- **THEN** those rows are dropped from the enriched CSV and a log message reports "Dropped 14 rows due to indicator warmup"
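
The drop-and-log behaviour might look like the following stdlib-only sketch. The function name `drop_warmup_rows`, the logger name, and the list-of-dicts row shape are assumptions. With pandas, `df.dropna(subset=indicator_columns)` expresses the same filter.

```python
import logging
import math

logger = logging.getLogger("feature_engineering")  # hypothetical logger name


def drop_warmup_rows(rows, indicator_columns):
    """Drop rows with NaN in any indicator column and log how many were removed."""
    def has_nan(row):
        return any(
            isinstance(row[col], float) and math.isnan(row[col])
            for col in indicator_columns
        )

    kept = [row for row in rows if not has_nan(row)]
    dropped = len(rows) - len(kept)
    logger.info("Dropped %d rows due to indicator warmup", dropped)
    return kept, dropped
```

Restricting the check to indicator columns follows the requirement text, so a NaN in a raw OHLCV column would not cause a row to be dropped by this step.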
### Requirement: Enriched CSV output
The system SHALL write the enriched dataset (original OHLCV columns + all computed feature columns) to the path specified by `data.enriched_path` in CSV format. The output SHALL preserve the original column order with new feature columns appended.
#### Scenario: Write enriched CSV
- **WHEN** feature engineering completes successfully
- **THEN** the enriched CSV is written to `data.enriched_path` with all original and computed columns
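
The column-order guarantee can be made explicit when writing the file. The sketch below uses the standard `csv` module and a hypothetical `write_enriched_csv` helper. In practice `fh` would be a file opened at `data.enriched_path`.

```python
import csv


def write_enriched_csv(rows, original_columns, feature_columns, fh):
    """Write rows with original columns first and feature columns appended."""
    fieldnames = list(original_columns) + list(feature_columns)
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Emit only the declared columns, in the declared order.
        writer.writerow({name: row[name] for name in fieldnames})
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not translated twice.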