ADDED Requirements
Requirement: TA-Lib indicator computation
The system SHALL compute technical indicators from raw OHLCV data using TA-Lib. The stages.feature_engineering.talib_indicators list in the pipeline config defines which indicators to compute. Each entry specifies a name (the TA-Lib function name) and params (a dictionary of function parameters). Computed indicators SHALL be appended as new columns to the output CSV using lowercase names of the form {indicator}_{param} (e.g., rsi_14, ema_20). Multi-output indicators produce one column per output (e.g., macd, macd_signal, macd_hist; bbands_upper, bbands_middle, bbands_lower).
Scenario: Compute RSI indicator
- WHEN the config includes { name: "RSI", params: { timeperiod: 14 } } and feature engineering is enabled
- THEN the system computes RSI with period 14 and appends an rsi_14 column to the enriched CSV
Scenario: Compute multi-output indicator
- WHEN the config includes { name: "MACD", params: { fastperiod: 12, slowperiod: 26, signalperiod: 9 } }
- THEN the system appends macd, macd_signal, and macd_hist columns to the enriched CSV
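The naming rule above can be sketched as a small helper. This is a minimal illustration, not the pipeline's actual code: the function indicator_columns and the MULTI_OUTPUT_COLUMNS table are hypothetical names, and the sketch assumes that single-output indicators take their suffix from the timeperiod parameter.

```python
# Hypothetical lookup table: multi-output indicators get one column per output.
# The entries mirror the examples given in this requirement.
MULTI_OUTPUT_COLUMNS = {
    "MACD": ["macd", "macd_signal", "macd_hist"],
    "BBANDS": ["bbands_upper", "bbands_middle", "bbands_lower"],
}


def indicator_columns(name, params):
    """Return the output column names for one talib_indicators entry."""
    if name in MULTI_OUTPUT_COLUMNS:
        return list(MULTI_OUTPUT_COLUMNS[name])
    base = name.lower()
    # Assumption: the suffix comes from `timeperiod`; fall back to the bare name.
    period = params.get("timeperiod")
    return [f"{base}_{period}"] if period is not None else [base]
```

Called with the two scenario configs, the helper yields a single rsi_14 column for RSI and the three macd columns for MACD. A lookup table is used here because the spec's column names do not follow mechanically from the indicator name alone.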
Scenario: TA-Lib not installed
- WHEN feature engineering is enabled but the TA-Lib C library is not installed on the system
- THEN the system SHALL fail with a clear error message including installation instructions for the user's platform, and SHALL NOT silently skip the stage
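One way to satisfy this scenario is to attempt the import up front and convert a failure into an actionable error. The sketch below is illustrative only: FeatureEngineeringError, require_talib, and the INSTALL_HINTS text are hypothetical, and the hints are examples rather than authoritative installation instructions. It assumes that a missing C library surfaces as an ImportError when the talib Python package is imported.

```python
import importlib
import platform

# Illustrative hints keyed by platform.system(); adjust to the project's docs.
INSTALL_HINTS = {
    "Darwin": "brew install ta-lib, then pip install TA-Lib",
    "Linux": "build and install the TA-Lib C library, then pip install TA-Lib",
    "Windows": "install a prebuilt TA-Lib wheel matching your Python version",
}


class FeatureEngineeringError(RuntimeError):
    """Raised when the feature engineering stage cannot run."""


def require_talib(import_module=importlib.import_module, system=None):
    """Import talib or fail loudly; never skip the stage silently."""
    try:
        return import_module("talib")
    except ImportError as exc:
        system = system or platform.system()
        hint = INSTALL_HINTS.get(system, "see the TA-Lib installation documentation")
        raise FeatureEngineeringError(
            f"TA-Lib is required for feature engineering but could not be "
            f"imported on {system}. To install: {hint}."
        ) from exc
```

The import_module and system parameters exist only so the failure path can be exercised without uninstalling anything.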
Scenario: Feature engineering disabled
- WHEN stages.feature_engineering.enabled is false
- THEN the system SHALL skip indicator computation entirely and pass raw OHLCV data to the next stage
Requirement: Candle feature extraction
When stages.feature_engineering.candle_features is true, the system SHALL compute derived candle features for each row: body_size (abs(close - open)), body_direction (1 if close >= open, else -1), upper_wick (high - max(open, close)), lower_wick (min(open, close) - low), wick_ratio (upper_wick / lower_wick), body_to_range (body_size / (high - low)), gap (open - previous close), and range (high - low).
Scenario: Compute candle features
- WHEN candle_features is true and feature engineering is enabled
- THEN the system appends columns body_size, body_direction, upper_wick, lower_wick, wick_ratio, body_to_range, gap, range to the enriched CSV
Scenario: Division by zero handling
- WHEN a candle has lower_wick equal to 0 (for wick_ratio) or high equal to low (for body_to_range)
- THEN the system SHALL set the result to 0.0 instead of raising an error
Scenario: Gap for first candle
- WHEN computing gap for the first candle in the dataset (no previous close)
- THEN the system SHALL set gap to 0.0
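The candle feature formulas and both edge cases can be sketched with pandas as follows. This is a minimal example, assuming the input DataFrame has lowercase open, high, low, and close columns and contains no missing OHLC values; add_candle_features is a hypothetical name.

```python
import numpy as np
import pandas as pd


def add_candle_features(df):
    """Append the eight candle feature columns in the order listed above."""
    out = df.copy()
    o, h, l, c = out["open"], out["high"], out["low"], out["close"]
    candle_range = h - l

    out["body_size"] = (c - o).abs()
    out["body_direction"] = np.where(c >= o, 1, -1)
    out["upper_wick"] = h - np.maximum(o, c)
    out["lower_wick"] = np.minimum(o, c) - l
    # Zero denominators become NaN before dividing, then the result is set to 0.0.
    out["wick_ratio"] = (
        out["upper_wick"] / out["lower_wick"].replace(0, np.nan)
    ).fillna(0.0)
    out["body_to_range"] = (
        out["body_size"] / candle_range.replace(0, np.nan)
    ).fillna(0.0)
    # The first candle has no previous close, so its gap is 0.0.
    out["gap"] = (o - c.shift(1)).fillna(0.0)
    out["range"] = candle_range
    return out
```

Replacing zero denominators before dividing avoids infinities and NumPy warnings, so the 0.0 fallback applies only to the rows the scenarios describe.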
Requirement: Custom feature functions
When stages.feature_engineering.custom_features is configured, the system SHALL dynamically import each listed dotted Python path and call the function it names. Each custom feature function SHALL accept a pandas DataFrame (the full OHLCV data plus all features computed so far) and return a pandas Series. The returned Series SHALL be appended as a new column named after the function.
Scenario: Load custom feature
- WHEN the config includes custom_features: ["features.custom.trend_slope"]
- THEN the system imports features.custom.trend_slope, calls it with the DataFrame, and appends the result as a trend_slope column
Scenario: Custom feature import error
- WHEN a custom feature module path cannot be imported
- THEN the system SHALL fail with an error message naming the unresolvable module path
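Dynamic loading of this kind is commonly done with importlib. The sketch below assumes that the last component of each dotted path is the function name and everything before it is the module, which is one reading of the scenario above; load_feature and apply_custom_features are hypothetical names.

```python
import importlib


def load_feature(path):
    """Resolve a dotted path such as 'pkg.module.func' to (name, callable)."""
    module_path, _, func_name = path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        func = getattr(module, func_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # The error names the path that could not be resolved.
        raise ImportError(f"Cannot import custom feature '{path}'") from exc
    if not callable(func):
        raise TypeError(f"Custom feature '{path}' is not callable")
    return func_name, func


def apply_custom_features(df, paths):
    """Call each feature with the frame so far and append its result."""
    for path in paths:
        name, func = load_feature(path)
        df[name] = func(df)
    return df
```

Because features are applied in list order, a later custom feature can read the columns produced by earlier ones.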
Requirement: NaN handling for warmup periods
After computing all indicators, the system SHALL handle NaN values introduced by indicator warmup periods. Rows with NaN values in indicator columns SHALL be dropped from the output. The system SHALL log how many rows were dropped.
Scenario: Drop warmup rows
- WHEN RSI with period 14 produces NaN for the first 14 rows
- THEN those rows are dropped from the enriched CSV and a log message reports "Dropped 14 rows due to indicator warmup"
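A minimal sketch of the warmup handling, assuming the caller tracks which columns are indicator columns; drop_warmup_rows is a hypothetical name and the log wording follows the scenario above.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_warmup_rows(df, indicator_columns):
    """Drop rows with NaN in any indicator column and report the count."""
    before = len(df)
    cleaned = df.dropna(subset=indicator_columns)
    dropped = before - len(cleaned)
    logger.info("Dropped %d rows due to indicator warmup", dropped)
    return cleaned, dropped
```

Restricting dropna to the indicator columns keeps rows whose only missing values are in unrelated columns.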
Requirement: Enriched CSV output
The system SHALL write the enriched dataset (original OHLCV columns + all computed feature columns) to the path specified by data.enriched_path in CSV format. The output SHALL preserve the original column order with new feature columns appended.
Scenario: Write enriched CSV
- WHEN feature engineering completes successfully
- THEN the enriched CSV is written to data.enriched_path with all original and computed columns
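The column-order guarantee can be sketched as below. This is an illustration under stated assumptions: write_enriched_csv is a hypothetical name, the raw frame is passed in only to recover the original column order, and writing without the index column is an assumption the spec does not state.

```python
import pandas as pd


def write_enriched_csv(raw, enriched, path):
    """Write original columns first, in their original order, then new ones."""
    original = list(raw.columns)
    added = [col for col in enriched.columns if col not in original]
    # Assumption: the DataFrame index is not part of the output.
    enriched[original + added].to_csv(path, index=False)
```

Reordering explicitly at write time means the guarantee holds even if an earlier step inserted a feature column somewhere other than the end.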