fix(ml): complete ML pipeline fixes and setup
- Fix CCI indicator to use HLC prices instead of close only
- Parse datetime column when loading enriched CSV
- Strip timezone from annotation timestamps
- Fix TA-Lib pattern names (CDL3WHITESOLDIERS, CDL3BLACKCROWS)
- Exclude programmatic label columns from training features
- Fix classification report to handle missing classes
- Update MLflow tracking to use localhost:5000
- Grant PostgreSQL permissions to ml_user

Pipeline now runs successfully end-to-end:
- Feature engineering: 2543 rows, 31 columns
- Annotation ingestion: 286 samples
- Training: 89.47% test accuracy with Random Forest
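
The commit message does not show how annotation timestamps have their timezone stripped. A minimal standard-library sketch of one way to do it follows. It assumes that aware timestamps should be normalized to UTC before the offset is dropped; the helper name `strip_timezone` and the sample timestamp are hypothetical and do not come from this commit.

```python
from datetime import datetime, timedelta, timezone


def strip_timezone(ts: datetime) -> datetime:
    """Return a naive datetime (hypothetical helper, not from the commit).

    Assumption: aware inputs are converted to UTC first, so that dropping
    the offset does not silently shift the wall-clock meaning.
    """
    if ts.tzinfo is None:
        return ts  # already naive, nothing to strip
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


# Made-up annotation timestamp at UTC-05:00, for illustration only.
aware = datetime(2024, 1, 2, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
naive = strip_timezone(aware)
```

Normalizing before stripping keeps naive values comparable across sources; if the pipeline stores local times instead, the conversion step would need to change.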
parent ceb4103ec4
commit aa81d4f3d0
348 changed files with 1327 additions and 11 deletions
@@ -208,7 +208,10 @@ def train(
         raise ValueError("Labeled dataset must have 'label' column")
 
     label_col = 'label'
-    feature_cols = [col for col in df.columns if col not in ['label', 'time', 'timestamp']]
+    # Exclude label columns, time columns, and programmatic label columns (which contain string values)
+    feature_cols = [col for col in df.columns
+                    if col not in ['label', 'time', 'timestamp']
+                    and not col.startswith('label_programmatic_')]
 
     X = df[feature_cols].values
     y = df[label_col].values
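
The filter in this hunk can be tried in isolation. The sketch below applies the same exclusion rule to a plain list of column names instead of `df.columns`. The function name `select_feature_columns` and the sample column names (`close`, `cci`, `label_programmatic_trend`) are hypothetical and used only for illustration.

```python
# Columns that are never features, as listed in the diff.
NON_FEATURE_COLUMNS = {"label", "time", "timestamp"}
# Prefix of the programmatic label columns, which hold string values.
PROGRAMMATIC_PREFIX = "label_programmatic_"


def select_feature_columns(columns):
    """Keep only columns usable as training features (hypothetical helper)."""
    return [
        col for col in columns
        if col not in NON_FEATURE_COLUMNS
        and not col.startswith(PROGRAMMATIC_PREFIX)
    ]


# Made-up column names, for illustration only.
columns = ["time", "close", "cci", "label_programmatic_trend", "label"]
features = select_feature_columns(columns)
```

Matching on the prefix rather than listing each programmatic label column means newly added `label_programmatic_*` columns are excluded without further changes to the training code.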