candle-annotator/openspec/specs/data-ingestion/spec.md

## ADDED Requirements
### Requirement: CSV file upload
The system SHALL provide a file upload component that accepts CSV files containing OHLC candle data. The CSV MUST include a header row with the columns `time`, `open`, `high`, `low`, and `close`. The `time` column SHALL accept both `YYYY-MM-DD` date strings and Unix timestamps (integer seconds).
#### Scenario: Valid CSV upload
- **WHEN** user uploads a CSV file with valid headers (time, open, high, low, close) and valid data rows
- **THEN** system parses all rows and stores them in the `candles` database table
#### Scenario: CSV with Unix timestamps
- **WHEN** user uploads a CSV where the `time` column contains Unix timestamps (e.g., 1700000000)
- **THEN** system stores the timestamps as integers in the database and renders candles correctly on the chart
#### Scenario: CSV with date strings
- **WHEN** user uploads a CSV where the `time` column contains date strings (e.g., "2024-01-15")
- **THEN** system converts dates to Unix timestamps and stores them in the database
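
The two accepted `time` formats can be normalized with one small function. The following is a minimal sketch, not the project's implementation: `toUnixSeconds` is a hypothetical name, and interpreting date strings as midnight UTC is an assumption, since the requirement does not name a time zone.

```typescript
// Hypothetical helper: normalize a raw `time` cell to Unix seconds.
// Assumption: YYYY-MM-DD strings are interpreted as midnight UTC.
function toUnixSeconds(raw: string | number): number {
  const text = String(raw).trim();

  // Case 1: integer Unix timestamp in seconds, e.g. "1700000000".
  if (/^\d+$/.test(text)) {
    return Number(text);
  }

  // Case 2: calendar date in YYYY-MM-DD form, e.g. "2024-01-15".
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (match) {
    const year = Number(match[1]);
    const month = Number(match[2]);
    const day = Number(match[3]);
    const ms = Date.UTC(year, month - 1, day);

    // Reject impossible dates such as 2024-02-31, which Date.UTC
    // would otherwise silently roll over into March.
    const check = new Date(ms);
    if (
      check.getUTCFullYear() === year &&
      check.getUTCMonth() === month - 1 &&
      check.getUTCDate() === day
    ) {
      return ms / 1000;
    }
  }

  throw new Error(`Unrecognized time value: ${text}`);
}
```

Under these assumptions, `toUnixSeconds("2024-01-15")` returns `1705276800`, and `toUnixSeconds("1700000000")` returns the same integer unchanged.
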
#### Scenario: Invalid CSV format
- **WHEN** user uploads a CSV missing required headers or containing malformed data
- **THEN** system displays an error message describing the issue and does not store any partial data
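
One way to satisfy the "no partial data" rule is to validate the whole file before anything is written. The sketch below assumes the `time` cell already holds Unix seconds. The names `validateCsv`, `ValidationResult`, and `toNumber` are illustrative, and the only check on "malformed data" is that each cell is numeric, because the requirement does not define it further.

```typescript
interface Candle {
  time: number;
  open: number;
  high: number;
  low: number;
  close: number;
}

// Either every row is valid, or the caller gets only error messages.
type ValidationResult =
  | { ok: true; candles: Candle[] }
  | { ok: false; errors: string[] };

const REQUIRED_HEADERS = ["time", "open", "high", "low", "close"];

// Treat missing or blank cells as invalid; Number("") would otherwise be 0.
function toNumber(cell: string | undefined): number {
  return cell === undefined || cell.trim() === "" ? NaN : Number(cell);
}

function validateCsv(
  headers: string[],
  rows: Record<string, string>[]
): ValidationResult {
  const missing = REQUIRED_HEADERS.filter((h) => headers.indexOf(h) === -1);
  if (missing.length > 0) {
    return { ok: false, errors: ["Missing required headers: " + missing.join(", ")] };
  }

  const errors: string[] = [];
  const candles: Candle[] = [];
  rows.forEach((row, index) => {
    const bad = REQUIRED_HEADERS.filter((h) => !Number.isFinite(toNumber(row[h])));
    if (bad.length > 0) {
      errors.push(`Row ${index + 1}: invalid value in ${bad.join(", ")}`);
      return;
    }
    candles.push({
      time: toNumber(row["time"]),
      open: toNumber(row["open"]),
      high: toNumber(row["high"]),
      low: toNumber(row["low"]),
      close: toNumber(row["close"]),
    });
  });

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, candles };
}
```

Because the failure branch carries no `candles` field, a caller cannot store a partially valid file by accident.
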
#### Scenario: Duplicate upload
- **WHEN** user uploads a CSV containing candle times that already exist in the database
- **THEN** system replaces existing candle records with the new data (upsert behavior)
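
The upsert behavior can be expressed with SQLite's `ON CONFLICT` clause (available since SQLite 3.24), which relies on `time` being unique. The statement below is one plausible form, not the project's actual query. The `Map`-based function is only an in-memory model of the same semantics, and `UPSERT_CANDLE_SQL` and `upsertCandles` are hypothetical names.

```typescript
interface Candle {
  time: number;
  open: number;
  high: number;
  low: number;
  close: number;
}

// Parameterized SQLite upsert; `excluded` refers to the row that
// could not be inserted because its `time` already exists.
const UPSERT_CANDLE_SQL = `
  INSERT INTO candles (time, open, high, low, close)
  VALUES (?, ?, ?, ?, ?)
  ON CONFLICT(time) DO UPDATE SET
    open  = excluded.open,
    high  = excluded.high,
    low   = excluded.low,
    close = excluded.close
`;

// In-memory model of the same rule: one record per `time`,
// and the most recent upload wins.
function upsertCandles(
  table: Map<number, Candle>,
  incoming: Candle[]
): Map<number, Candle> {
  for (const candle of incoming) {
    table.set(candle.time, candle);
  }
  return table;
}
```

Uploading a candle whose `time` is already present leaves the row count unchanged and overwrites the stored prices.
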
### Requirement: CSV parsing with papaparse
The system SHALL use the `papaparse` library for CSV parsing. For files exceeding 10,000 rows, parsing SHALL use streaming mode. Parsed records SHALL be inserted into SQLite within a single database transaction so that each upload is atomic.
#### Scenario: Large file parsing
- **WHEN** user uploads a CSV with more than 10,000 rows
- **THEN** system uses streaming parse and batch inserts within a single transaction, completing without loading the entire file into memory
#### Scenario: Transaction atomicity
- **WHEN** a parse error occurs midway through a CSV file
- **THEN** system rolls back the entire transaction and no partial data is stored
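
The interaction between batched inserts and rollback can be sketched without a real database. In the example below, `InMemoryCandleStore` is a hypothetical stand-in for a SQLite connection, and `parseLine` stands in for the rows that papaparse would deliver through its `step` callback in streaming mode. The batch size of 1000 is an arbitrary illustrative default.

```typescript
interface Candle {
  time: number;
  open: number;
  high: number;
  low: number;
  close: number;
}

// Hypothetical stand-in for a SQLite connection with transactions.
class InMemoryCandleStore {
  private committed = new Map<number, Candle>();
  private pending: Map<number, Candle> | null = null;

  begin(): void {
    this.pending = new Map(this.committed); // work on a copy
  }
  upsert(candle: Candle): void {
    if (!this.pending) throw new Error("No open transaction");
    this.pending.set(candle.time, candle);
  }
  commit(): void {
    if (this.pending) this.committed = this.pending;
    this.pending = null;
  }
  rollback(): void {
    this.pending = null; // discard the copy; committed data is untouched
  }
  count(): number {
    return this.committed.size;
  }
}

// Stand-in for one parsed row; throws on malformed input.
function parseLine(line: string): Candle {
  const parts = line.split(",").map((p) => (p.trim() === "" ? NaN : Number(p)));
  const [time = NaN, open = NaN, high = NaN, low = NaN, close = NaN] = parts;
  if (parts.length !== 5 || parts.some((n) => !Number.isFinite(n))) {
    throw new Error(`Malformed row: ${line}`);
  }
  return { time, open, high, low, close };
}

// Parse rows one at a time, flush them in batches, and commit only if
// every row parsed. Any error rolls back the whole upload.
function ingestLines(store: InMemoryCandleStore, lines: string[], batchSize = 1000): number {
  store.begin();
  try {
    let batch: Candle[] = [];
    let total = 0;
    for (const line of lines) {
      batch.push(parseLine(line));
      if (batch.length >= batchSize) {
        batch.forEach((c) => store.upsert(c));
        total += batch.length;
        batch = [];
      }
    }
    batch.forEach((c) => store.upsert(c)); // flush the final partial batch
    total += batch.length;
    store.commit();
    return total;
  } catch (err) {
    store.rollback();
    throw err;
  }
}
```

Batching bounds the number of rows held in memory at once, while the single surrounding transaction keeps the upload all-or-nothing.
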
### Requirement: Candles database table
The system SHALL store candle data in a `candles` table with columns: `id` (integer primary key, auto-increment), `time` (integer, Unix timestamp, unique), `open` (real), `high` (real), `low` (real), `close` (real). The `time` column MUST have a unique constraint.
#### Scenario: Schema structure
- **WHEN** the database is initialized
- **THEN** the `candles` table exists with all required columns and constraints
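
For reference, the schema described above corresponds to a SQLite `CREATE TABLE` statement along the following lines. This is a sketch derived from the column list, not the project's migration: the constant name `CREATE_CANDLES_TABLE_SQL` and the `IF NOT EXISTS` guard are assumptions.

```typescript
// DDL matching the column list in the requirement.
// Assumption: IF NOT EXISTS is added so initialization can run repeatedly.
const CREATE_CANDLES_TABLE_SQL = `
  CREATE TABLE IF NOT EXISTS candles (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    time  INTEGER UNIQUE, -- Unix timestamp in seconds
    open  REAL,
    high  REAL,
    low   REAL,
    close REAL
  )
`;
```

The `UNIQUE` constraint on `time` is what allows duplicate uploads to be resolved as upserts rather than creating duplicate rows.
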