feat: add database schema, migrations, and API endpoints for span annotations

- Add span_label_types and span_annotations tables to schema
- Seed default span label types (bull_flag, bear_flag, etc.)
- Implement CRUD API endpoints for span label types
- Implement CRUD API endpoints for span annotations
- Add time swap validation in POST endpoint (start_time <= end_time)

parent 8a7eb1fb08
commit dadf515406
11 changed files with 1131 additions and 0 deletions

span-annotation-prompt.md (new file, 163 lines)
@ -0,0 +1,163 @

# Span Annotation Feature for Candlestick Pattern Labeling Tool

## What is Span Annotation?

Span annotation means selecting a **range of consecutive candles** on a candlestick chart that together form a recognizable pattern (e.g., bull flag, head and shoulders, double bottom). The user clicks a start candle and an end candle, assigns a pattern label, and optionally adds metadata. This is the standard approach for labeling multi-candle patterns in time series data.

## User Interaction Flow

1. User enters **annotation mode** (toggle or hotkey)
2. User **clicks a candle** → that candle is highlighted as the **span start**
3. User **clicks a second candle** → that becomes the **span end**
4. A **label selector** appears (dropdown or palette) with the user's predefined pattern categories
5. Optionally, user can add:
   - **Sub-spans** (e.g., mark the "pole" and "flag" portions within a bull flag)
   - **Outcome** (win/loss/breakeven, or the price move after the pattern)
   - **Confidence** (how clear the pattern is, 1-5 scale)
   - **Free-text notes**
6. The annotation is saved and **visually rendered** on the chart as a highlighted region with a label tag
7. User can **click an existing annotation** to edit or delete it
8. Annotations persist and are exportable
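
The two-click selection in steps 2 and 3 can be sketched as a tiny state machine. This is only an illustration: the `SpanSelector` class and its `click` method are hypothetical names, the logic is shown in Python for brevity rather than in the chart's own front-end code, and reordering the ends when the second click lands before the first is an assumption, made to match the `start_time <= end_time` rule mentioned in the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpanSelector:
    """Tracks the two-click selection (hypothetical helper, not from the spec)."""

    pending_start: Optional[int] = None  # candle index of the first click, if any

    def click(self, candle_index: int) -> Optional[Tuple[int, int]]:
        """Register a click; return (start, end) once both ends are known."""
        if self.pending_start is None:
            self.pending_start = candle_index  # first click: remember the span start
            return None
        # Second click: order the two indices so start <= end, then reset.
        start, end = sorted((self.pending_start, candle_index))
        self.pending_start = None
        return start, end


selector = SpanSelector()
first = selector.click(149)  # first click only arms the selector
span = selector.click(142)   # second click is earlier, so the ends are reordered
print(first, span)
```

Once `click` returns a pair, the label selector from step 4 would be shown for that span.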

## Visual Rendering of Annotations

- Draw a **semi-transparent colored rectangle** behind the candles in the span range (color per label category)
- Show the **label name** as a small tag above or below the highlighted region
- Sub-spans get a **slightly different shade** or a thin divider line within the main span
- Overlapping annotations should be visually distinguishable (offset vertically or use border styles)
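
One possible way to offset overlapping annotations vertically is to assign each span a "lane" so that no two spans in the same lane share a candle. The sketch below uses a simple greedy pass over spans sorted by start index; `assign_lanes` is a hypothetical helper, and treating spans as inclusive index ranges is an assumption based on the `start_index`/`end_index` fields in the data model.

```python
from typing import Dict, List, Tuple


def assign_lanes(spans: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Give each span the lowest lane not occupied by a span it overlaps.

    `spans` maps an annotation id to an inclusive (start_index, end_index) pair.
    """
    lane_ends: List[int] = []  # lane_ends[i] = end index of the last span in lane i
    lanes: Dict[str, int] = {}
    # Process spans left to right so earlier spans claim lower lanes first.
    for ann_id, (start, end) in sorted(spans.items(), key=lambda kv: kv[1]):
        for lane, last_end in enumerate(lane_ends):
            if start > last_end:  # inclusive ranges: no shared candle
                lane_ends[lane] = end
                lanes[ann_id] = lane
                break
        else:
            lane_ends.append(end)  # every existing lane overlaps: open a new one
            lanes[ann_id] = len(lane_ends) - 1
    return lanes


lanes = assign_lanes({"a": (142, 149), "b": (145, 160), "c": (150, 155)})
print(lanes)
```

The renderer could then shift each rectangle or label tag by `lane * some_pixel_offset`, so spans `a` and `b` above end up on different rows while `c` reuses the first row.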

## Annotation Data Model

Each annotation is a JSON object:

```json
{
  "id": "uuid-v4",
  "pair": "EURUSD",
  "timeframe": "1H",
  "start_time": "2024-03-15T09:00:00Z",
  "end_time": "2024-03-15T16:00:00Z",
  "start_index": 142,
  "end_index": 149,
  "label": "bull_flag",
  "sub_spans": [
    {
      "label": "pole",
      "start_time": "2024-03-15T09:00:00Z",
      "end_time": "2024-03-15T12:00:00Z"
    },
    {
      "label": "consolidation",
      "start_time": "2024-03-15T12:00:00Z",
      "end_time": "2024-03-15T16:00:00Z"
    }
  ],
  "outcome": "win",
  "confidence": 4,
  "notes": "clean breakout on volume",
  "created_at": "2024-03-16T10:30:00Z"
}
```
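
A consistency check over this object might look like the sketch below. It is an illustration under stated assumptions, not the commit's actual endpoint code: `validate_annotation` and `parse_utc` are hypothetical helpers, and the rules that sub-spans must lie inside the parent span and that confidence must fall in 1-5 are inferred from the descriptions above.

```python
from datetime import datetime
from typing import List


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z', as used in the example."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def validate_annotation(ann: dict) -> List[str]:
    """Return a list of problems; an empty list means the annotation looks consistent."""
    problems: List[str] = []
    start, end = parse_utc(ann["start_time"]), parse_utc(ann["end_time"])
    if start > end:
        problems.append("start_time is after end_time")
    if ann["start_index"] > ann["end_index"]:
        problems.append("start_index is after end_index")
    confidence = ann.get("confidence")
    if confidence is not None and not 1 <= confidence <= 5:
        problems.append("confidence must be between 1 and 5")
    for sub in ann.get("sub_spans", []):
        sub_start, sub_end = parse_utc(sub["start_time"]), parse_utc(sub["end_time"])
        # Assumed rule: each sub-span is well ordered and inside the parent span.
        if sub_start > sub_end or sub_start < start or sub_end > end:
            problems.append(f"sub-span '{sub['label']}' is not inside the parent span")
    return problems


example = {
    "start_time": "2024-03-15T09:00:00Z",
    "end_time": "2024-03-15T16:00:00Z",
    "start_index": 142,
    "end_index": 149,
    "confidence": 4,
    "sub_spans": [
        {"label": "pole", "start_time": "2024-03-15T09:00:00Z",
         "end_time": "2024-03-15T12:00:00Z"},
    ],
}
print(validate_annotation(example))
```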

## Export Formats for ML Training

The tool must export annotations in multiple formats to support different model types. All exports should be triggered from a single "Export" button with format selection.

---

### Format 1: Windowed Classification (CSV)

One row per annotation. Used for training classifiers (XGBoost, CNN, LSTM) where each row is a labeled window of OHLCV data.

```csv
pair,timeframe,start_time,end_time,label,outcome,confidence,window_length,open_0,high_0,low_0,close_0,volume_0,open_1,high_1,low_1,close_1,volume_1,...
EURUSD,1H,2024-03-15T09:00:00Z,2024-03-15T16:00:00Z,bull_flag,win,4,8,1.0921,1.0935,1.0918,1.0933,1200,1.0933,1.0948,1.0930,1.0945,1500,...
```

The OHLCV columns are **flattened**: `open_0` through `volume_{N-1}`, where N is the number of candles in the window. To get a fixed window size (user-configurable, e.g., 20 candles), pad shorter spans with NaN and truncate or resample longer ones.
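
The flattening and padding described above can be sketched as follows. This is a minimal example, assuming each candle is a dict with `open`, `high`, `low`, `close`, and `volume` keys; `flatten_window` is a hypothetical helper, and it handles long spans by truncation only (resampling is not shown).

```python
import math
from typing import Dict, Sequence

FIELDS = ("open", "high", "low", "close", "volume")


def flatten_window(candles: Sequence[Dict[str, float]], window_size: int) -> Dict[str, float]:
    """Flatten candles into open_0 .. volume_{window_size-1} columns."""
    row: Dict[str, float] = {}
    for i in range(window_size):
        for field in FIELDS:
            # Positions past the end of the span are padded with NaN;
            # candles beyond window_size are dropped (truncation).
            row[f"{field}_{i}"] = candles[i][field] if i < len(candles) else math.nan
    return row


candles = [
    {"open": 1.0921, "high": 1.0935, "low": 1.0918, "close": 1.0933, "volume": 1200},
    {"open": 1.0933, "high": 1.0948, "low": 1.0930, "close": 1.0945, "volume": 1500},
]
row = flatten_window(candles, window_size=3)
print(list(row)[:5], row["volume_1"], row["open_2"])
```

The metadata columns (`pair`, `label`, `outcome`, and so on) would be prepended to this row before it is written out as CSV.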

---

### Format 2: Sequence Labels / BIO Tags (CSV)

One row per candle across the entire dataset. Used for sequence labeling models (BiLSTM-CRF, Transformer encoder). Uses BIO tagging scheme:

- **B-{label}** = first candle of a pattern
- **I-{label}** = inside a pattern (continuation)
- **O** = outside any pattern (no pattern)

```csv
time,open,high,low,close,volume,bio_tag
2024-03-15T08:00:00Z,1.0915,1.0922,1.0910,1.0918,980,O
2024-03-15T09:00:00Z,1.0921,1.0935,1.0918,1.0933,1200,B-bull_flag
2024-03-15T10:00:00Z,1.0933,1.0948,1.0930,1.0945,1500,I-bull_flag
2024-03-15T11:00:00Z,1.0944,1.0950,1.0938,1.0941,1100,I-bull_flag
...
2024-03-15T16:00:00Z,1.0939,1.0960,1.0937,1.0958,1800,I-bull_flag
2024-03-15T17:00:00Z,1.0958,1.0965,1.0950,1.0962,900,O
```

For overlapping annotations, use multi-label columns: `bio_tag_1`, `bio_tag_2`, etc.
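
Generating the `bio_tag` column from a list of spans could look like the sketch below. The `bio_tags` function is a hypothetical helper that covers only the single-column, non-overlapping case; spans are assumed to be inclusive `(start_index, end_index, label)` triples, as in the data model.

```python
from typing import List, Tuple


def bio_tags(num_candles: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each candle as O, B-{label}, or I-{label}.

    `spans` holds inclusive (start_index, end_index, label) triples and is
    assumed to be non-overlapping.
    """
    tags = ["O"] * num_candles
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first candle of the pattern
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"  # remaining candles inside the pattern
    return tags


tags = bio_tags(6, [(1, 3, "bull_flag")])
print(tags)
```

For overlapping annotations, the same function could be run once per lane of non-overlapping spans to fill `bio_tag_1`, `bio_tag_2`, and so on.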

---

### Format 3: Raw Annotations JSON

The complete annotation list as-is, for custom pipelines or re-import.

```json
{
  "metadata": {
    "pair": "EURUSD",
    "timeframe": "1H",
    "export_date": "2024-03-20T12:00:00Z",
    "total_annotations": 47,
    "label_counts": {
      "bull_flag": 12,
      "head_and_shoulders": 8,
      "double_bottom": 15,
      "wedge": 12
    }
  },
  "annotations": [
    { ... annotation objects as defined above ... }
  ]
}
```

---

## Notes

Format 2 (BIO tags) is probably the most versatile training format: it works directly with sequence models, and Format 1 (windowed) can be derived from it by slicing out each tagged span. Format 1 (windowed CSV) is what you'd feed directly into XGBoost or a CNN. If you implement only one export format at first, though, go with the raw JSON (Format 3), since it preserves every annotation field and a script can always transform it into the other two.

Make sure the export includes context candles, e.g., 10-20 candles before and after each pattern span. Models need to see the trend leading into the pattern, not just the pattern itself. You might want a configurable `context_padding` parameter on export.
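
A `context_padding` parameter could be applied as in the sketch below, which widens a span on both sides and clips it to the available data. The `padded_range` function and its `num_candles` argument are hypothetical; clipping at the dataset edges (rather than padding with NaN) is an assumption.

```python
from typing import Tuple


def padded_range(start_index: int, end_index: int, context_padding: int,
                 num_candles: int) -> Tuple[int, int]:
    """Widen an inclusive span by context_padding candles on each side,
    clipped to the valid index range [0, num_candles - 1]."""
    padded_start = max(0, start_index - context_padding)
    padded_end = min(num_candles - 1, end_index + context_padding)
    return padded_start, padded_end


middle = padded_range(142, 149, context_padding=10, num_candles=500)
edge = padded_range(3, 8, context_padding=10, num_candles=15)
print(middle, edge)
```

Near the start or end of the dataset the exported window is shorter than requested, so a fixed-size export would still need the NaN padding described for Format 1.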

## Label Configuration

The user should be able to define their own pattern categories in a config, e.g.:

```json
{
  "labels": [
    { "name": "bull_flag", "color": "#4CAF50", "hotkey": "1" },
    { "name": "bear_flag", "color": "#F44336", "hotkey": "2" },
    { "name": "head_and_shoulders", "color": "#FF9800", "hotkey": "3" },
    { "name": "double_bottom", "color": "#2196F3", "hotkey": "4" },
    { "name": "wedge_up", "color": "#9C27B0", "hotkey": "5" },
    { "name": "wedge_down", "color": "#795548", "hotkey": "6" },
    { "name": "custom", "color": "#607D8B", "hotkey": "0" }
  ]
}
```
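
Since hotkeys and colors come from user-edited config, a sanity check on load may be worthwhile. The sketch below is one possible check; `check_label_config` is a hypothetical helper, and requiring unique names, unique hotkeys, and `#RRGGBB` colors is an assumption rather than a stated requirement.

```python
import re
from typing import List

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def check_label_config(config: dict) -> List[str]:
    """Return a list of problems found in a label config (empty if none)."""
    problems: List[str] = []
    seen_names, seen_hotkeys = set(), set()
    for entry in config["labels"]:
        name, color, hotkey = entry["name"], entry["color"], entry["hotkey"]
        if name in seen_names:
            problems.append(f"duplicate label name: {name}")
        if hotkey in seen_hotkeys:
            problems.append(f"duplicate hotkey: {hotkey}")
        if not HEX_COLOR.match(color):
            problems.append(f"invalid color for {name}: {color}")
        seen_names.add(name)
        seen_hotkeys.add(hotkey)
    return problems


config = {
    "labels": [
        {"name": "bull_flag", "color": "#4CAF50", "hotkey": "1"},
        {"name": "bear_flag", "color": "#F44336", "hotkey": "1"},  # hotkey clash
    ]
}
print(check_label_config(config))
```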

## Summary of Requirements

- Click-to-select span annotation on a TradingView Lightweight Charts candlestick chart
- Label assignment via dropdown or hotkey
- Optional sub-spans, outcome, confidence, notes
- Visual overlay of annotations on the chart
- Edit/delete existing annotations
- Export to: Windowed CSV, BIO-tagged CSV, Raw JSON, and optionally image crops
- User-configurable label categories with colors and hotkeys