bind: MLflow port to 127.0.0.1:5000:5000 in docker-compose.yml

Changes:
- Updated docker-compose.yml MLflow service port binding from 5000:5000 to 127.0.0.1:5000:5000
  to restrict access to localhost only for security
- Marked task 1.7 as complete in tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Marko Djordjevic 2026-02-18 10:58:11 +01:00
parent 9efa1dbbcc
commit c327ba3370
19 changed files with 1002 additions and 2 deletions

View file

@ -0,0 +1,133 @@
## Context
The Candle Annotator is a full-stack application (Next.js frontend + FastAPI ML service + PostgreSQL) for annotating candlestick chart data and training ML models. A code review found 93 issues (13 Critical, 35 Major, 45 Minor). The app currently runs in Docker Compose with no authentication, hardcoded credentials, exposed ports, and multiple frontend stability problems. Zero tests exist.
The application is used internally for ML development — it is not yet public-facing, but the security posture must be production-ready before any deployment beyond localhost.
## Goals / Non-Goals
**Goals:**
- Eliminate all 13 critical issues (security + frontend stability)
- Fix all 35 major issues (API hardening, code quality, infrastructure)
- Fix all 45 minor issues (polish, accessibility, deprecated APIs)
- Establish a type-safe foundation with centralized shared types
- Make the Docker deployment secure by default (localhost bindings, non-root, no leaked creds)
- Add basic input validation on all API boundaries
**Non-Goals:**
- Full test suite (tracked separately — this change fixes code, not adds tests)
- Migrating away from joblib/pickle to ONNX/safetensors (integrity checks are sufficient for now)
- Replacing the ML service framework (FastAPI stays)
- Adding a full RBAC/user management system (API key auth is sufficient)
- Rewriting CandleChart.tsx from scratch (targeted fixes only, no full decomposition into hooks)
- Adding Celery/RQ task queue for training (out of scope, timeout + size limits are sufficient)
## Decisions
### D-1. Authentication: API Key via Environment Variable
**Choice**: Simple `X-API-Key` header checked in Next.js middleware and FastAPI `Depends()`.
**Alternatives considered**:
- NextAuth.js session auth — overkill for a single-user internal tool
- OAuth2 — unnecessary complexity for the use case
- mTLS — too heavy for dev/staging environments
**Rationale**: API key is the simplest approach that closes the "zero auth" gap. The key lives in `.env` (which will be gitignored). Next.js middleware intercepts all `/api/*` routes. FastAPI uses a shared dependency. Both read from `API_KEY` env var.
### D-2. Input Validation: Zod on Next.js, Pydantic Already on FastAPI
**Choice**: Add Zod schemas in each proxy route file. FastAPI already uses Pydantic models.
**Alternatives considered**:
- Shared validation library — no shared runtime between TS and Python
- tRPC — would require significant refactoring of the API layer
**Rationale**: Zod is the standard for Next.js API route validation. Each proxy route gets a schema that validates before forwarding to the ML service. The `run_id` validation (alphanumeric + hyphens + underscores) is applied at both layers.
### D-3. Shared Types: `src/types/index.ts` Barrel File
**Choice**: Create `src/types/` with domain type files (`annotations.ts`, `candles.ts`, `predictions.ts`, `charts.ts`) and a barrel `index.ts`. Replace all 6+ duplicate interface definitions.
**Alternatives considered**:
- Zod schemas as single source of truth (infer types) — good but adds Zod dependency to components that don't validate
- Keep types co-located — current approach, causes drift
**Rationale**: Separate type files keep imports clean. Components import types, API routes import both types and Zod schemas. No circular dependencies.
### D-4. Error Handling: Generic Messages to Client, Structured Server Logs
**Choice**: All API routes return `{ error: "Internal server error" }` for 500s. Full error logged via `console.error` (Next.js) and `logging.error` (FastAPI) with request context.
**Alternatives considered**:
- Error tracking service (Sentry) — good addition later, not needed to fix the leak now
- Custom error codes — over-engineering for this stage
**Rationale**: Minimal change. Replace `error.message` with static string in catch blocks. The actual error is already logged in most routes.
### D-5. Frontend Stale Closures: Refs for Mutable State
**Choice**: Use `useRef` for values that change frequently and are read in event handlers (`modelInfo`, `selectedSpanId`, `drawingState`, `dragState`, `annotations`). Keep `useState` for values that trigger re-renders.
**Alternatives considered**:
- Zustand store — solves the closure problem but is a large refactor
- `useReducer` — consolidates state but doesn't fix closure staleness
**Rationale**: Refs are the targeted fix. Each stale closure issue (C-9, C-11, C-12) is fixed by reading from a ref instead of the closure variable. Minimal blast radius.
### D-6. Docker Security: Localhost Bindings + Env Var Interpolation
**Choice**: Bind all ports to `127.0.0.1`, use `${POSTGRES_PASSWORD}` interpolation in `docker-compose.yml`, add `.env` to `.gitignore`.
**Alternatives considered**:
- Docker secrets — better for production swarm, overkill for compose
- Vault — enterprise-grade, way too heavy
**Rationale**: Env var interpolation is native to docker-compose and is the standard pattern. `.env` file is read automatically by compose. Binding to localhost prevents external access without requiring firewall rules.
### D-7. AbortController for Prediction Requests
**Choice**: Each prediction fetch creates an `AbortController`. A ref holds the current controller. New requests abort the previous one. Stale responses are discarded by checking if the signal was aborted.
**Alternatives considered**:
- Request ID tracking (increment counter, discard if stale) — works but doesn't free network resources
- Debouncing — reduces frequency but doesn't prevent race conditions
**Rationale**: AbortController is the browser-native solution. It both prevents stale responses AND frees up network resources. Combined with the ref pattern from D-5, this cleanly solves C-10.
### D-8. Security Headers via next.config.js
**Choice**: Add `headers()` function to `next.config.js` with CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy.
**Alternatives considered**:
- Middleware-based headers — works but `next.config.js` is the canonical Next.js approach
- Reverse proxy (nginx) headers — adds infrastructure complexity
**Rationale**: Next.js config `headers()` is zero-dependency and applies to all routes automatically.
## Risks / Trade-offs
**[API key auth is minimal]** → Sufficient for internal tool. Can upgrade to session auth later without breaking changes. The middleware pattern makes swapping auth strategies straightforward.
**[Refs pattern increases cognitive complexity]** → Developers must remember to update both state and ref. Mitigated by co-locating ref updates next to setState calls and adding brief comments.
**[93 issues is a large changeset]** → Risk of introducing regressions. Mitigated by phased approach (security first, then stability, then polish) and the model-tier task dispatch (complex changes get opus-level attention).
**[No tests to catch regressions]** → This is explicitly a non-goal for this change, but it's the biggest risk. Mitigated by making each fix small and targeted, avoiding broad refactors.
**[Breaking docker-compose.yml env var pattern]** → Existing users with hardcoded `.env` files will need to update. Mitigated by updating `.env.example` with clear instructions.
## Migration Plan
1. **Phase 1 (Security Critical)**: `.gitignore`, credentials, port bindings, run_id validation, upload limits, CORS — all low-effort, high-impact
2. **Phase 2 (Before Production)**: Auth middleware, Zod validation, transaction fixes, error message cleanup, security headers, Dockerfile hardening
3. **Phase 3 (Frontend Stability)**: AbortController, stale closure fixes, memory leaks, response.ok checks, shared types, Error Boundary
4. **Phase 4 (Polish)**: Rate limiting, accessibility, deprecated APIs, dark theme, debounce, constants extraction
Each phase can be deployed independently. Rollback is git revert on the phase branch.
## Open Questions
- **API key distribution**: How will the API key be shared with legitimate clients? Manual `.env` copy for now?
- **Rate limiting implementation**: Next.js middleware-based or external (e.g., nginx)? Deferred to Phase 4.
- **Model integrity**: SHA256 manifest file checked into git, or computed on first load? Leaning toward manifest file.