In the /predict and /predict/batch endpoints, take the model reference under _model_swap_lock before running inference. Inference itself runs outside the lock, on a local variable, so that potentially slow computation does not block model swaps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|---|---|---|
| specs | ||
| .openspec.yaml | ||
| design.md | ||
| proposal.md | ||
| tasks.md | ||