## Summary

The sandbox proxy's `inference.local` interception path fully buffers upstream streaming responses (SSE) before sending any bytes back to the client. This converts a streaming response (fast TTFB, incremental tokens) into a buffered response (slow TTFB, instant completion), causing downstream clients with TTFB timeouts to abort before the proxy finishes buffering.

## Actual Behavior

1. Client sends `POST /v1/chat/completions` with `"stream": true` to `inference.local`
2. Proxy forwards to upstream (e.g. NVIDIA NIM endpoint), which begins streaming SSE events (TTFB ~200ms)
3. `response.bytes().await` in `proxy_to_backend()` blocks for the **full generation time** (10-60s)
4. Proxy sends nothing back to the client during this entire period
5. Client's TTFB/first-event timeout fires and aborts the request
6. Proxy eventually finishes buffering and tries to write to a closed connection

## Expected Behavior

The proxy should forward response headers and SSE chunks to the client incrementally as they arrive from the upstream, preserving the streaming semantics and sub-second TTFB.

## Root Cause

Three layers conspire to prevent streaming:

1. **`navigator-router/src/backend.rs:112-115`**: `response.bytes().await` collects the entire upstream response body into memory before returning
2. **`ProxyResponse` struct (`backend.rs:9-13`)**: stores body as a single `bytes::Bytes` blob with no streaming abstraction
3. **`format_http_response()` (`l7/inference.rs:244-246`)**: replaces upstream `Transfer-Encoding: chunked` with `Content-Length` based on buffered body size, then writes everything in a single `write_all` call

## Affected Path

Only the HTTPS/CONNECT inference interception path (`inference.local`) is affected. The plain HTTP forward proxy path uses `copy_bidirectional` and streams correctly.

Call chain: `handle_inference_interception()` → `route_inference_request()` → `proxy_with_candidates()` → `proxy_to_backend()` → `response.bytes().await`

## Fix Direction

- Add a streaming variant of `proxy_to_backend()` that returns headers + `impl Stream<Item = Bytes>` instead of a fully-buffered `ProxyResponse`
- Have `route_inference_request()` write response headers immediately, then forward body chunks incrementally to the TLS client
- For streaming responses, emit `Transfer-Encoding: chunked` instead of `Content-Length`
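
To make the wire-level behavior concrete, here is a minimal sketch of the write side of that fix direction, using only the standard library. It is not the proxy's actual code: `write_streaming_head`, `write_chunk`, `finish_chunked`, and `forward_streaming` are hypothetical names, and a synchronous iterator over `std::io::Write` stands in for the async `Stream<Item = Bytes>` and TLS writer a real implementation would presumably use. The hard-coded `200 OK` and `text/event-stream` are also assumptions; a real version would forward the upstream status and headers.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes the status line and headers right away,
/// advertising chunked transfer encoding instead of a Content-Length.
fn write_streaming_head<W: Write>(
    out: &mut W,
    status: u16,
    reason: &str,
    content_type: &str,
) -> io::Result<()> {
    write!(out, "HTTP/1.1 {} {}\r\n", status, reason)?;
    write!(out, "Content-Type: {}\r\n", content_type)?;
    out.write_all(b"Transfer-Encoding: chunked\r\n\r\n")?;
    // Flush so the client sees the headers before any body data exists.
    out.flush()
}

/// Writes one HTTP/1.1 chunk: hex length, CRLF, payload, CRLF.
fn write_chunk<W: Write>(out: &mut W, data: &[u8]) -> io::Result<()> {
    if data.is_empty() {
        // A zero-length chunk would terminate the body, so skip empty reads.
        return Ok(());
    }
    write!(out, "{:x}\r\n", data.len())?;
    out.write_all(data)?;
    out.write_all(b"\r\n")?;
    // Flush per chunk so each SSE event reaches the client as it arrives.
    out.flush()
}

/// Writes the terminating zero-length chunk (no trailers).
fn finish_chunked<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"0\r\n\r\n")?;
    out.flush()
}

/// Forwards upstream body chunks to the client as they are produced.
/// `chunks` stands in for the upstream body stream (an assumption).
fn forward_streaming<W, I>(out: &mut W, chunks: I) -> io::Result<()>
where
    W: Write,
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    // Assumed status and content type; a real proxy would copy these
    // from the upstream response.
    write_streaming_head(out, 200, "OK", "text/event-stream")?;
    for chunk in chunks {
        // Propagate upstream read errors instead of buffering past them.
        write_chunk(out, &chunk?)?;
    }
    finish_chunked(out)
}
```

Because headers are flushed before the first body chunk is read, client TTFB in this sketch is bounded by the upstream's TTFB rather than by the full generation time. If the upstream fails mid-stream, the function returns the error without writing the terminating chunk, so the client can tell the response was truncated.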