Agent Diagnostic
While running multiple long-lived gator sandboxes against the local docker-dev gateway, the host started failing unrelated local builds with:
sccache: error: Can't assign requested address (os error 49)
Investigation found stale gator sandboxes whose sandbox JWTs had expired. Their supervisors continued retrying the log-push gRPC stream after the gateway returned Unauthenticated. This produced repeated reconnects and a large number of TIME_WAIT sockets to the local gateway (127.0.0.1:18080), which contributed to local ephemeral-port pressure. Stopping/deleting the stale gator sandboxes relieved the pressure.
Relevant code path inspected: crates/openshell-sandbox/src/log_push.rs. The current loop treats every ended log-push stream as reconnectable. A local patch confirmed the intended shape of the fix: classify tonic::Code::Unauthenticated as fatal for log push, stop the log-push task after that auth failure, and apply backoff before retrying non-auth stream failures.
Validation run against the local patch:
cargo fmt --check
cargo check -p openshell-sandbox --quiet
cargo test -p openshell-sandbox --quiet
Description
Actual behavior: When a sandbox log-push stream fails with Unauthenticated, the sandbox keeps reconnecting indefinitely. Expired sandbox credentials are not recoverable inside the existing sandbox process, so retries continue to fail and can create avoidable gateway/socket load.
Expected behavior: Log push should stop permanently after an authentication failure such as an expired sandbox JWT. Other transient stream failures should retry with backoff.
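The expected behavior can be sketched as a small decision function. This is a minimal illustration, not the code in crates/openshell-sandbox/src/log_push.rs: the `Code` enum is a stand-in for the relevant `tonic::Code` variants so the snippet compiles without dependencies, and `NextStep`, `next_step`, `backoff_delay`, `BASE_DELAY`, and `MAX_DELAY` are hypothetical names with assumed values.

```rust
use std::time::Duration;

/// Stand-in for the subset of `tonic::Code` relevant here (assumption:
/// the real loop would match on `tonic::Code` from the returned status).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code {
    Unauthenticated,
    Unavailable,
}

/// What the log-push loop should do after a stream ends with an error.
#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    /// Auth failure: credentials cannot be refreshed in-process, so stop.
    Stop,
    /// Transient failure: wait, then reconnect.
    RetryAfter(Duration),
}

// Hypothetical backoff bounds; the real values are a design choice.
const BASE_DELAY: Duration = Duration::from_millis(500);
const MAX_DELAY: Duration = Duration::from_secs(30);

/// Exponential backoff: BASE_DELAY * 2^attempt, capped at MAX_DELAY.
fn backoff_delay(attempt: u32) -> Duration {
    // Clamp the exponent so the multiplier cannot overflow u32.
    let factor = 2u32.saturating_pow(attempt.min(16));
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}

/// Classify a stream failure: Unauthenticated is fatal, everything else retries.
fn next_step(code: Code, attempt: u32) -> NextStep {
    match code {
        Code::Unauthenticated => NextStep::Stop,
        _ => NextStep::RetryAfter(backoff_delay(attempt)),
    }
}
```

In a loop shaped like this, the caller would reset `attempt` to zero after a stream connects successfully and exit the log-push task as soon as `next_step` returns `NextStep::Stop`. Adding jitter to the delay would further spread reconnects when many sandboxes fail at the same moment.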
Reproduction Steps
- Run a sandbox long enough for its sandbox JWT/log-push credential to expire, or keep stale sandboxes alive after their gateway-issued auth is no longer valid.
- Observe sandbox stderr or gateway behavior when PushSandboxLogs returns Unauthenticated.
- Observe repeated reconnect attempts instead of a fatal stop for log push.
- With enough stale sandboxes, observe local gateway connection churn and many TIME_WAIT sockets.
Environment
- OS: macOS local development host
- Gateway: local docker-dev gateway at http://127.0.0.1:18080
- OpenShell: current feat/gator-gate-skill branch during gator sandbox testing
Logs
openshell: log push RPC failed: status: Unauthenticated, message: ...
openshell: log push stream lost, reconnecting...
Observed host-side symptom during the reconnect storm:
sccache: error: Can't assign requested address (os error 49)