
improvement(tables): versioned CSV snapshot cache for table mounts + parallel multipart uploader#5108

Merged
TheodoreSpeaks merged 8 commits into staging from improvement/table-snapshot-cache on Jun 17, 2026

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • Mount Sim tables into code sandboxes by reference via a version-keyed CSV snapshot in object storage, instead of draining the whole table into web-process heap on every run (fixes a real prod OOM on large mounts).
  • Add a monotonic rows_version on user_table_definitions, bumped by a statement-level Postgres trigger on user_table_rows (INSERT/UPDATE/DELETE) — bypass-proof; reorders/edits invalidate the cache. Mirrors the 0224 statement-level trigger pattern (one bump per statement, no per-row contention).
  • Snapshot cache keyed table-snapshots/{workspaceId}/{tableId}/v{rows_version}.csv: headObject hit = no row read; miss = page rows → canonical export-format CSV → upload. Best-effort version recheck + previous-version cleanup. Capped at the 10MB mount ceiling (the table branch had no size guard before).
  • Add a parallel server-side multipart uploader (createMultipartUpload) to the storage layer — bounded-concurrency parts, single-PutObject fast path for small files — and refactor the table-export worker onto it so it no longer buffers the whole file in heap.
  • Gated behind a new table-snapshot-cache feature flag, default OFF. Flag-off keeps the existing inline-CSV path byte-for-byte.
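The hit/miss behaviour of the version-keyed cache can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MemoryStore`, `TableRef`, `csvCell`, and `getOrCreateSnapshot` are hypothetical names, the store is an in-memory stand-in for object storage, and the CSV quoting is a simplified assumption rather than the canonical export format.

```typescript
// Hypothetical in-memory object store standing in for S3/Blob storage.
class MemoryStore {
  private objects = new Map<string, string>()

  head(key: string): { size: number } | undefined {
    const body = this.objects.get(key)
    return body === undefined ? undefined : { size: new TextEncoder().encode(body).length }
  }

  put(key: string, body: string): void {
    this.objects.set(key, body)
  }
}

// Assumed shape; the real table definition carries more fields.
interface TableRef {
  workspaceId: string
  tableId: string
  rowsVersion: number
}

// Mirrors the 10MB mount ceiling mentioned in the summary.
const MAX_SNAPSHOT_BYTES = 10 * 1024 * 1024

function snapshotKey(t: TableRef): string {
  return `table-snapshots/${t.workspaceId}/${t.tableId}/v${t.rowsVersion}.csv`
}

// Simplified CSV quoting (assumption): quote cells containing commas, quotes, or newlines.
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value
}

function getOrCreateSnapshot(store: MemoryStore, table: TableRef, readRows: () => string[][]) {
  const key = snapshotKey(table)
  const head = store.head(key)
  // Hit: the object for this version already exists, so no rows are read.
  if (head) return { key, size: head.size, version: table.rowsVersion, hit: true }

  // Miss: read rows, serialize, enforce the cap, then upload.
  const csv = readRows().map((row) => row.map(csvCell).join(',')).join('\n')
  const size = new TextEncoder().encode(csv).length
  if (size > MAX_SNAPSHOT_BYTES) throw new Error(`snapshot exceeds ${MAX_SNAPSHOT_BYTES} bytes`)
  store.put(key, csv)
  return { key, size, version: table.rowsVersion, hit: false }
}
```

Because the version is part of the key, a bumped rows_version simply misses and writes a new object; nothing has to be invalidated in place.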

Type of Change

  • Improvement / enhancement

Testing

  • Tested manually: multipart upload round-trips byte-for-byte against real S3; rows_version trigger verified bumping on update (applied locally, since db push doesn't emit trigger DDL).
  • Added focused Vitest: trigger-free unit coverage for the uploader (parts byte-identity, small-file path, abort), snapshot cache (hit/miss/cap/re-key/tenant), export-runner streaming, and the function-execute flag on/off + size-guard + tenant-isolation branches.
  • bun run check:migrations origin/staging ✓ (expand-safe: additive defaulted column + triggers), check:api-validation:strict ✓, lint:check ✓, tsc --noEmit ✓ (0 errors).

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Follow-ups (not in this PR)

  • Set an S3 lifecycle rule on the table-snapshots/ prefix (e.g. 7d) to reap old versions — they're a pure regenerable cache.
  • Once the flag graduates, delete the legacy inline-CSV mount path.

@vercel

vercel Bot commented Jun 17, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | Jun 17, 2026 7:02pm |


@cursor

cursor Bot commented Jun 17, 2026


PR Summary

Medium Risk
Changes table mount and export hot paths, adds DB triggers and presigned URL handoff to sandboxes, but the new mount behavior is feature-flagged off by default and includes focused tests for mounts, E2B fetch, uploader, and export streaming.

Overview
Adds a version-keyed CSV snapshot cache so large Sim tables can mount into code sandboxes by reference (presigned object-storage URL) instead of loading every row into the web process—addressing heap/OOM on big mounts. Rollout is behind the new table-snapshot-cache flag (default off); flag off keeps the existing inline queryRows → CSV path.

Invalidation & storage: migration adds rows_version on table definitions, bumped by statement-level Postgres triggers on row insert/update/delete. Snapshots live at workspace-scoped keys that include row version + schema shape hash; cache hits use headObject, misses stream canonical export CSV via the new multipart uploader, with size caps and mid-scan version re-keying.

Sandbox & API surface: _sandboxFiles / E2B now support type: 'url' mounts—the sandbox curls the URL (paths/URL in env, not shell interpolation); failed fetch aborts before user code runs. Copilot function_execute uses snapshots for tables with ≥500 rows when the flag is on (presigned URL on cloud storage, buffered download on local storage), with mount limit checks and workspace tenant checks unchanged in spirit.
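The URL mount and the "env, not shell interpolation" point can be illustrated with a small sketch. The `{ type: 'url', path, url }` variant is taken from this PR; the shape of the `content` variant, the `SandboxCommand` type, the `fetchCommandFor` helper, and the `MOUNT_PATH` / `MOUNT_URL` variable names are assumptions made for illustration.

```typescript
// Discriminated union on `type`. The 'content' variant's exact fields are assumed.
type SandboxFile =
  | { type: 'content'; path: string; content: string }
  | { type: 'url'; path: string; url: string }

// Hypothetical description of a command to run inside the sandbox.
interface SandboxCommand {
  cmd: string
  envs: Record<string, string>
}

// The command string is constant. Caller-controlled values travel only as
// environment variables, so quotes or $(...) in a URL are never parsed by the shell.
function fetchCommandFor(file: Extract<SandboxFile, { type: 'url' }>): SandboxCommand {
  return {
    cmd: 'curl --fail --silent --show-error --location --output "$MOUNT_PATH" "$MOUNT_URL"',
    envs: { MOUNT_PATH: file.path, MOUNT_URL: file.url },
  }
}

// Narrowing on the discriminant picks the right handling per variant.
function describeMount(file: SandboxFile): string {
  return file.type === 'url' ? `fetch into ${file.path}` : `write ${file.content.length} chars to ${file.path}`
}
```

With `--fail`, curl exits non-zero on an HTTP error, which gives the caller a signal to abort before user code runs.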

Export path: table export worker and export row paging switch from buffering the whole file / position ordering to streaming createMultipartUpload and (order_key, id) keyset pages so exports, snapshots, and grid order align. Storage layer gains bounded-concurrency multipart (S3/Azure server-side parts, single PutObject for small payloads) with abort/cleanup on cancel or failure.

Reviewed by Cursor Bugbot for commit bf6ab96. Bugbot is set up for automated code reviews on this repo. Configure here.

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/table/snapshot-cache.ts
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Contributor

Greptile Summary

This PR eliminates the web-process OOM on large table mounts by introducing a versioned CSV snapshot cache backed by object storage, a parallel multipart uploader, and a statement-level Postgres trigger to keep rows_version in sync. The feature is gated behind a table-snapshot-cache flag (default OFF) so the existing inline-CSV path is untouched until the flag graduates.

  • Snapshot cache (snapshot-cache.ts): reads rows_version, headObject-checks for an existing snapshot, and materializes on a miss via the new createMultipartUpload; a post-materialize version recheck handles the rare mid-scan mutation case by re-keying to the newer version.
  • Parallel multipart uploader (storage-service.ts): streams arbitrary-sized payloads in ≥8 MB parts with bounded concurrency (MULTIPART_MAX_INFLIGHT = 4); payloads smaller than one part take a single-shot PutObject path; both S3 and Azure Blob backends are wired.
  • Sandbox URL mount (e2b.ts + function-execute.ts): large tables are mounted by presigned URL fetched via curl inside the sandbox so the bytes bypass the web process entirely; small tables (< 500 rows) and the local-storage fallback continue to mount via inline content.
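As a rough illustration of the bounded-concurrency uploader described above, here is a minimal sketch. It is not the implementation in storage-service.ts: `PartBackend` and `createUploader` are hypothetical, the part size and in-flight limit are parameters rather than the real constants, a single caller is assumed, and the `aborted` check in `complete()` is an addition of this sketch.

```typescript
// Hypothetical backend; the real S3/Azure clients are not shown in this thread.
interface PartBackend {
  putObject(body: Uint8Array): Promise<void>
  uploadPart(partNumber: number, body: Uint8Array): Promise<string> // resolves to an ETag-like token
  completeMultipart(etags: string[]): Promise<void>
  abortMultipart(): Promise<void>
}

function createUploader(backend: PartBackend, partSize: number, maxInflight: number) {
  let buffered: Uint8Array[] = []
  let bufferedBytes = 0
  let nextPart = 1
  let aborted = false
  let failed = false
  let failure: unknown = undefined
  const etags: string[] = []
  const inflight = new Set<Promise<void>>()

  // Concatenate and clear everything buffered so far.
  function drain(): Uint8Array {
    const out = new Uint8Array(bufferedBytes)
    let offset = 0
    for (const chunk of buffered) {
      out.set(chunk, offset)
      offset += chunk.length
    }
    buffered = []
    bufferedBytes = 0
    return out
  }

  async function dispatch(body: Uint8Array): Promise<void> {
    // Backpressure: wait for a free slot before starting another part.
    while (inflight.size >= maxInflight) await Promise.race(Array.from(inflight))
    if (failed) throw failure
    const partNumber = nextPart++
    const task: Promise<void> = backend
      .uploadPart(partNumber, body)
      .then(
        (etag) => { etags[partNumber - 1] = etag },
        (err) => { if (!failed) { failed = true; failure = err } },
      )
      .then(() => { inflight.delete(task) })
    inflight.add(task)
  }

  return {
    async write(chunk: Uint8Array): Promise<void> {
      if (aborted) throw new Error('upload aborted')
      buffered.push(chunk)
      bufferedBytes += chunk.length
      if (bufferedBytes >= partSize) await dispatch(drain())
    },
    async complete(): Promise<void> {
      if (aborted) throw new Error('upload aborted') // guard added in this sketch
      if (nextPart === 1) return backend.putObject(drain()) // small payload: single-shot path
      if (bufferedBytes > 0) await dispatch(drain())
      await Promise.all(Array.from(inflight))
      if (failed) throw failure
      await backend.completeMultipart(etags)
    },
    async abort(): Promise<void> {
      aborted = true
      await Promise.all(Array.from(inflight))
      if (nextPart > 1) await backend.abortMultipart()
    },
  }
}
```

Peak memory in this shape is bounded by roughly `partSize * (maxInflight + 1)` rather than by the file size, which is the property the export worker refactor relies on.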

Confidence Score: 5/5

Safe to merge behind the feature flag; the existing inline-CSV path is byte-for-byte unchanged, and the new code is well-tested with focused Vitest coverage.

The change ships behind a default-OFF feature flag so no production behaviour changes until the flag is enabled. The multipart uploader correctly handles abort, concurrent parts, and the single-shot fast path. The snapshot cache correctly enforces tenant isolation and size limits. The two known cleanup gaps in the re-key path (orphaned torn snapshot and incorrect prior-version deletion) are acknowledged and mitigated by a planned S3 lifecycle rule. The only new finding is that complete() lacks an aborted guard, which is a minor footgun not reachable by any current caller.

snapshot-cache.ts and storage-service.ts warrant a second look once the flag is enabled in production, particularly around the re-key cleanup path and the missing aborted guard in complete().

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/table/snapshot-cache.ts | New versioned snapshot cache; handles hit/miss/re-key/cleanup. Two known cleanup gaps in the re-key path (addressed in previous threads by a planned lifecycle rule). The fire-and-forget deletePreviousVersion is correct since all errors are caught internally. |
| apps/sim/lib/uploads/core/storage-service.ts | New createMultipartUpload streaming uploader with bounded concurrency. Solid concurrency model; complete() has a minor gap where it doesn't check aborted before proceeding with the single-shot path. All current callers use abort/complete in mutually exclusive branches so no actual misfire. |
| apps/sim/lib/execution/e2b.ts | URL-based sandbox mount via curl with URL/path passed as env vars (no shell interpolation). Failed fetch correctly kills the sandbox before user code runs. Clean refactor of writeSandboxInputs. |
| apps/sim/lib/copilot/tools/handlers/function-execute.ts | Feature-flagged snapshot cache path; tenant isolation check (workspaceId match) occurs before getOrCreateTableSnapshot. Size guards cover both cloud (URL) and local (buffered) paths. Small-table threshold (500 rows) correctly keeps short tables on the inline path. |
| packages/db/migrations/0240_table_rows_version.sql | Additive migration: bigint column with DEFAULT 0 + three statement-level triggers using transition tables. Mirrors the 0224 pattern. Correctly aliases NEW/OLD TABLE to changed_rows so one function body handles insert/update/delete. |
| apps/sim/lib/table/export-runner.ts | Refactored to stream rows via createMultipartUpload instead of buffering. Handle is opened before pagination, abort is called on all error/cancel branches, and complete is only called on success. Cleanup logic correctly distinguishes completed-but-unannounced vs in-flight uploads. |
| apps/sim/lib/uploads/providers/s3/client.ts | New uploadS3Part server-side part uploader; validates ETag presence before returning. Correctly passes through the S3 config. |
| apps/sim/lib/uploads/providers/blob/client.ts | New stageBlobPart / commitBlobBlockList functions factored through shared getBlockBlobClientFor helper. Block IDs derived from part number (deterministic/idempotent). Existing abortMultipartUpload reused for the abort path. |
| packages/db/schema.ts | rowsVersion column added as bigint({ mode: 'number' }) — returns JS number, safe for comparison. Default 0 and notNull match migration DDL. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant FE as function-execute
    participant SC as snapshot-cache
    participant DB as Postgres
    participant S3 as Object Storage
    participant SB as E2B Sandbox

    FE->>DB: isFeatureEnabled('table-snapshot-cache')
    FE->>DB: getTableById(tableId)
    alt "flag ON and rowCount >= 500"
        FE->>SC: getOrCreateTableSnapshot(table)
        SC->>DB: readRowsVersion(tableId)
        SC->>S3: headObject vN.csv
        alt cache HIT
            S3-->>SC: size
        else cache MISS
            SC->>SC: materialize(table, key)
            SC->>DB: selectExportRowPage paginated
            SC->>S3: createMultipartUpload write parts complete
            SC->>DB: readRowsVersion recheck
            alt version unchanged
                SC->>S3: deleteFile vN-1.csv best-effort
            else version advanced
                SC->>S3: headObject or materialize vN+1.csv
                SC->>S3: deleteFile vN.csv best-effort
            end
        end
        SC-->>FE: key size version
        alt hasCloudStorage
            FE->>S3: generatePresignedDownloadUrl key TTL 600s
            S3-->>FE: presignedUrl
            FE->>SB: mount url type path url
            SB->>S3: curl presignedUrl write to disk
        else local storage
            FE->>S3: downloadFile key
            S3-->>FE: buffer
            FE->>SB: mount content buffer
        end
    else flag OFF or small table
        FE->>DB: queryRows inline CSV
        FE->>SB: mount content csv
    end
    SB-->>FE: execution result

Reviews (3): Last reviewed commit: "improvement(tables): mount snapshots by ..."

Comment thread apps/sim/lib/table/snapshot-cache.ts
Comment thread apps/sim/lib/table/snapshot-cache.ts Outdated
Comment thread apps/sim/lib/uploads/core/storage-service.ts
const newSize = newHead ? newHead.size : await materialize(table, newKey)
await deleteFile({ key, context: SNAPSHOT_STORAGE_CONTEXT }).catch(() => {})
void deletePreviousVersion(table, after)
return { key: newKey, size: newSize, version: after }


Re-key rebuild lacks second version check

Medium Severity

When rows_version advances during the first materialize, the code re-reads the version once and may call materialize again for the new key, but it never re-checks rows_version after that second build. If the table mutates again while that rebuild runs, the function can still return the intermediate version’s object even though the database has moved on.
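One possible shape for the recheck this finding asks for is a bounded loop that re-reads the version after every build. The sketch below is only an illustration of that idea, not a proposed patch: `SnapshotDeps` and `snapshotWithRecheck` are hypothetical, and the functions are synchronous for brevity where the real ones would be async.

```typescript
// Hypothetical dependencies, synchronous for brevity.
interface SnapshotDeps {
  readRowsVersion(): number
  exists(version: number): boolean
  materialize(version: number): void
}

// Re-check rows_version after every build, up to a fixed budget.
// `settled: false` means the budget ran out and the returned version is known to be stale.
function snapshotWithRecheck(
  deps: SnapshotDeps,
  maxAttempts = 3,
): { version: number; settled: boolean } {
  let version = deps.readRowsVersion()
  let attempt = 1
  for (;;) {
    if (!deps.exists(version)) deps.materialize(version)
    const after = deps.readRowsVersion()
    if (after === version) return { version, settled: true }
    if (attempt >= maxAttempts) return { version, settled: false }
    version = after
    attempt++
  }
}
```

The budget matters: a table under constant writes would otherwise never settle, so some staleness has to be tolerated at the limit.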


Reviewed by Cursor Bugbot for commit c340659. Configure here.

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/copilot/tools/handlers/function-execute.ts
…ct; key snapshot by column shape so schema edits invalidate it
@TheodoreSpeaks

Collaborator Author

Addressed bot findings in f2e6225:

  • Cursor HIGH — URL mounts fail API validation: fixed. functionExecuteContract._sandboxFiles now accepts a { type:'url', path, url } variant (discriminated union), so presigned-URL mounts pass parseRequest.
  • Cursor/Greptile Medium — schema edits skip invalidation: fixed. The snapshot key now includes a column-shape fingerprint (v{rows_version}-{shapeHash}.csv), so a rename/add/remove/reorder of a column produces a new key and re-materializes. rows_version alone only covers row mutations.
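A minimal sketch of how a column-shape fingerprint can be folded into the key. The hash function, the `ColumnDef` fields, and the helper names are assumptions for illustration; the fix does not necessarily use this hash or fingerprint these exact fields.

```typescript
// Assumed column shape; the real schema may carry more attributes.
interface ColumnDef {
  name: string
  type: string
}

// Small FNV-1a-style hash over UTF-16 code units (illustrative choice, not cryptographic).
function fnv1a(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return (h >>> 0).toString(16).padStart(8, '0')
}

// Order-sensitive on purpose: reordering columns changes the CSV, so it must change the key.
function shapeHash(columns: ColumnDef[]): string {
  return fnv1a(JSON.stringify(columns.map((c) => [c.name, c.type])))
}

function snapshotKey(
  workspaceId: string,
  tableId: string,
  rowsVersion: number,
  columns: ColumnDef[],
): string {
  return `table-snapshots/${workspaceId}/${tableId}/v${rowsVersion}-${shapeHash(columns)}.csv`
}
```

Row mutations move the `v{rows_version}` part and schema edits move the hash part, so either kind of change lands on a fresh key.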

Deferred (best-effort, reaped by the bucket lifecycle TTL being added):

  • Re-key prev-version cleanup precision (P2) and torn-object-on-second-materialize-throw (P1) — stale objects are short-lived and lifecycle-expired; not worth extra GC logic.
  • dispatchPart abort guard (P2) — defensive only; write/complete/abort are never called concurrently in our single-caller usage.

Comment thread apps/sim/lib/table/snapshot-cache.ts
Comment thread apps/sim/lib/copilot/tools/handlers/function-execute.ts
@TheodoreSpeaks

Collaborator Author

Triage of the two latest Cursor findings — both acknowledged, deferring with rationale:

  • Snapshot CSV ignores fractional ordering (snapshot-cache.ts): the snapshot pages via selectExportRowPage ((position, id)), identical to the export worker. Under tables-fractional-ordering that's the same documented 'near-grid, not exact-grid order' tradeoff export already accepts for a bulk dump — row order is immaterial for a CSV mounted into a sandbox for analysis. Keeping snapshot and export on one paging path is intentional; not diverging them here.
  • Row mount limit cliff at the threshold (function-execute.ts): tables below SNAPSHOT_MIN_ROWS use the legacy inline path, which has no size guard — pre-existing behavior (and the flag-off path, which must stay byte-identical). A <500-row table with very wide cells is the only way to hit it. That legacy path is slated for deletion when the flag graduates; hardening it now would change flag-off behavior. Tracking as a follow-up rather than fixing in this PR.

Everything green: Test and Build ✓, lint ✓, types ✓, api-validation ✓, migration backward-compatible ✓. Verified live: snapshot miss→materialize→S3, cache hit reuse, and in-sandbox presigned-URL fetch on the mothership template.

… CSV matches the grid under fractional ordering
@TheodoreSpeaks

Collaborator Author

Update on the fractional-ordering finding: fixed in bf6ab96 rather than deferred. selectExportRowPage (shared by export + snapshot) now keyset-paginates on (order_key, id) instead of (position, id), matching the grid's authoritative order under fractional ordering. order_key is present on every row (always assigned on insert, backfill complete) so this is safe regardless of the flag. Export and snapshot CSVs now reflect manual reorders exactly.
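For readers unfamiliar with keyset pagination on a composite key, here is a small in-memory sketch of the technique. `Row`, `selectPage`, and `readAll` are hypothetical stand-ins, not the real selectExportRowPage, and plain JavaScript string comparison is assumed to match the database collation for order_key.

```typescript
interface Row {
  id: string
  orderKey: string
}

type Cursor = { orderKey: string; id: string } | null

// Lexicographic comparison on (orderKey, id); id breaks ties between equal order keys.
function compareRows(a: { orderKey: string; id: string }, b: { orderKey: string; id: string }): number {
  if (a.orderKey !== b.orderKey) return a.orderKey < b.orderKey ? -1 : 1
  if (a.id !== b.id) return a.id < b.id ? -1 : 1
  return 0
}

// In-memory stand-in for a query of the form:
//   SELECT ... WHERE (order_key, id) > ($1, $2) ORDER BY order_key, id LIMIT $3
function selectPage(rows: Row[], after: Cursor, limit: number): Row[] {
  return rows
    .filter((r) => after === null || compareRows(r, after) > 0)
    .sort(compareRows)
    .slice(0, limit)
}

// Walk every page, carrying the last row of each page forward as the cursor.
function readAll(rows: Row[], pageSize: number): Row[] {
  const out: Row[] = []
  let cursor: Cursor = null
  for (;;) {
    const page = selectPage(rows, cursor, pageSize)
    if (page.length === 0) return out
    out.push(...page)
    const last = page[page.length - 1]
    cursor = { orderKey: last.orderKey, id: last.id }
  }
}
```

Including `id` in the cursor is what keeps pages from skipping or repeating rows when several rows share the same order_key.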

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit bf6ab96. Configure here.

if (head) {
logger.info(`[${requestId}] Snapshot hit`, { tableId: table.id, version, size: head.size })
return { key, size: head.size, version }
}


Stale snapshot on cache hit

Medium Severity

getOrCreateTableSnapshot reads rows_version once, then on a storage headObject hit returns that snapshot immediately. If row writes bump the version after that read, an older v{N} object can still exist and be served even though the table is already at v{N+1}.


Reviewed by Cursor Bugbot for commit bf6ab96. Configure here.

Collaborator Author


This is fine. There will always be a small desync between the live state of the table and whatever snapshot we pass to the E2B container, and that is acceptable.

Comment thread apps/sim/lib/table/jobs/service.ts
@TheodoreSpeaks TheodoreSpeaks merged commit ea505f0 into staging Jun 17, 2026
16 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the improvement/table-snapshot-cache branch June 17, 2026 19:26