improvement(mothership): user_table speed parity — limit bounds, background import/delete/update jobs by TheodoreSpeaks · Pull Request #5012 · simstudioai/sim

TheodoreSpeaks · 2026-06-12T23:28:27Z

Context

Mothership's user_table tool lagged the fast/safe paths the tables feature already has. This PR lands three of the four findings from the audit; the fourth (the function_execute large-table CSV mount) is deferred — see note at the bottom.

Companion: https://github.com/simstudioai/copilot/pull/312

Changes

C — limit bounds. query_rows took an unbounded limit (whole table into one tool result) and the filter ops took unbounded limits (every matching row's id+data into memory). Now rejected above MAX_QUERY_LIMIT (1000) / MAX_BULK_OPERATION_SIZE (1000) with the contract's message text.

D — query_rows waste + paging. Dropped the always-on executions-metadata load (withExecutions: false), and added keyset pagination: accepts an after cursor, returns nextCursor when a default-order page fills (rejects after+sort). Offset paging was O(n²) walking a big table.

E — async job parity for imports/deletes. import_file / create_from_file (CSV/TSV ≥8MB) and delete_rows_by_filter (>1000 matches, no explicit limit) now dispatch the same trigger.dev jobs the UI routes use, claiming the per-table job slot so they can't race a running background job. Smaller imports stay inline but claim/release the slot. The model polls progress via query_rows (rows appear as an import lands; the delete mask makes query_rows reflect the post-delete view immediately) — no dedicated job-status op. TableImportPayload.deleteSourceFile flag (default true) so Mothership imports of persistent workspace files don't delete the source.

Tests

user-table.test.ts (limit clamps, keyset after/nextCursor, after+sort rejection, slot claim/release, async dispatch payloads) and import-runner.test.ts (source-file cleanup default vs deleteSourceFile: false). bun run lint:check, type-check, and check:api-validation green.

Deferred (not in this PR)

function_execute large-table mount (finding B): the keyset CSV drain fixed the 100-row truncation but materialized the whole table in the web-container heap before the 50MB check, risking OOM on large/enterprise tables. Backed out here (mount stays at main's 100-row behavior); proper fix — incremental byte-bound + filter/limit/columns selection, or by-reference signed-URL fetch — is a separate effort.

Model-facing docs

Catalog change documenting after / limit maxes / background behavior is in simstudioai/copilot#312.

🤖 Generated with Claude Code

TheodoreSpeaks · 2026-06-12T23:28:29Z

@greptile review

vercel · 2026-06-12T23:28:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 17, 2026 10:39pm

cursor · 2026-06-12T23:28:39Z

PR Summary

Medium Risk
Changes bulk row mutation, job ownership, and read masking on large tables; mistakes could cause partial updates, stuck jobs, or inconsistent mid-job reads, though patterns mirror existing import/delete workers.

Overview
Aligns the copilot user_table tool with the tables product’s safe scale paths: bounded reads, exclusive write jobs, and the same async workers the UI uses.

Reads and limits. query_rows now caps page size at MAX_QUERY_LIMIT, skips execution metadata (withExecutions: false), and validates optional limits. Copilot tool schemas note that limit can be omitted on filter update/delete to affect every match (with internal escalation when matches exceed the inline cap).

Imports. Large CSV/TSV (≥ CSV_ASYNC_IMPORT_THRESHOLD_BYTES) routes to runTableImport via Trigger.dev or a detached worker; small imports stay inline but claim/release the table’s single write-job slot. deleteSourceFile: false on import payloads keeps persistent workspace files when Mothership imports by reference. Failed dispatch releases the claim so tables don’t stay stuck in running.

Deletes and updates. delete_rows_by_filter and update_rows_by_filter count matches when needed and escalate above MAX_BULK_OPERATION_SIZE to background jobs (runTableDelete / new runTableUpdate), including explicit limits above the cap via maxRows. Unbounded deletes still use the read mask; bounded deletes skip masking to avoid over-hiding rows. Updates touching unique columns stay inline. Inline deletes also claim the job slot.

New update pipeline. Adds update-runner, paginated selectRowDataPage / updatePageByIds, table-update Trigger.dev task, update job events/types, and delete-runner maxRows paging. Background updates are best-effort (no per-row workflow/enrichment recompute).

Tests cover dispatch, slot contention, clamping, and runner behavior.

^{Reviewed by Cursor Bugbot for commit b551d90. Bugbot is set up for automated code reviews on this repo. Configure here.}

greptile-apps · 2026-06-12T23:36:06Z

Greptile Summary

This PR improves the user_table Mothership tool to reach parity with the UI table feature: it adds a 1000-row cap (clamping for reads, background dispatch for bulk writes), drops the always-on executions metadata load from query_rows, and brings import_file / create_from_file / delete_rows_by_filter into the same background-job infrastructure as the UI routes. A new update_rows_by_filter background path is added alongside the existing inline one, and a table-update Trigger.dev task wraps the new runTableUpdate runner.

Limit bounds (C): query_rows now silently clamps any limit to MAX_QUERY_LIMIT (1000); bulk ops above MAX_BULK_OPERATION_SIZE (1000) escalate to background workers instead of rejecting.
Async parity (E): Large CSV imports (≥8 MB) and unbounded deletes dispatch to the same table-import / table-delete trigger.dev tasks the UI uses, with per-table job-slot ownership enforced via markTableJobRunning; deleteSourceFile: false prevents Mothership imports from deleting persistent workspace files.
Update runner (new): runTableUpdate and tableUpdateTask add keyset-paged background bulk-update support with idempotent JSONB-merge semantics and per-page ownership gating.

Confidence Score: 3/5

Two issues from prior reviews remain open in user-table.ts and affect real model-facing behavior: the query_rows keyset cursor is parsed but never forwarded to the underlying query (models calling with after will loop on the same first page indefinitely), and the inline update_rows_by_filter path skips the write-job slot claim (allowing it to interleave with a concurrent background delete or import).

The new runners (update-runner.ts), background dispatch helpers, and import-runner deleteSourceFile flag are well-implemented and tested. The unresolved after cursor drop means query_rows paging is silently broken for any caller who passes the cursor, and the missing slot guard on inline updates allows data races with background workers — two separate functional defects in the tool's core read and write paths.

apps/sim/lib/copilot/tools/server/table/user-table.ts — the query_rows case (cursor forwarding) and the update_rows_by_filter inline path (slot claim) are the two spots that need attention before merging.

Important Files Changed

Filename	Overview
apps/sim/lib/copilot/tools/server/table/user-table.ts	Core tool handler. Two unresolved issues from prior review remain: the keyset after cursor is accepted but never forwarded to queryRows, and the inline update_rows_by_filter path (when limit ≤ 1000 or patchTouchesUnique) skips the write-job slot claim. Background dispatch helpers for import/delete/update are otherwise well-structured.
apps/sim/lib/table/update-runner.ts	New background bulk-update worker. Keyset pagination, idempotent patch exclusion (excludeIfPatched), per-page ownership gating, resume-on-retry, and error re-throw for trigger.dev retries are all correctly implemented. Mirrors delete-runner patterns closely.
apps/sim/lib/table/import-runner.ts	Added deleteSourceFile flag (defaults to true) so Mothership imports of persistent workspace files skip source-file cleanup. Existing streaming logic unchanged.
apps/sim/lib/table/delete-runner.ts	No logic changes; file included for test coverage additions. markTableDeleteFailed export unchanged.
apps/sim/background/table-update.ts	New Trigger.dev task wrapping runTableUpdate. Date rehydration pattern, retry policy, and onFailure hook correctly mirror the existing table-delete task.
apps/sim/lib/copilot/tools/server/table/user-table.test.ts	Comprehensive new test cases for slot claim/release, background dispatch, and limit clamping. The flushDetached helper (two Promise.resolve() ticks) correctly drains microtask-scheduled detached work under the current runDetached implementation but would silently mis-validate if that scheduling changes.
apps/sim/lib/table/import-runner.test.ts	New tests cover deleteSourceFile default (true) and opt-out (false) paths; happy-path coverage only for failure modes, which is acceptable given runTableImport's internal self-reporting.
apps/sim/lib/table/update-runner.test.ts	Good coverage of the update loop: supersede detection, maxRows budget, idempotent retry, validation failure, and markTableUpdateFailed emission.
apps/sim/lib/table/rows/ordering.ts	Added selectRowDataPage (for update runner) and updatePageByIds functions; mirrors selectRowIdPage / deletePageByIds patterns for the delete runner.
apps/sim/lib/table/types.ts	Added TableUpdateJobPayload type alongside the existing TableDeleteJobPayload. Shape is consistent with what markTableJobRunning persists and what the Trigger.dev task rehydrates.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant M as Mothership Tool
    participant UT as user-table.ts
    participant JS as jobs/service
    participant DI as dispatchJob()
    participant BG as Background Worker
    participant DB as Database

    Note over M,DB: Large import (≥8 MB CSV) / delete (>1000 matches) / update (>1000 matches)

    M->>UT: import_file / delete_rows_by_filter / update_rows_by_filter
    UT->>DB: queryRows(limit:1) — count matches [delete/update only]
    DB-->>UT: totalCount
    UT->>JS: markTableJobRunning(tableId, jobId, type, payload)
    JS-->>UT: "claimed=true (or false → return job in progress)"
    UT->>DI: dispatchImportJob / dispatchDeleteJob / dispatchUpdateJob
    alt trigger.dev enabled
        DI->>BG: tasks.trigger(taskId, payload, tags)
        BG-->>DI: trigger ACK
    else detached in-process
        DI->>BG: "runDetached(() => runTableImport/Delete/Update)"
    end
    DI-->>UT: void
    UT-->>M: success:true, jobId, doomedCount/affectedCount

    Note over BG,DB: Background worker runs (paginated, ownership-gated)
    loop keyset pages
        BG->>JS: updateJobProgress(tableId, processed, jobId)
        JS-->>BG: "owns=true (or false → stop)"
        BG->>DB: selectRowIdPage / selectRowDataPage
        DB-->>BG: page of rows
        BG->>DB: deletePageByIds / updatePageByIds
    end
    BG->>JS: markJobReady(tableId, jobId)
    BG->>DB: appendTableEvent(status:ready)

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant M as Mothership Tool
    participant UT as user-table.ts
    participant JS as jobs/service
    participant DI as dispatchJob()
    participant BG as Background Worker
    participant DB as Database

    Note over M,DB: Large import (≥8 MB CSV) / delete (>1000 matches) / update (>1000 matches)

    M->>UT: import_file / delete_rows_by_filter / update_rows_by_filter
    UT->>DB: queryRows(limit:1) — count matches [delete/update only]
    DB-->>UT: totalCount
    UT->>JS: markTableJobRunning(tableId, jobId, type, payload)
    JS-->>UT: "claimed=true (or false → return job in progress)"
    UT->>DI: dispatchImportJob / dispatchDeleteJob / dispatchUpdateJob
    alt trigger.dev enabled
        DI->>BG: tasks.trigger(taskId, payload, tags)
        BG-->>DI: trigger ACK
    else detached in-process
        DI->>BG: "runDetached(() => runTableImport/Delete/Update)"
    end
    DI-->>UT: void
    UT-->>M: success:true, jobId, doomedCount/affectedCount

    Note over BG,DB: Background worker runs (paginated, ownership-gated)
    loop keyset pages
        BG->>JS: updateJobProgress(tableId, processed, jobId)
        JS-->>BG: "owns=true (or false → stop)"
        BG->>DB: selectRowIdPage / selectRowDataPage
        DB-->>BG: page of rows
        BG->>DB: deletePageByIds / updatePageByIds
    end
    BG->>JS: markJobReady(tableId, jobId)
    BG->>DB: appendTableEvent(status:ready)

_{Reviews (15): Last reviewed commit: "docs(mothership): drop redundant backgro..." | Re-trigger Greptile}

greptile-apps · 2026-06-12T23:37:15Z

Greptile Summary

This PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100, query_rows gains a 1000-row limit cap and keyset cursor support, and large CSV/TSV imports plus unbounded filter-deletes are handed off to the existing background-job infrastructure (with a new get_job poll operation). The underlying data model also migrates: per-import state columns on user_table_definitions are replaced by a new table_jobs table with a partial-unique index enforcing one running write-job per table.

function-execute mounts: buildTableCsvForMount pages through selectExportRowPage in 5k-row chunks, remaps column-id keys to display names, and counts toward the 50 MB budget — fixing empty-column mounts on id-keyed tables and the silent 100-row truncation.
query_rows contract parity: limit now rejected above 1000, withExecutions: false, and keyset after/nextCursor accepted (rejected alongside sort); inline imports/deletes claim the one-write-job slot to block interleaving.
Background import/delete in Mothership: CSV/TSV ≥ 8 MB dispatches tableImportTask; unbounded filter-deletes above 1000 rows dispatch tableDeleteTask with a pendingDeleteMask that hides doomed rows immediately from all read paths.

Confidence Score: 3/5

Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires.

The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection.

apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables.

Important Files Changed

Filename	Overview
apps/sim/lib/copilot/tools/handlers/function-execute.ts	Replaces the truncated 100-row queryRows mount with a keyset-draining buildTableCsvForMount that reads all rows and remaps id-keyed cells back to display names. Has a memory safety issue: the full CSV is materialized in memory before the 50 MB budget check runs, and pendingDeleteMask is called once per page (N+1 pattern).
apps/sim/lib/table/service.ts	Major overhaul: migrates import-status fields from user_table_definitions to table_jobs, adds selectExportRowPage (position-keyset export drain), pendingDeleteMask for delete-job visibility, selectRowIdPage/deletePageByIds for async delete workers, and latestJobForTable/latestJobsForTables for the new job fields. pendingDeleteMask now runs on every queryRows call regardless of table state.
apps/sim/lib/copilot/tools/server/table/user-table.ts	Adds query_rows limit clamping, keyset cursor (after/nextCursor), background delete dispatch for unbounded deletes, background import dispatch for large CSV/TSV files, get_job polling operation, and job-slot claim/release for inline imports. Logic is well-structured with proper guards and rollback paths.
apps/sim/lib/table/import-runner.ts	Renames import-specific service calls to generic job variants (markImportFailed→markJobFailed, etc.), adds deleteSourceFile flag (defaults true for UI single-use uploads; pass false for Mothership persistent workspace files), and updates SSE events to the new job event kind.
packages/db/migrations/0233_table_jobs_and_keyset.sql	Creates table_jobs with partial unique index (one running write-job per table), migrates existing import_status rows idempotently via ON CONFLICT DO NOTHING, tunes autovacuum on user_table_rows, and adds two CONCURRENT indexes (table_id+id keyset, tenant-scoped btree_gin) while dropping the old cross-tenant GIN index.
apps/sim/lib/table/delete-runner.ts	New async delete worker: walks rows in keyset pages (selectRowIdPage → deletePageByIds), checks job ownership on each page, applies cutoff/filter/exclusion scope, emits SSE events, and marks the job terminal or failed.
packages/db/schema.ts	Adds tableJobs table with unique partial index, replaces the old cross-tenant GIN index on user_table_rows with a tenant-scoped btree_gin index, adds table_id+id keyset index, and removes import_* columns from userTableDefinitions.
apps/sim/lib/table/dispatcher.ts	Extends DispatchScope with filter/excludeRowIds for select-all-under-filter and select-all-minus-deselections, wraps the dispatcherStep page query in withSeqscanOff for jsonb-filtered scopes, and adds scope-aware markActiveDispatchesCancelled with JSONB scope comparison.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
    B -- No --> C[markTableJobRunning claim slot]
    C --> D[Inline import parseFileRows + batchInsert]
    D --> E[releaseJobClaim]
    B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
    F --> G{isTriggerDevEnabled?}
    G -- Yes --> H[tasks.trigger tableImportTask]
    G -- No --> I[runDetached runTableImport]
    H -- dispatch error --> J[releaseJobClaim / deleteTable]
    I --> K[runTableImport streaming keyset import]
    H --> K

    L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
    M -- Yes --> N[queryRows count with pendingDeleteMask]
    N --> O{matchCount > 1000?}
    O -- No --> P[deleteRowsByFilter inline]
    O -- Yes --> Q[markTableJobRunning claim]
    Q --> R[dispatchDeleteJob]
    R --> S[runTableDelete keyset page walk]
    M -- No --> P

    T[Mothership: query_rows] --> U{limit > 1000?}
    U -- Yes --> V[return error]
    U -- No --> W[queryRows with pendingDeleteMask + after cursor]
    W --> X[return rows + nextCursor]

    Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
    Z --> AA[return job status / rowCount]

_{Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile}

TheodoreSpeaks · 2026-06-16T01:56:58Z

@greptile review

TheodoreSpeaks · 2026-06-16T02:10:09Z

@greptile review

TheodoreSpeaks · 2026-06-16T21:31:22Z

@greptile review

cursor · 2026-06-16T23:15:23Z

            table,
            {
-              filter: filterNamesToIds(args.filter, idByName),
+              filter: idFilter,


Inline delete bypasses bulk cap

Medium Severity

For delete_rows_by_filter without an explicit limit, a query_rows count at or below 1000 chooses the inline path, but deleteRowsByFilter runs with no limit and loads every current match into memory. Rows matching the filter can grow before delete runs, exceeding the 1000-row bulk cap this change adds elsewhere.

^{Reviewed by Cursor Bugbot for commit 45096a5. Configure here.}

TheodoreSpeaks · 2026-06-17T19:59:56Z

@greptile review

…c import/delete/update jobs - query_rows / filter ops clamp limit to the contract maxes; query_rows skips execution metadata. - import_file / create_from_file (large CSV/TSV) and delete_rows_by_filter (>1000 unbounded matches) dispatch background table jobs, claiming the per-table job slot; inline paths claim the slot too. - update_rows_by_filter now escalates the same way: >1000 unbounded matches run as a background table job (new 'update' job type + runTableUpdate worker + tableUpdateTask), so a broad update on a huge table no longer loads every row into the request. Best-effort/non-atomic and skips workflow recompute (documented); unique-column patches stay inline. Pagination is limit/offset. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TheodoreSpeaks · 2026-06-17T20:26:37Z

@greptile review

Drop the verbose doomedCount/affectedCount, delete-mask, workflow-recompute, and unique-column asides from the bulk-op descriptions. The model only needs: large ops return { jobId }, limit maxes at 1000, one job per table. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TheodoreSpeaks · 2026-06-17T21:21:29Z

@greptile review

…l-facing The model can now pass any limit — no "cannot exceed 1000" rejection. 1000 becomes an internal threshold: query_rows clamps the page to MAX_QUERY_LIMIT (totalCount signals truncation; the model pages with offset), and bulk filter ops above the cap run as background jobs. update_rows_by_filter loads full row data inline, so an explicit limit above the cap escalates to the background worker with a new maxRows budget (the worker stops after maxRows; update has no read mask so the cap is exact). delete only loads ids inline, so an explicit limit (any size) stays inline — only unbounded deletes use the masked background path, which would over-hide a bounded delete. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TheodoreSpeaks · 2026-06-17T21:54:43Z

@greptile review

… inline An explicit delete limit now mirrors update: ≤1000 runs inline, above the cap it escalates to the background worker honoring the limit via maxRows — instead of always staying inline. The worker stops after maxRows (per-page fetch capped to the remaining budget). Bounded background deletes skip pendingDeleteMask: the filter-based mask hides every match, which would over-hide the rows beyond the cap the job never deletes. Unmasked, a bounded delete is eventually consistent like a bounded update (rows disappear as deleted), and doomedCount is omitted from the payload so the count isn't double-subtracted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TheodoreSpeaks · 2026-06-17T22:01:55Z

@greptile review

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit f1ee3e9. Configure here.}

Drop "Any value is allowed" from the limit description and restore the original offset description. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The bounded-delete commit (f1ee3e9) persisted maxRows and omitted doomedCount but the pendingDeleteMask guard that makes it work was left uncommitted, so the shipped mask still hid every filter+cutoff match — over-hiding the rows beyond maxRows that the job never deletes (they vanished from reads until the job ended, then reappeared). Return no mask when maxRows is set: a bounded delete is eventually consistent (rows disappear as deleted), like a bounded update. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The op descriptions already cover background escalation; the limit arg only needs to say what the param does. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

TheodoreSpeaks · 2026-06-17T22:45:31Z

@greptile review

github-actions · 2026-06-18T00:51:01Z

✅ Cross-repo companion check

All declared companion PRs are merged into staging.

✅ simstudioai/copilot#312 — merged into staging — contract(user_table): limit bounds, background import/delete/update docs

cursor Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/sim/app/api/table/[tableId]/rows/route.ts

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/sim/lib/copilot/tools/handlers/function-execute.ts Outdated

Comment thread apps/sim/lib/copilot/tools/server/table/user-table.ts Outdated

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread apps/sim/lib/copilot/tools/handlers/function-execute.ts Outdated

Comment thread apps/sim/lib/copilot/tools/handlers/function-execute.ts Outdated

Comment thread apps/sim/lib/table/service.ts Outdated

TheodoreSpeaks changed the title ~~improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs~~ improvement(mothership): user_table speed parity — limit bounds, keyset paging, background import/delete jobs Jun 16, 2026

vercel Bot temporarily deployed to Preview June 16, 2026 01:57 Inactive