improvement(mothership): user_table speed parity — limit bounds, background import/delete/update jobs#5012
Conversation
|
@greptile review |
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryMedium Risk Overview Reads and limits. Imports. Large CSV/TSV (≥ Deletes and updates. New update pipeline. Adds Tests cover dispatch, slot contention, clamping, and runner behavior. Reviewed by Cursor Bugbot for commit b551d90. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR improves the
Confidence Score: 3/5Two issues from prior reviews remain open in user-table.ts and affect real model-facing behavior: the query_rows keyset cursor is parsed but never forwarded to the underlying query (models calling with after will loop on the same first page indefinitely), and the inline update_rows_by_filter path skips the write-job slot claim (allowing it to interleave with a concurrent background delete or import). The new runners (update-runner.ts), background dispatch helpers, and import-runner deleteSourceFile flag are well-implemented and tested. The unresolved after cursor drop means query_rows paging is silently broken for any caller who passes the cursor, and the missing slot guard on inline updates allows data races with background workers — two separate functional defects in the tool's core read and write paths. apps/sim/lib/copilot/tools/server/table/user-table.ts — the query_rows case (cursor forwarding) and the update_rows_by_filter inline path (slot claim) are the two spots that need attention before merging. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant M as Mothership Tool
participant UT as user-table.ts
participant JS as jobs/service
participant DI as dispatchJob()
participant BG as Background Worker
participant DB as Database
Note over M,DB: Large import (≥8 MB CSV) / delete (>1000 matches) / update (>1000 matches)
M->>UT: import_file / delete_rows_by_filter / update_rows_by_filter
UT->>DB: queryRows(limit:1) — count matches [delete/update only]
DB-->>UT: totalCount
UT->>JS: markTableJobRunning(tableId, jobId, type, payload)
JS-->>UT: "claimed=true (or false → return job in progress)"
UT->>DI: dispatchImportJob / dispatchDeleteJob / dispatchUpdateJob
alt trigger.dev enabled
DI->>BG: tasks.trigger(taskId, payload, tags)
BG-->>DI: trigger ACK
else detached in-process
DI->>BG: "runDetached(() => runTableImport/Delete/Update)"
end
DI-->>UT: void
UT-->>M: success:true, jobId, doomedCount/affectedCount
Note over BG,DB: Background worker runs (paginated, ownership-gated)
loop keyset pages
BG->>JS: updateJobProgress(tableId, processed, jobId)
JS-->>BG: "owns=true (or false → stop)"
BG->>DB: selectRowIdPage / selectRowDataPage
DB-->>BG: page of rows
BG->>DB: deletePageByIds / updatePageByIds
end
BG->>JS: markJobReady(tableId, jobId)
BG->>DB: appendTableEvent(status:ready)
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant M as Mothership Tool
participant UT as user-table.ts
participant JS as jobs/service
participant DI as dispatchJob()
participant BG as Background Worker
participant DB as Database
Note over M,DB: Large import (≥8 MB CSV) / delete (>1000 matches) / update (>1000 matches)
M->>UT: import_file / delete_rows_by_filter / update_rows_by_filter
UT->>DB: queryRows(limit:1) — count matches [delete/update only]
DB-->>UT: totalCount
UT->>JS: markTableJobRunning(tableId, jobId, type, payload)
JS-->>UT: "claimed=true (or false → return job in progress)"
UT->>DI: dispatchImportJob / dispatchDeleteJob / dispatchUpdateJob
alt trigger.dev enabled
DI->>BG: tasks.trigger(taskId, payload, tags)
BG-->>DI: trigger ACK
else detached in-process
DI->>BG: "runDetached(() => runTableImport/Delete/Update)"
end
DI-->>UT: void
UT-->>M: success:true, jobId, doomedCount/affectedCount
Note over BG,DB: Background worker runs (paginated, ownership-gated)
loop keyset pages
BG->>JS: updateJobProgress(tableId, processed, jobId)
JS-->>BG: "owns=true (or false → stop)"
BG->>DB: selectRowIdPage / selectRowDataPage
DB-->>BG: page of rows
BG->>DB: deletePageByIds / updatePageByIds
end
BG->>JS: markJobReady(tableId, jobId)
BG->>DB: appendTableEvent(status:ready)
Reviews (15): Last reviewed commit: "docs(mothership): drop redundant backgro..." | Re-trigger Greptile |
Greptile SummaryThis PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100,
Confidence Score: 3/5Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires. The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection. apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
B -- No --> C[markTableJobRunning claim slot]
C --> D[Inline import parseFileRows + batchInsert]
D --> E[releaseJobClaim]
B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
F --> G{isTriggerDevEnabled?}
G -- Yes --> H[tasks.trigger tableImportTask]
G -- No --> I[runDetached runTableImport]
H -- dispatch error --> J[releaseJobClaim / deleteTable]
I --> K[runTableImport streaming keyset import]
H --> K
L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
M -- Yes --> N[queryRows count with pendingDeleteMask]
N --> O{matchCount > 1000?}
O -- No --> P[deleteRowsByFilter inline]
O -- Yes --> Q[markTableJobRunning claim]
Q --> R[dispatchDeleteJob]
R --> S[runTableDelete keyset page walk]
M -- No --> P
T[Mothership: query_rows] --> U{limit > 1000?}
U -- Yes --> V[return error]
U -- No --> W[queryRows with pendingDeleteMask + after cursor]
W --> X[return rows + nextCursor]
Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
Z --> AA[return job status / rowCount]
Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile |
|
@greptile review |
|
@greptile review |
0e47d6e to
440c1d6
Compare
|
@greptile review |
440c1d6 to
419894a
Compare
419894a to
45096a5
Compare
| table, | ||
| { | ||
| filter: filterNamesToIds(args.filter, idByName), | ||
| filter: idFilter, |
There was a problem hiding this comment.
Inline delete bypasses bulk cap
Medium Severity
For delete_rows_by_filter without an explicit limit, a query_rows count at or below 1000 chooses the inline path, but deleteRowsByFilter runs with no limit and loads every current match into memory. Rows matching the filter can grow before delete runs, exceeding the 1000-row bulk cap this change adds elsewhere.
Reviewed by Cursor Bugbot for commit 45096a5. Configure here.
45096a5 to
59efc85
Compare
59efc85 to
23af5b9
Compare
|
@greptile review |
…c import/delete/update jobs - query_rows / filter ops clamp limit to the contract maxes; query_rows skips execution metadata. - import_file / create_from_file (large CSV/TSV) and delete_rows_by_filter (>1000 unbounded matches) dispatch background table jobs, claiming the per-table job slot; inline paths claim the slot too. - update_rows_by_filter now escalates the same way: >1000 unbounded matches run as a background table job (new 'update' job type + runTableUpdate worker + tableUpdateTask), so a broad update on a huge table no longer loads every row into the request. Best-effort/non-atomic and skips workflow recompute (documented); unique-column patches stay inline. Pagination is limit/offset. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
6273f4d to
7535e33
Compare
|
@greptile review |
Drop the verbose doomedCount/affectedCount, delete-mask, workflow-recompute,
and unique-column asides from the bulk-op descriptions. The model only needs:
large ops return { jobId }, limit maxes at 1000, one job per table.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@greptile review |
…l-facing The model can now pass any limit — no "cannot exceed 1000" rejection. 1000 becomes an internal threshold: query_rows clamps the page to MAX_QUERY_LIMIT (totalCount signals truncation; the model pages with offset), and bulk filter ops above the cap run as background jobs. update_rows_by_filter loads full row data inline, so an explicit limit above the cap escalates to the background worker with a new maxRows budget (the worker stops after maxRows; update has no read mask so the cap is exact). delete only loads ids inline, so an explicit limit (any size) stays inline — only unbounded deletes use the masked background path, which would over-hide a bounded delete. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@greptile review |
… inline An explicit delete limit now mirrors update: ≤1000 runs inline, above the cap it escalates to the background worker honoring the limit via maxRows — instead of always staying inline. The worker stops after maxRows (per-page fetch capped to the remaining budget). Bounded background deletes skip pendingDeleteMask: the filter-based mask hides every match, which would over-hide the rows beyond the cap the job never deletes. Unmasked, a bounded delete is eventually consistent like a bounded update (rows disappear as deleted), and doomedCount is omitted from the payload so the count isn't double-subtracted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@greptile review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit f1ee3e9. Configure here.
Drop "Any value is allowed" from the limit description and restore the original offset description. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The bounded-delete commit (f1ee3e9) persisted maxRows and omitted doomedCount but the pendingDeleteMask guard that makes it work was left uncommitted, so the shipped mask still hid every filter+cutoff match — over-hiding the rows beyond maxRows that the job never deletes (they vanished from reads until the job ended, then reappeared). Return no mask when maxRows is set: a bounded delete is eventually consistent (rows disappear as deleted), like a bounded update. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The op descriptions already cover background escalation; the limit arg only needs to say what the param does. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@greptile review |
✅ Cross-repo companion checkAll declared companion PRs are merged into
|


Context
Mothership's
user_tabletool lagged the fast/safe paths the tables feature already has. This PR lands three of the four findings from the audit; the fourth (thefunction_executelarge-table CSV mount) is deferred — see note at the bottom.Companion: https://github.com/simstudioai/copilot/pull/312
Changes
C — limit bounds.
query_rowstook an unboundedlimit(whole table into one tool result) and the filter ops took unbounded limits (every matching row's id+data into memory). Now rejected aboveMAX_QUERY_LIMIT(1000) /MAX_BULK_OPERATION_SIZE(1000) with the contract's message text.D —
query_rowswaste + paging. Dropped the always-on executions-metadata load (withExecutions: false), and added keyset pagination: accepts anaftercursor, returnsnextCursorwhen a default-order page fills (rejectsafter+sort). Offset paging was O(n²) walking a big table.E — async job parity for imports/deletes.
import_file/create_from_file(CSV/TSV ≥8MB) anddelete_rows_by_filter(>1000 matches, no explicit limit) now dispatch the same trigger.dev jobs the UI routes use, claiming the per-table job slot so they can't race a running background job. Smaller imports stay inline but claim/release the slot. The model polls progress viaquery_rows(rows appear as an import lands; the delete mask makesquery_rowsreflect the post-delete view immediately) — no dedicated job-status op.TableImportPayload.deleteSourceFileflag (default true) so Mothership imports of persistent workspace files don't delete the source.Tests
user-table.test.ts(limit clamps, keysetafter/nextCursor,after+sortrejection, slot claim/release, async dispatch payloads) andimport-runner.test.ts(source-file cleanup default vsdeleteSourceFile: false).bun run lint:check,type-check, andcheck:api-validationgreen.Deferred (not in this PR)
filter/limit/columnsselection, or by-reference signed-URL fetch — is a separate effort.Model-facing docs
Catalog change documenting
after/ limit maxes / background behavior is in simstudioai/copilot#312.🤖 Generated with Claude Code