feat(files): stream large CSV previews and add import-as-table #5125

Merged
TheodoreSpeaks merged 4 commits into staging from fix/large-table-file-view
Jun 18, 2026

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • Large CSVs no longer crash the file viewer — a new bounded server route streams the first 1,000 rows from storage instead of loading the whole file into the editor (response.text() OOM'd on multi-GB files)
  • CSVs over 5MB render a read-only streamed preview; smaller CSVs stay fully editable
  • A warning toast surfaces "Import as a table", which kicks off the existing async import pointed at the file's existing storage key (no re-upload) with a new keepSource flag so the source file isn't deleted; the success toast offers a "View tables" button
  • Hide Save / edit-split / preview-toggle controls for the read-only large-CSV path in both the Files page and the mothership view (shared isCsvStreamOnly predicate)
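The bounded preview described above can be sketched roughly as follows. This is only an illustration, not the PR's `getCsvPreviewSlice`: `previewSliceSketch` and `PreviewSlice` are hypothetical names, the input is assumed to be an async iterable of text chunks, and the line splitting is naive (it does not handle quoted fields that contain newlines or delimiters, which a real CSV parser would).

```typescript
// Hypothetical result shape, mirroring the { headers, rows, truncated } idea.
interface PreviewSlice {
  headers: string[]
  rows: string[][]
  truncated: boolean
}

// Reads chunks until the header plus `maxRows` data rows are collected, then stops.
// Breaking out of `for await` calls the iterator's return(), so the source
// is not read any further once the cap is exceeded.
async function previewSliceSketch(
  chunks: AsyncIterable<string>,
  maxRows: number,
  delimiter = ','
): Promise<PreviewSlice> {
  const lines: string[] = []
  const wanted = maxRows + 1 // header line + data rows
  let pending = ''
  let truncated = false

  outer: for await (const chunk of chunks) {
    pending += chunk
    let nl = pending.indexOf('\n')
    while (nl !== -1) {
      const line = pending.slice(0, nl).replace(/\r$/, '')
      pending = pending.slice(nl + 1)
      if (line.length > 0) {
        if (lines.length === wanted) {
          truncated = true // at least one more row exists beyond the cap
          break outer
        }
        lines.push(line)
      }
      nl = pending.indexOf('\n')
    }
  }

  // A final line without a trailing newline still counts as a row.
  const last = pending.replace(/\r$/, '')
  if (!truncated && last.length > 0) {
    if (lines.length === wanted) truncated = true
    else lines.push(last)
  }

  const [header = '', ...data] = lines
  return {
    headers: header === '' ? [] : header.split(delimiter),
    rows: data.map((l) => l.split(delimiter)),
    truncated,
  }
}
```

The key property is that memory use is bounded by the row cap rather than by the file size, which is what avoids the `response.text()` failure mode on multi-GB files.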

Type of Change

  • Bug fix
  • New feature

Testing

  • Unit: csv-preview-slice (slice, truncation boundary, delimiter detection, early stream-abort) — 9 passing; existing import suites green (40)
  • type-check, biome lint, check:api-validation:strict all pass
  • Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 18, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 18, 2026 1:38am


@cursor

cursor Bot commented Jun 18, 2026


PR Summary

Medium Risk
New authenticated API reads from storage with key validation; async import behavior changes via deleteSourceFile, but existing upload flow defaults are preserved.

Overview
Large CSVs no longer load entirely in the browser. CSVs over 5MB use a read-only server-streamed preview (first 1,000 rows) instead of the text editor, avoiding OOM from loading the full file. Smaller CSVs stay editable; inline CSV preview parsing is also capped at 1,000 rows with a truncated flag.

A new GET workspace CSV preview route resolves the file by ID and reads from the authoritative storage key (not trusting the client key alone). Streaming is implemented in getCsvPreviewSlice, which stops reading after the row cap and destroys the download stream early.
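A minimal sketch of the "authoritative key" check described above, under stated assumptions: `WorkspaceFileRecord`, `Lookup`, and `resolvePreviewKey` are hypothetical names, and the lookup function is injected so the example stays self-contained. The real route's record shape and helper signatures may differ.

```typescript
// Hypothetical record shape; the real workspace file record may differ.
interface WorkspaceFileRecord {
  id: string
  workspaceId: string
  key: string
  deletedAt?: Date | null
}

// Assumed lookup: resolves a live record scoped to the workspace, or null.
type Lookup = (workspaceId: string, fileId: string) => Promise<WorkspaceFileRecord | null>

type Resolution = { ok: true; key: string } | { ok: false; status: 404 }

// Returns the storage key to stream from, always taken from the record.
// A client-supplied key is only compared against the record, never used directly.
async function resolvePreviewKey(
  lookup: Lookup,
  workspaceId: string,
  fileId: string,
  clientKey?: string
): Promise<Resolution> {
  const record = await lookup(workspaceId, fileId)
  if (!record || record.deletedAt) return { ok: false, status: 404 }
  if (clientKey !== undefined && clientKey !== record.key) return { ok: false, status: 404 }
  return { ok: true, key: record.key }
}
```

Returning the same 404 for a missing record and a mismatched key avoids telling the caller which of the two checks failed.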

When the preview is truncated, a warning toast offers Import as a table, which starts the existing async table import against the file already in workspace storage (no re-upload). Async import accepts an optional deleteSourceFile flag (the upload flow keeps its default of deleting the source); the file-viewer path sends deleteSourceFile: false so the original file remains.
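The opt-out behaviour of the flag can be illustrated with a small sketch. `ImportJobPayload` and `shouldDeleteSource` are hypothetical names used only for this example; the `!== false` comparison follows the semantics described in the review summary further down.

```typescript
// Hypothetical payload shape for the async import job.
interface ImportJobPayload {
  fileKey: string
  deleteSourceFile?: boolean
}

// Opt-out semantics: only an explicit `false` preserves the source file.
// `undefined` (callers that predate the flag) still deletes.
function shouldDeleteSource(payload: ImportJobPayload): boolean {
  return payload.deleteSourceFile !== false
}
```

With this shape, existing upload callers need no changes, while the file-viewer path opts out by sending `deleteSourceFile: false`.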

Save and edit/split/preview controls are hidden for stream-only large CSVs in the Files page and mothership view via shared isCsvStreamOnly. Unit tests cover the CSV preview slice behavior.
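As a rough illustration of a shared size-gate predicate, assuming a simplified file shape: `FileLike`, `looksLikeCsv`, `isCsvStreamOnlySketch`, and the exact byte threshold are all assumptions for this sketch, not the PR's actual `isCsvStreamOnly`.

```typescript
// Hypothetical file shape; the real record likely carries more fields.
interface FileLike {
  name: string
  type?: string | null // MIME type, if known
  size?: number | null // size in bytes
}

// Assumed reading of "over 5MB" as 5 * 1024 * 1024 bytes.
const CSV_STREAM_THRESHOLD_BYTES = 5 * 1024 * 1024

// Treats a file as CSV by extension or by MIME type.
function looksLikeCsv(file: FileLike): boolean {
  const ext = file.name.includes('.') ? file.name.split('.').pop()!.toLowerCase() : ''
  return ext === 'csv' || file.type === 'text/csv'
}

// A missing size counts as 0, so an unknown size falls through to the editable path.
function isCsvStreamOnlySketch(file: FileLike): boolean {
  return looksLikeCsv(file) && (file.size ?? 0) > CSV_STREAM_THRESHOLD_BYTES
}
```

Calling one predicate from both the Files page and the mothership view keeps the two surfaces from disagreeing about which files are read-only.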

Reviewed by Cursor Bugbot for commit d9de35d.

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0321342.

Comment thread apps/sim/app/api/workspaces/[id]/files/[fileId]/csv-preview/route.ts Outdated
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Greptile Summary

This PR fixes OOM crashes when opening large CSVs in the file viewer by introducing a server-side streaming preview route that reads only the first 1,000 rows from storage, adds a size-threshold guard (isCsvStreamOnly) that routes files over 5 MB to a read-only CsvTablePreview instead of the full editor, and surfaces an "Import as a table" toast that reuses the existing async import pipeline with deleteSourceFile: false so the source file survives.

  • A new GET /api/workspaces/[id]/files/[fileId]/csv-preview route streams the row slice via getCsvPreviewSlice (Node.js csv-parse) and validates the file record against the DB before serving, preventing access to stale or cross-workspace keys.
  • The mothership's preview-toggle loading guard (d9de35d) now only suppresses the toggle during filesLoading for CSV files, so non-CSV rich types (markdown, svg, html, mermaid) retain their toggle immediately on cold loads.
  • deleteSourceFile is threaded through the import-async route with !== false semantics, preserving existing upload-flow cleanup while correctly keeping workspace files intact on the new import path.

Confidence Score: 5/5

Safe to merge — the streaming logic, access controls, and import-flag semantics are all correct.

The core OOM fix (server-side streaming, browser-side size gate), the loading-guard scoping to CSV-only, and the deleteSourceFile opt-out semantics are all implemented correctly and backed by tests. There are no broken code paths in the changed routes or components.

mothership-view.tsx — the isActiveCsv check uses extension-only identification, which is inconsistent with isCsvStreamOnly's MIME-aware check; low-impact in practice but worth aligning.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/file-parsers/csv-preview-slice.ts | New server-side streaming parser; correctly caps at CSV_PREVIEW_MAX_ROWS, destroys source on abort/truncation, and uses incremental (not O(n²)) buffer accumulation for delimiter sniffing. |
| apps/sim/app/api/workspaces/[id]/files/[fileId]/csv-preview/route.ts | New GET route; validates workspace permission, confirms file record exists in DB and key matches, then delegates to getCsvPreviewSlice — no unvalidated client-supplied key accepted. |
| apps/sim/app/workspace/[workspaceId]/home/components/mothership-view/mothership-view.tsx | Loading guard now scoped to CSV files only via isActiveCsv, so non-CSV rich types no longer lose the toggle during cold loads; isActiveCsv uses getFileExtension while isCsvStreamOnly uses resolvePreviewType (MIME-aware), a minor inconsistency for extension-less CSV files. |
| apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/csv-import.ts | New hook wiring the truncation toast and import action; importingRef correctly prevents double-submission, notifiedKeyRef prevents repeat toasts per file key, and onSettled resets the guard for retry. |
| apps/sim/app/workspace/[workspaceId]/files/components/file-viewer/file-viewer.tsx | isCsvStreamOnly predicate added and correctly gates the CsvTablePreview path; null/undefined file.size defaults to 0 (safe: falls through to editable mode rather than misclassifying). |
| apps/sim/app/api/table/import-async/route.ts | deleteSourceFile threaded through to job payload; import-runner checks deleteSourceFile !== false so undefined (existing upload flows) still deletes the temp object — correct opt-out semantics. |
| apps/sim/hooks/queries/tables.ts | New useImportFileAsTable mutation hardcodes deleteSourceFile:false and handles errors/invalidation; clean addition alongside existing import hooks. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Browser
    participant FileViewer
    participant CsvTablePreview
    participant API as /api/workspaces/[id]/files/[fileId]/csv-preview
    participant Storage
    participant ImportAPI as /api/table/import-async

    Browser->>FileViewer: Open CSV file
    FileViewer->>FileViewer: isCsvStreamOnly(file) ?
    alt "file.size > 5MB"
        FileViewer->>CsvTablePreview: render read-only preview
        CsvTablePreview->>API: GET csv-preview (key, fileId, workspaceId)
        API->>API: checkSessionOrInternalAuth
        API->>API: getUserEntityPermissions
        API->>API: getWorkspaceFile(workspaceId, fileId) — verify key matches
        API->>Storage: downloadFileStream (stream only)
        Storage-->>API: Readable stream
        API->>API: getCsvPreviewSlice (first 1000 rows)
        API-->>CsvTablePreview: "{ headers, rows, truncated }"
        CsvTablePreview->>Browser: Render DataTable
        note over CsvTablePreview: truncated=true → fire Import as a table toast
        Browser->>ImportAPI: POST import-async (fileKey, deleteSourceFile:false)
        ImportAPI->>Storage: stream entire file
        ImportAPI-->>Browser: "{ tableId, importId }"
    else "file.size <= 5MB"
        FileViewer->>Browser: TextEditor (editable)
        note over Browser: parseCsv caps DataTable preview at 1000 rows, triggers same toast if truncated
    end

Reviews (3): Last reviewed commit: "fix(files): scope mothership preview-tog..."

Comment thread apps/sim/app/api/workspaces/[id]/files/[fileId]/csv-preview/route.ts Outdated
Comment thread apps/sim/lib/file-parsers/csv-preview-slice.ts
@TheodoreSpeaks

Collaborator Author

Addressed the review findings (commit 81ca970) + merged staging (56f941c):

  • CSV preview skips file auth / fileId not validated (Bugbot + Greptile) — the route now resolves the file via getWorkspaceFile(workspaceId, fileId), returns 404 for archived/deleted/foreign files or a key that doesn't match the live record, and streams from the record's authoritative key (not the client-supplied one). Matches /api/files/serve's guarantees.
  • Edit/split toggle flashes for large CSVs while files load — gated isActivePreviewable on !filesLoading so the toggle only resolves once the record's size is known.
  • No guard against double-invoking importAsTable — added an importingRef guard that blocks a second kickoff and resets onSettled so a failed import can retry.
  • O(n²) buffer concat in the delimiter sniff — accumulate the header line incrementally per chunk instead of re-Buffer.concat-ing the whole prefix each iteration.
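The last point, accumulating the header line per chunk instead of re-concatenating the whole prefix, might look roughly like this. It is a sketch only: `HeaderLineSniffer` and the candidate delimiter list are hypothetical, it works on strings rather than Buffers, and the counting is naive (delimiters inside quoted fields are counted too).

```typescript
// Assumed candidate set; the real detector may consider different delimiters.
const CANDIDATE_DELIMITERS = [',', ';', '\t', '|']

class HeaderLineSniffer {
  private parts: string[] = []
  private header: string | null = null

  // Feeds one chunk; returns true once the full header line has been seen.
  // Each chunk is stored once and joined a single time at the newline,
  // so the total work stays linear in the header length.
  push(chunk: string): boolean {
    if (this.header !== null) return true
    const nl = chunk.indexOf('\n')
    if (nl === -1) {
      this.parts.push(chunk)
      return false
    }
    this.parts.push(chunk.slice(0, nl))
    this.header = this.parts.join('').replace(/\r$/, '')
    this.parts = []
    return true
  }

  // Picks the candidate that occurs most often in the header; defaults to ','.
  delimiter(): string {
    const line = this.header ?? this.parts.join('')
    let best = ','
    let bestCount = 0
    for (const candidate of CANDIDATE_DELIMITERS) {
      const count = line.split(candidate).length - 1
      if (count > bestCount) {
        best = candidate
        bestCount = count
      }
    }
    return best
  }
}
```

A caller would push chunks until `push` returns true, then read `delimiter()` once and hand the choice to the parser.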

Also: staging shipped this same source-preservation feature as deleteSourceFile?: boolean, so I dropped my keepSource flag and adopted deleteSourceFile end-to-end (contract → route → hook → runner). Resolved the import-runner.ts conflict to staging's version, which already has test coverage for it.

@greptile review

@TheodoreSpeaks

Collaborator Author

Scoped the mothership preview-toggle loading guard to CSV files only (d9de35d) — non-CSV rich types (markdown, html, svg, mermaid) no longer lose the toggle during cold loads.
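The scoping described above could be expressed along these lines. `showPreviewToggle` and its option names are hypothetical and only model the loading guard; hiding the toggle for stream-only large CSVs after the record has loaded is a separate check not shown here.

```typescript
interface ToggleInputs {
  isPreviewable: boolean // file type has a rich preview (markdown, html, svg, mermaid, csv)
  isCsv: boolean // the active file is a CSV
  filesLoading: boolean // file records (including size) are still loading
}

// Only CSVs need the record's size before the toggle can be decided,
// so only CSVs wait for loading to finish.
function showPreviewToggle({ isPreviewable, isCsv, filesLoading }: ToggleInputs): boolean {
  if (!isPreviewable) return false
  if (isCsv && filesLoading) return false
  return true
}
```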

@greptile review

@TheodoreSpeaks TheodoreSpeaks merged commit 63a3e6d into staging Jun 18, 2026
16 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the fix/large-table-file-view branch June 18, 2026 01:54