improvement(integrations): validate BigQuery/Forms/PageSpeed + regenerate integration docs#5109

Merged
waleedlatif1 merged 3 commits into staging from validate/lighthouse-integration on Jun 17, 2026

Conversation

@waleedlatif1

Collaborator

Summary

  • BigQuery — marked null-defaulted outputs as optional: true (get_table type/numRows/numBytes/creationTime/lastModifiedTime/location, list_datasets location, list_tables type, query totalBytesProcessed); these are genuinely absent for views/external tables/cached queries. Validated all 5 tools against the live REST API — no other issues.
  • Google Forms — added response pagination to get_responses (pageToken + filter params, nextPageToken output) which was previously missing; fixed pageSize visibility (user-only → user-or-llm); set pagination subBlocks to advanced and added a filter wandConfig. Validated full tool surface + OAuth scopes.
  • PageSpeed (Lighthouse) — added a 7th BlockMeta template (competitor benchmark) to meet the ≥7 convention; confirmed the API has a single runPagespeed endpoint so the one-tool integration is complete; every response field-path verified.
  • Docs — regenerated integration docs (the generator was stale); added hand-written intro sections to the 5 new enrichment pages (datagma, dropcontact, enrow, icypeas, leadmagic).
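As a rough illustration of the BigQuery change, the sketch below shows how an output schema might flag fields as optional so that a view or external table, which omits them, still validates. The `OutputField` shape, the `tableId` field, and `missingRequiredOutputs` are assumptions made for this example and are not taken from the repository; only the optional field names come from the summary above.

```typescript
// Hypothetical output-schema shape; the real tool config types may differ.
interface OutputField {
  type: 'string' | 'number' | 'boolean'
  description: string
  optional?: boolean
}

// The optional fields are the ones listed in the PR summary for get_table.
// `tableId` is an illustrative required field, not confirmed by the source.
const getTableOutputs: Record<string, OutputField> = {
  tableId: { type: 'string', description: 'Table identifier' },
  type: { type: 'string', description: 'Table type', optional: true },
  numRows: { type: 'string', description: 'Row count', optional: true },
  numBytes: { type: 'string', description: 'Size in bytes', optional: true },
  creationTime: { type: 'string', description: 'Creation time', optional: true },
  lastModifiedTime: { type: 'string', description: 'Last modified time', optional: true },
  location: { type: 'string', description: 'Dataset location', optional: true },
}

// Returns the names of non-optional outputs that are null or undefined.
function missingRequiredOutputs(
  schema: Record<string, OutputField>,
  output: Record<string, unknown>
): string[] {
  return Object.keys(schema).filter(
    (name) => !schema[name].optional && output[name] == null
  )
}
```

With this shape, an output whose `numRows` and `numBytes` are null reports no missing fields, while an output without `tableId` is flagged.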

Type of Change

  • Improvement

Testing

Tested manually — bun run lint clean, tsc --noEmit clean, docs regen idempotent.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…rate integration docs

- BigQuery: mark null-defaulted outputs optional (get_table type/numRows/numBytes/creationTime/lastModifiedTime/location, list_datasets location, list_tables type, query totalBytesProcessed)
- Google Forms: add response pagination (pageToken + filter params, nextPageToken output), fix pageSize visibility, advanced-mode pagination subBlocks + filter wandConfig
- PageSpeed: add a 7th BlockMeta template (competitor benchmark)
- Regenerate integration docs; add manual intro sections to new datagma/dropcontact/enrow/icypeas/leadmagic pages
@vercel

vercel Bot commented Jun 17, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
docs | Ready | Preview, Comment | Jun 17, 2026 6:03am


@cursor

cursor Bot commented Jun 17, 2026


PR Summary

Low Risk
Changes are predominantly generated docs and optional output flags; the main functional note is Google Forms response pagination, which is additive and low risk if tested against large forms.

Overview
This PR refreshes the integrations documentation catalog to match current tool definitions, alongside runtime/tool fixes called out in the PR (BigQuery optional outputs, Google Forms response pagination, PageSpeed BlockMeta template).

New enrichment integrations in docs: Adds full MDX pages and nav entries for Datagma, Dropcontact, Enrow, Icypeas, and LeadMagic, with hand-written intro sections and matching brand icons wired through icon-mapping.ts.

Docs accuracy pass (generated + corrected): Many pages move from truncated or shared “mega” output tables to per-action inputs/outputs—notably RB2B, ClickHouse, File (compress/decompress, clearer read/get/fetch params), Google Sheets/Excel (sheetName + cellRange, read filters), Kalshi (extra filters/subaccount params), and document parsers (Extend, Mistral, Pulse, Reducto) simplifying to required file input. Smaller description fixes land across Apollo, Calendar, Forms (pageToken/filter/nextPageToken), Reddit mod actions, and others.

Risk: Low for users—mostly documentation and optional output metadata; Forms pagination is a behavioral addition worth verifying in workflows that list large response sets.

Reviewed by Cursor Bugbot for commit 689cae6. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Contributor

Greptile Summary

This PR validates and improves three integrations (BigQuery, Google Forms, PageSpeed) and regenerates/extends integration docs. The code changes are targeted and correct: BigQuery output fields that can be absent for views/external tables are marked optional: true, Google Forms gains proper pagination support (pageToken + filter inputs, nextPageToken output), and PageSpeed adds a seventh block template to meet the project convention.

  • BigQuery: Nine output fields across four tools (six on get_table, one each on list_datasets, list_tables, and query) now correctly declare optional: true for fields absent on views, external tables, and cached-query results.
  • Google Forms: get_responses adds pageToken and filter params (both advanced mode) with a wandConfig-powered filter prompt, and surfaces nextPageToken in the tool output so callers can drive cursor-based pagination.
  • generate-docs.ts: Regex is fixed to distinguish the opening and closing quote character, preventing apostrophes inside double-quoted descriptions from truncating the extracted text; per-tool params scoping prevents multi-tool files from inheriting the wrong params block.
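A caller could drive the cursor-based pagination described above roughly as follows. This is a minimal sketch: `ResponsesPage`, `FetchPage`, and `collectAllResponses` are hypothetical names, and the page fetcher is injected so the loop stays independent of how the tool is actually invoked. It assumes the last page reports `nextPageToken` as null, as the review notes.

```typescript
// Hypothetical page shape mirroring the tool output described in this PR.
interface ResponsesPage<T> {
  responses: T[]
  nextPageToken: string | null
}

// Injected fetcher: receives the cursor (undefined for the first page).
type FetchPage<T> = (pageToken?: string) => Promise<ResponsesPage<T>>

// Follows nextPageToken until it is empty, with a guard against runaway loops.
async function collectAllResponses<T>(
  fetchPage: FetchPage<T>,
  maxPages = 100
): Promise<T[]> {
  const all: T[] = []
  let pageToken: string | undefined
  for (let page = 0; page < maxPages; page++) {
    const { responses, nextPageToken } = await fetchPage(pageToken)
    all.push(...responses)
    if (!nextPageToken) return all // null or empty means this was the last page
    pageToken = nextPageToken
  }
  throw new Error(`Stopped after ${maxPages} pages; raise maxPages if this is expected`)
}
```

The `maxPages` cap is a design choice for the sketch, so that a fetcher that keeps returning a token cannot loop forever.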

Confidence Score: 5/5

Safe to merge; all functional changes are additive (new optional params, optional output fields) with no breaking changes to existing behaviour.

The BigQuery and PageSpeed changes are purely additive metadata corrections. The Google Forms pagination additions are isolated to the get_responses path and introduce no regressions for callers that don't supply the new params. The docs-generator regex fixes only affect the offline script output. No auth, data-loss, or breaking-change risk is introduced.

apps/sim/tools/google_forms/get_responses.ts and apps/sim/tools/google_forms/utils.ts — the interaction between pageToken and filter when both are supplied warrants a second look.

Important Files Changed

Filename | Overview
apps/sim/tools/google_forms/get_responses.ts | Adds pageToken, filter pagination params and nextPageToken output; logic is sound but nextPageToken is emitted as null rather than undefined, which may surprise downstream consumers
apps/sim/tools/google_forms/utils.ts | Extends buildListResponsesUrl with pageToken and filter; both params are appended independently, which could produce ambiguous requests when combined with an opaque pageToken
apps/sim/blocks/blocks/google_forms.ts | Adds pageToken and filter block inputs (advanced mode) with correct wandConfig for filter; nextPageToken output wired correctly; pageSize visibility updated to user-or-llm
apps/sim/tools/google_bigquery/get_table.ts | Marks type, numRows, numBytes, creationTime, lastModifiedTime, location as optional; correct for views and external tables
scripts/generate-docs.ts | Fixes regex to correctly distinguish single/double/backtick quote delimiters so apostrophes in descriptions are no longer truncated; adds per-tool params scoping and directory scan for tools defined in sibling files
apps/sim/lib/integrations/integrations.json | Regenerated; adds datagma, dropcontact, enrow, icypeas, leadmagic entries and improves truncated/stub descriptions across multiple integrations
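To make the two Google Forms notes in the table concrete, here is one way the URL builder and the token normalization might look. This is a sketch under assumptions, not the code in utils.ts: the parameter shape and `normalizeNextPageToken` are hypothetical, and only the function name buildListResponsesUrl and the query parameter names come from the review.

```typescript
// Hypothetical parameter shape for the list call.
interface ListResponsesParams {
  formId: string
  pageSize?: number
  pageToken?: string
  filter?: string
}

// Appends each query param independently, as the review describes.
function buildListResponsesUrl(params: ListResponsesParams): string {
  const { formId, pageSize, pageToken, filter } = params
  const url = new URL(
    `https://forms.googleapis.com/v1/forms/${encodeURIComponent(formId)}/responses`
  )
  if (pageSize !== undefined) url.searchParams.set('pageSize', String(pageSize))
  if (filter) url.searchParams.set('filter', filter)
  if (pageToken) url.searchParams.set('pageToken', pageToken)
  return url.toString()
}

// Emits null (not undefined) on the last page, matching the behaviour noted above.
function normalizeNextPageToken(raw: { nextPageToken?: string }): string | null {
  return raw.nextPageToken ?? null
}
```

Because the params are appended independently, a caller that changes `filter` between pages while reusing an old `pageToken` would send a request the token was not issued for. Keeping `filter` identical across all pages of one listing is the safer usage.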

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant Block as Google Forms Block
    participant Tool as get_responses Tool
    participant API as Google Forms API

    User->>Block: execute(formId, pageSize, filter, pageToken?)
    Block->>Tool: "params {formId, pageSize, filter, pageToken}"

    alt responseId provided
        Tool->>API: "GET /forms/{formId}/responses/{responseId}"
        API-->>Tool: FormResponse
        Tool-->>Block: "{response, raw}"
    else list mode
        Tool->>API: "GET /forms/{formId}/responses?pageSize=N&filter=...&pageToken=..."
        API-->>Tool: "{responses[], nextPageToken?}"
        Tool->>Tool: sort + normalizeResponse()
        Tool-->>Block: "{responses[], nextPageToken (null if last page), raw}"
    end

    Block-->>User: output

Reviews (2): Last reviewed commit: "fix(docs-gen): resolve tools defined in ..." | Re-trigger Greptile

…ing docs

The doc generator extracted tool descriptions with a character class that
excluded both quote types (['"]([^'"]...)['"]), so a double-quoted description
containing an apostrophe (e.g. "Find someone's email") was truncated at the
apostrophe — the generated docs/catalog showed stubs like "Find someone".

Anchor extraction on the actual opening quote (single/double/backtick), matching
the existing extractDescription helper, in both buildToolDescriptionMap and
extractToolInfo. Regenerated docs restore full descriptions across all affected
integrations (Apollo, Ahrefs, LeadMagic, Findymail, OpenAI, Slack, etc.).
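The truncation described in this commit message can be reproduced with a short sketch. The first pattern is the character class quoted above; the second is one possible way to anchor on the opening quote using a backreference, and is not necessarily the exact pattern used in generate-docs.ts. The function names are hypothetical.

```typescript
// Character class excludes both quote types, so it stops at the first
// apostrophe even inside a double-quoted string.
const NAIVE_DESCRIPTION = /description\s*:\s*['"]([^'"]*)['"]/

// Captures the opening quote (single, double, or backtick) and only stops at
// the same character; backslash escapes are skipped over.
const ANCHORED_DESCRIPTION = /description\s*:\s*(['"`])((?:\\.|(?!\1)[^\\])*)\1/

function extractNaive(source: string): string | null {
  const match = NAIVE_DESCRIPTION.exec(source)
  return match ? match[1] : null
}

function extractAnchored(source: string): string | null {
  const match = ANCHORED_DESCRIPTION.exec(source)
  return match ? match[2] : null
}
```

For the input `description: "Find someone's email"`, `extractNaive` returns `Find someone`, while `extractAnchored` returns the full `Find someone's email`.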
… per tool

The doc generator located a tool's definition only by filename convention
(decompress.ts / index.ts), so file_decompress — which lives in compress.ts
alongside file_compress — fell back to index.ts and rendered an empty Input
table. It also read the params block from the first tool in a multi-tool file,
so every tool in such a file inherited the first tool's inputs/outputs.

- getToolInfo: when no candidate file declares the exact tool ID, scan the whole
  tool-prefix directory for the file that does.
- extractToolInfo: read the params block scoped to the specific tool, falling
  back to the full file for tools that inherit params via spread.

Regenerated docs eliminate ~50 empty/incorrect input tables across integrations
(clickhouse, rb2b, reddit, file, etc.); param-less OAuth-only tools correctly
keep an empty input table.
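The two lookups described in this commit message can be sketched over an in-memory map of file names to source text. Everything here is an assumption for illustration: `findFileDeclaringTool` and `scopedToolSource` are hypothetical names, and the real getToolInfo and extractToolInfo read from disk and may scope the params block differently. The sketch splits a file at each `id: '...'` declaration, which would misfire if a params object itself contained an `id` key.

```typescript
// fileName -> source text; stands in for reading a tool-prefix directory.
type ToolSources = Record<string, string>

// Escapes regex metacharacters so the tool ID is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Scans every file for the one that declares the exact tool ID.
function findFileDeclaringTool(sources: ToolSources, toolId: string): string | null {
  const idPattern = new RegExp("id\\s*:\\s*['\"]" + escapeRegExp(toolId) + "['\"]")
  for (const fileName of Object.keys(sources)) {
    if (idPattern.test(sources[fileName])) return fileName
  }
  return null
}

// Returns the slice of a multi-tool file that belongs to one tool: from its
// id declaration up to the next tool's id declaration (or end of file).
function scopedToolSource(text: string, toolId: string): string {
  const idPattern = /id\s*:\s*['"]([\w-]+)['"]/g
  const declarations: { id: string; start: number }[] = []
  let match: RegExpExecArray | null
  while ((match = idPattern.exec(text)) !== null) {
    declarations.push({ id: match[1], start: match.index })
  }
  const i = declarations.findIndex((d) => d.id === toolId)
  if (i === -1) return text // fall back to the full file
  const end = i + 1 < declarations.length ? declarations[i + 1].start : text.length
  return text.slice(declarations[i].start, end)
}
```

Scoping before extracting params is what keeps the second tool in a shared file from inheriting the first tool's inputs.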
@waleedlatif1

Collaborator Author

Thanks @greptile — both findings were real and are now fixed at the root cause (the docs generator), not patched in the generated output.

1. file_decompress empty Input table: file_decompress is defined in compress.ts (alongside file_compress), but the generator located a tool's definition only by filename convention (decompress.ts / index.ts), so it fell back to index.ts and surfaced no params. It also read the params block from the first tool in a multi-tool file. Fixed in getToolInfo (scan the whole tool-prefix dir for the file declaring the exact tool ID) and extractToolInfo (scope the params block to the specific tool, falling back to the full file for spread-inherited params). file_decompress now documents fileId/fileInput and its own output. This also eliminated ~50 empty/incorrect input tables across other multi-tool files (clickhouse 20→0, rb2b 15→0, reddit 9→2; genuinely param-less OAuth-only tools correctly keep an empty table). Commit 689cae6.

2. Stub descriptions ("Find someone" / "Find a person") — these weren't placeholder source text; the tool descriptions are full sentences (e.g. "Find someone's verified work email…"). The generator's description regex used a character class excluding both quote types, so a double-quoted description containing an apostrophe was truncated at the apostrophe. Fixed by anchoring extraction on the actual opening quote (matching the existing extractDescription helper) in both buildToolDescriptionMap and extractToolInfo; descriptions are now full across all affected integrations (LeadMagic, Findymail, Apollo, Slack, OpenAI, etc.). Commit 701870e.

@waleedlatif1

Collaborator Author

@greptile review

@waleedlatif1

Collaborator Author

@cursor review

@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Contributor

Both fixes are the right call — patching the generator rather than the generated output means all future regenerations are correct by default.

The extractToolInfo scoping fix for multi-tool files is particularly impactful; falling back to the first tool's params block silently produced wrong docs for any co-located tool, and the ~50 empty/incorrect tables you mention confirms the blast radius was larger than just file_decompress.

The regex fix for apostrophe-truncated descriptions is also clean — anchoring on the actual opening quote character rather than a character class that excluded both quote types is the correct approach. Good that it's consistent between buildToolDescriptionMap and extractToolInfo.

Thanks for the detailed explanation of both commits. No further concerns here.

@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 689cae6. Configure here.

@waleedlatif1 waleedlatif1 merged commit 8b93e43 into staging Jun 17, 2026
16 checks passed
@waleedlatif1 waleedlatif1 deleted the validate/lighthouse-integration branch June 17, 2026 06:05