Skip to content

feat: fetch results beyond the old 50k row cap#25

Open
jpetey75 wants to merge 1 commit into
feat/multiple-filters-same-fieldfrom
feat/fetch-beyond-50k
Open

feat: fetch results beyond the old 50k row cap#25
jpetey75 wants to merge 1 commit into
feat/multiple-filters-same-fieldfrom
feat/fetch-beyond-50k

Conversation

@jpetey75

Copy link
Copy Markdown
Contributor

Closes #19

⚠️ Stacked on #22 (base branch is feat/multiple-filters-same-field, not main). Review/merge #22 first, then I'll retarget to main. Diff here is scoped to #19 only.

Problem

Query.execute() hard-rejected any limit above 50,000:

if not 1 <= self._limit <= 50000:
    raise ValueError("Limit must be between 1 and 50000")

But 50k was never the real API cap. Verification against Lightdash source + the live instance showed the actual ceiling is the server's configurable query.maxLimit (default 5000, Cloud: 100000), exposed via /api/v1/health. 50000 is just a stale number (it's a CLI default elsewhere in Lightdash).

So the old check both blocked valid queries (50k–100k) and, had we naively "removed the cap", would have let the server silently clamp and return a truncated result — the worst outcome for data extraction.

Change

  • Client.get_query_limits() — discovers the instance's query config (maxLimit, maxPageSize, …) from /api/v1/health, cached per client.
  • Query.execute() rejects only limit < 1 locally, then validates larger limits against the discovered maxLimit, raising a clear ValueError rather than letting the server silently truncate. Fails open (lets the server enforce) if /health is unreachable.
  • Efficient paging — large fetches now page at the instance's maxPageSize (bounded by the requested limit) instead of a hard-coded 500, cutting round-trips ~5×.
  • Fixed a latent row-skip bug — pagination now derives the page count from total_results and a single uniform page size, so changing page size can never skip or duplicate rows.
# Now works (was rejected): up to the instance's real maxLimit
result = model.query().metrics(model.metrics.revenue).limit(100_000).execute()

# Asking for more than the instance allows fails loudly, not silently
model.query().limit(10_000_000).execute()
# ValueError: Limit 10000000 exceeds this instance's maximum query limit of 100000...

Scope

Per discussion, this respects the query endpoint's maxLimit. Pulling result sets larger than that (millions of rows) needs the separate CSV/download-export endpoint — I'll open a follow-up issue for that.

Verification

  • query.maxLimit = 100000, maxPageSize = 2500 confirmed from the live instance /api/v1/health.
  • The "50k was not the real cap" claim is grounded in that server config (100k > 50k) and the Lightdash source.
  • Caveat: a true end-to-end >50k fetch isn't exercised here — the SDK needs an API token I don't have in this environment, and the MCP query tool has its own separate copilot row-cap so it's not a faithful proxy. The limit values are confirmed live; the behaviour is unit-tested with a fake client.

Tests

tests/test_row_limit.py (13 new): below-1 rejection, over-maxLimit rejection (no query submitted), 50k–100k now allowed, fail-open when /health errors, page size = maxPageSize bounded by limit, and pagination consistency (all rows returned exactly once, every page fetched at the same size). Full unit suite green (90 passed).

🤖 Generated with Claude Code

The SDK hard-rejected any limit above 50,000, but 50k was never the real API
cap — it was a stale number (the actual cap is the server's configurable
`query.maxLimit`, e.g. 100,000 on Cloud, exposed via /api/v1/health).

- `Client.get_query_limits()` discovers the instance's `query` config
  (maxLimit, maxPageSize, ...) from /api/v1/health, cached per client.
- `Query.execute()` now rejects only limits < 1 locally, and validates larger
  limits against the discovered `maxLimit` — raising a clear ValueError instead
  of letting the server silently clamp and return a truncated result.
- Large fetches page at the instance's `maxPageSize` (bounded by the requested
  limit) instead of a hard-coded 500, cutting round-trips ~5x.
- Pagination now derives the page count from `total_results` and a single
  uniform page size, fixing a latent row-skip bug when a non-default page size
  was used.

maxLimit / maxPageSize values verified against the live instance /health.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant