feat: fetch results beyond the old 50k row cap by jpetey75 · Pull Request #25 · lightdash/python-sdk

jpetey75 · 2026-06-17T02:25:05Z

Closes #19

⚠️ Stacked on #22 (base branch is feat/multiple-filters-same-field, not main). Review/merge #22 first, then I'll retarget to main. Diff here is scoped to #19 only.

Problem

Query.execute() hard-rejected any limit above 50,000:

if not 1 <= self._limit <= 50000:
    raise ValueError("Limit must be between 1 and 50000")

But 50k was never the real API cap. Verification against Lightdash source + the live instance showed the actual ceiling is the server's configurable query.maxLimit (default 5000, Cloud: 100000), exposed via /api/v1/health. 50000 is just a stale number (it's a CLI default elsewhere in Lightdash).

So the old check both blocked valid queries (50k–100k) and, had we naively "removed the cap", would have let the server silently clamp and return a truncated result — the worst outcome for data extraction.

Change

Client.get_query_limits() — discovers the instance's query config (maxLimit, maxPageSize, …) from /api/v1/health, cached per client.
Query.execute() rejects only limit < 1 locally, then validates larger limits against the discovered maxLimit, raising a clear ValueError rather than letting the server silently truncate. Fails open (lets the server enforce) if /health is unreachable.
Efficient paging — large fetches now page at the instance's maxPageSize (bounded by the requested limit) instead of a hard-coded 500, cutting round-trips ~5×.
Fixed a latent row-skip bug — pagination now derives the page count from total_results and a single uniform page size, so changing page size can never skip or duplicate rows.

# Now works (was rejected): up to the instance's real maxLimit
result = model.query().metrics(model.metrics.revenue).limit(100_000).execute()

# Asking for more than the instance allows fails loudly, not silently
model.query().limit(10_000_000).execute()
# ValueError: Limit 10000000 exceeds this instance's maximum query limit of 100000...

Scope

Per discussion, this respects the query endpoint's maxLimit. Pulling result sets larger than that (millions of rows) needs the separate CSV/download-export endpoint — I'll open a follow-up issue for that.

Verification

query.maxLimit = 100000, maxPageSize = 2500 confirmed from the live instance /api/v1/health.
The "50k was not the real cap" claim is grounded in that server config (100k > 50k) and the Lightdash source.
Caveat: a true end-to-end >50k fetch isn't exercised here — the SDK needs an API token I don't have in this environment, and the MCP query tool has its own separate copilot row-cap so it's not a faithful proxy. The limit values are confirmed live; the behaviour is unit-tested with a fake client.

Tests

tests/test_row_limit.py (13 new): below-1 rejection, over-maxLimit rejection (no query submitted), 50k–100k now allowed, fail-open when /health errors, page size = maxPageSize bounded by limit, and pagination consistency (all rows returned exactly once, every page fetched at the same size). Full unit suite green (90 passed).

🤖 Generated with Claude Code

The SDK hard-rejected any limit above 50,000, but 50k was never the real API cap — it was a stale number (the actual cap is the server's configurable `query.maxLimit`, e.g. 100,000 on Cloud, exposed via /api/v1/health). - `Client.get_query_limits()` discovers the instance's `query` config (maxLimit, maxPageSize, ...) from /api/v1/health, cached per client. - `Query.execute()` now rejects only limits < 1 locally, and validates larger limits against the discovered `maxLimit` — raising a clear ValueError instead of letting the server silently clamp and return a truncated result. - Large fetches page at the instance's `maxPageSize` (bounded by the requested limit) instead of a hard-coded 500, cutting round-trips ~5x. - Pagination now derives the page count from `total_results` and a single uniform page size, fixing a latent row-skip bug when a non-default page size was used. maxLimit / maxPageSize values verified against the live instance /health. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jpetey75 mentioned this pull request Jun 17, 2026

Support fetching very large extracts via CSV/download export (beyond query.maxLimit) #26

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: fetch results beyond the old 50k row cap#25

feat: fetch results beyond the old 50k row cap#25
jpetey75 wants to merge 1 commit into
feat/multiple-filters-same-fieldfrom
feat/fetch-beyond-50k

jpetey75 commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jpetey75 commented Jun 17, 2026

Problem

Change

Scope

Verification

Tests

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant