feat: support CLUSTER BY AUTO and CLUSTER BY NONE for Databricks liquid clustering #5846
Open
EhabEasee wants to merge 1 commit into
…id clustering

Adds parser, validator, and Databricks adapter support for the keyword forms of liquid clustering. Bare AUTO/NONE (unquoted VAR tokens) are recognised as keywords; backtick-quoted `auto`/`none` and parenthesised forms remain real column references.

- Add LIQUID_CLUSTERING_KEYWORDS constant to avoid repeating the sentinel set across dialect, meta, definition, and adapter
- Parser (dialect.py): detect VAR-token AUTO/NONE on clustered_by; strip Paren from single-column clustered_by to match partitioned_by normalisation
- Validator (meta.py): normalise single string input to list; restore keyword sentinels from JSON strings on deserialisation; skip column-count check for keywords, gated on clustered_by + databricks
- validate_definition (definition.py): skip keyword sentinels in the column-existence check, same gate
- Adapter (databricks.py): emit CLUSTER BY AUTO / CLUSTER BY NONE without a tuple wrapper; raise ValueError on unexpected bare Var
- Tests: parser round-trips, Python API (exp.Var and plain string), backtick-quoted columns, render_definition, JSON round-trip, non-Databricks rejection, mixed-list behaviour, adapter SQL emission

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>
Databricks supports two keyword forms of liquid clustering that don't take column arguments:
- `CLUSTER BY AUTO`: lets Databricks automatically select clustering columns
- `CLUSTER BY NONE`: disables liquid clustering on a table

Previously, SQLMesh had no way to express these in a model definition. This PR adds support for both.
Changes
- `constants.py`: Adds `LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})` as a shared constant used across the parser, validator, and adapter.
- Parsing (`dialect.py`): The `clustered_by` property parser now recognises bare `AUTO` and `NONE` tokens (unquoted `VAR` tokens) as liquid clustering keywords rather than column references. Backtick-quoted `auto` / `none` are still treated as regular column names, preserving backwards compatibility for columns that happen to share those names.
- Validation (`meta.py`): A single string passed to `clustered_by` is normalised to a list before processing. The validator then skips the column-count check for `exp.Var(AUTO|NONE)`, but only when the field is `clustered_by` and the dialect is `databricks`. On deserialisation from JSON, keyword strings are restored to `exp.Var` sentinels before `list_of_fields_validator` can normalise them into quoted columns.
- Validation (`definition.py`): The `validate_definition` column-existence check skips keyword sentinels for the same `clustered_by` + `databricks` scope.
- Code generation (`databricks.py`): `_build_table_properties_exp` detects a single `exp.Var` in `clustered_by` (guarded by a `ValueError` if the Var holds an unexpected value), and emits `CLUSTER BY AUTO` / `CLUSTER BY NONE` without wrapping in a tuple. Multi-column paths are unchanged.

Usage
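To make the different spellings concrete, here is a minimal standalone sketch of how a `clustered_by` value is interpreted under the rules described above. It does not use SQLMesh or sqlglot: `classify_clustered_by`, `Keyword`, and `Column` are hypothetical names introduced for illustration, and the string handling is a simplification of what a real token-based parser does.

```python
from dataclasses import dataclass

# Mirrors the shared constant named in the PR description.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Keyword:
    """Hypothetical stand-in for a keyword sentinel (exp.Var in the PR)."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a real column reference."""
    name: str


def classify_clustered_by(raw: str):
    """Illustrative only: interpret the text written after clustered_by."""
    text = raw.strip()
    # Parenthesised form: always a column list, even for (auto) or (a, AUTO).
    if text.startswith("(") and text.endswith(")"):
        parts = [p.strip() for p in text[1:-1].split(",")]
        return [Column(p.strip("`")) for p in parts]
    # Backtick-quoted form: always a real column.
    if text.startswith("`") and text.endswith("`"):
        return [Column(text.strip("`"))]
    # Bare token: a keyword only when it is exactly AUTO or NONE.
    # (Case handling is deliberately left out of this sketch.)
    if text in LIQUID_CLUSTERING_KEYWORDS:
        return [Keyword(text)]
    return [Column(text)]


print(classify_clustered_by("AUTO"))       # keyword sentinel
print(classify_clustered_by("`auto`"))     # real column named auto
print(classify_clustered_by("(a, AUTO)"))  # two real columns
```

The point of the sketch is only the precedence: quoting or parentheses always win, so existing models with columns named `auto` or `none` keep their meaning.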
Via the Python API, both a plain string and `exp.Var` are accepted.

Columns with the names `auto` or `none` are still supported via backtick quoting.

Tests
- `tests/core/test_dialect.py`: parser round-trips for `AUTO`/`NONE` keywords, backtick-quoted columns, paren-wrapped single columns, multi-column lists, mixed list `(a, AUTO)`, non-Databricks dialect
- `tests/core/test_model.py`: model DDL; Python API with both `exp.Var` and plain string; backtick-quoted column names; `render_definition` output; JSON serialisation round-trip; non-Databricks dialect rejection; mixed-list column treatment
- `tests/core/engine_adapter/test_databricks.py`: adapter emits `CLUSTER BY AUTO` / `CLUSTER BY NONE` without column parens
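The adapter emission and JSON round-trip behaviour covered by these tests can be illustrated with a small standalone sketch. This is not the PR's code: `Var` and `Column` are hypothetical stand-ins for the sqlglot expressions, `cluster_by_sql`, `to_json`, and `from_json` are made-up helper names, and the choice to serialise columns as backtick-quoted strings is an assumption made so that a keyword and a same-named column stay distinguishable.

```python
import json
from dataclasses import dataclass

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a keyword sentinel (exp.Var in the PR)."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference."""
    name: str


def cluster_by_sql(clustered_by):
    """Emit the keyword form without parentheses, columns with them."""
    if len(clustered_by) == 1 and isinstance(clustered_by[0], Var):
        keyword = clustered_by[0].name
        if keyword not in LIQUID_CLUSTERING_KEYWORDS:
            # Guard against a bare Var holding an unexpected value.
            raise ValueError(f"Unexpected clustering keyword: {keyword}")
        return f"CLUSTER BY {keyword}"
    columns = ", ".join(f"`{c.name}`" for c in clustered_by)
    return f"CLUSTER BY ({columns})"


def to_json(clustered_by):
    """Assumed encoding: keywords bare, columns backtick-quoted."""
    items = [x.name if isinstance(x, Var) else f"`{x.name}`" for x in clustered_by]
    return json.dumps(items)


def from_json(payload):
    """Restore keyword sentinels before anything treats them as columns."""
    restored = []
    for item in json.loads(payload):
        if item in LIQUID_CLUSTERING_KEYWORDS:
            restored.append(Var(item))
        else:
            restored.append(Column(item.strip("`")))
    return restored


print(cluster_by_sql([Var("AUTO")]))
print(cluster_by_sql([Column("a"), Column("b")]))
print(from_json(to_json([Column("auto")])))
```

The ordering in `from_json` matters: the keyword check runs first, which corresponds to the description's note that sentinels are restored before the generic field validator can turn them into quoted columns.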