feat: support CLUSTER BY AUTO and CLUSTER BY NONE for Databricks liquid clustering by EhabEasee · Pull Request #5846 · SQLMesh/sqlmesh

EhabEasee · 2026-06-17T15:39:40Z

Databricks supports two keyword forms of liquid clustering that don't take column arguments:

CLUSTER BY AUTO — lets Databricks automatically select clustering columns
CLUSTER BY NONE — disables liquid clustering on a table

Previously, SQLMesh had no way to express these in a model definition. This PR adds support for both.

Changes

constants.py: Adds LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"}) as a shared constant used across the parser, validator, and adapter.

Parsing (dialect.py): The clustered_by property parser now recognises bare AUTO and NONE tokens (unquoted VAR tokens) as liquid clustering keywords rather than column references. Backtick-quoted `auto` / `none` are still treated as regular column names, preserving backwards compatibility for columns that happen to share those names.
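The quoted-versus-bare distinction can be sketched as below. This is an illustration, not the code in dialect.py: `Var` and `Column` are hypothetical stand-ins for sqlglot expression nodes, `classify_clustered_by` is an invented helper name, and the case-insensitive keyword match is an assumption.

```python
from dataclasses import dataclass

# Mirrors the constant named in the PR description.
LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a bare keyword sentinel (exp.Var in the PR)."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference."""
    name: str


def classify_clustered_by(raw: str):
    """Classify one clustered_by token as a keyword sentinel or a column."""
    text = raw.strip()
    if len(text) >= 2 and text.startswith("`") and text.endswith("`"):
        # Backtick-quoted identifiers are always columns, even `auto` / `none`.
        return Column(text[1:-1])
    if text.upper() in LIQUID_CLUSTERING_KEYWORDS:
        # Assumption: bare keywords are matched case-insensitively.
        return Var(text.upper())
    return Column(text)
```

Under these assumptions, `classify_clustered_by("AUTO")` yields a `Var`, while `` classify_clustered_by("`auto`") `` yields a `Column` named `auto`.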

Validation (meta.py): A single string passed to clustered_by is normalised to a list before processing. The validator then skips the column-count check for exp.Var(AUTO|NONE), but only when the field is clustered_by and the dialect is databricks. On deserialisation from JSON, keyword strings are restored to exp.Var sentinels before list_of_fields_validator can normalise them into quoted columns.
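A rough sketch of the normalisation and gating described above, under stated assumptions: `Var` is a hypothetical stand-in for exp.Var, both function names are invented for illustration, and restoring a keyword only when it is the sole entry is a guess based on the "mixed-list" test description rather than confirmed behaviour.

```python
from dataclasses import dataclass

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for exp.Var."""
    name: str


def normalise_clustered_by(value, field_name: str, dialect: str) -> list:
    """Wrap a single value in a list and restore a lone keyword string to a Var."""
    items = [value] if isinstance(value, (str, Var)) else list(value)
    gated = field_name == "clustered_by" and dialect == "databricks"
    if (
        gated
        and len(items) == 1  # assumption: keywords in mixed lists stay as columns
        and isinstance(items[0], str)
        and items[0] in LIQUID_CLUSTERING_KEYWORDS
    ):
        return [Var(items[0])]
    return items


def skips_column_count_check(items: list, field_name: str, dialect: str) -> bool:
    """True when the value is the keyword form and the column-count check should not run."""
    return (
        field_name == "clustered_by"
        and dialect == "databricks"
        and len(items) == 1
        and isinstance(items[0], Var)
        and items[0].name in LIQUID_CLUSTERING_KEYWORDS
    )
```

Keeping the gate on both the field name and the dialect means the same string passed to another field, or under another dialect, falls through to ordinary column handling.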

Validation (definition.py): The validate_definition column-existence check skips keyword sentinels for the same clustered_by + databricks scope.
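The column-existence check with the keyword skip might look roughly like this. It is a simplified sketch: `Var`, `Column`, and `missing_clustering_columns` are hypothetical names, and the real validate_definition works on SQLMesh model objects rather than plain lists and sets.

```python
from dataclasses import dataclass

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for exp.Var."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference."""
    name: str


def missing_clustering_columns(clustered_by, known_columns, dialect: str) -> list:
    """Return clustered_by entries that are not columns of the model."""
    missing = []
    for item in clustered_by:
        is_keyword = (
            dialect == "databricks"
            and isinstance(item, Var)
            and item.name in LIQUID_CLUSTERING_KEYWORDS
        )
        if is_keyword:
            # Keyword sentinels are not columns, so there is nothing to look up.
            continue
        if item.name not in known_columns:
            missing.append(item.name)
    return missing
```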

Code generation (databricks.py): _build_table_properties_exp detects a single exp.Var in clustered_by and emits CLUSTER BY AUTO / CLUSTER BY NONE without wrapping it in a tuple. If the Var holds any value other than AUTO or NONE, it raises a ValueError. Multi-column paths are unchanged.
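The emission logic can be approximated with plain strings. This sketch does not reproduce _build_table_properties_exp, which builds sqlglot expressions; `Var`, `Column`, and `build_cluster_by` are hypothetical names, and backtick-quoting every column is an assumption made for the example.

```python
from dataclasses import dataclass

LIQUID_CLUSTERING_KEYWORDS = frozenset({"AUTO", "NONE"})


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for exp.Var."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference."""
    name: str


def build_cluster_by(clustered_by) -> str:
    """Render a CLUSTER BY clause for keyword and column forms."""
    if len(clustered_by) == 1 and isinstance(clustered_by[0], Var):
        keyword = clustered_by[0].name
        if keyword not in LIQUID_CLUSTERING_KEYWORDS:
            # Guard against a bare Var that is not a known keyword.
            raise ValueError(f"Unexpected clustering keyword: {keyword}")
        # Keyword form: no parentheses around the keyword.
        return f"CLUSTER BY {keyword}"
    # Column form: parenthesised, comma-separated, quoted identifiers.
    columns = ", ".join(f"`{col.name}`" for col in clustered_by)
    return f"CLUSTER BY ({columns})"
```

Here `build_cluster_by([Var("AUTO")])` renders the bare keyword form, while a list of `Column` entries still renders as a parenthesised column list.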

Usage

-- In a SQLMesh model definition
MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by AUTO
);

MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by NONE
);

Via the Python API, both a plain string and exp.Var are accepted:

create_sql_model(..., dialect="databricks", clustered_by="AUTO")
create_sql_model(..., dialect="databricks", clustered_by=exp.Var(this="AUTO"))

Columns with the names auto or none are still supported via backtick quoting:

MODEL (
  name my_catalog.my_schema.my_table,
  kind FULL,
  dialect databricks,
  clustered_by (`auto`, `none`)
);

Tests

tests/core/test_dialect.py — parser round-trips: AUTO/NONE keywords, backtick-quoted columns, paren-wrapped single columns, multi-column lists, mixed list (a, AUTO), non-Databricks dialect
tests/core/test_model.py — model DDL; Python API with both exp.Var and plain string; backtick-quoted column names; render_definition output; JSON serialisation round-trip; non-Databricks dialect rejection; mixed-list column treatment
tests/core/engine_adapter/test_databricks.py — adapter emits CLUSTER BY AUTO / CLUSTER BY NONE without column parens

…id clustering

Adds parser, validator, and Databricks adapter support for the keyword forms of liquid clustering. Bare AUTO/NONE (unquoted VAR tokens) are recognised as keywords; backtick-quoted `auto`/`none` and parenthesised forms remain real column references.

- Add LIQUID_CLUSTERING_KEYWORDS constant to avoid repeating the sentinel set across dialect, meta, definition, and adapter
- Parser (dialect.py): detect VAR-token AUTO/NONE on clustered_by; strip Paren from single-column clustered_by to match partitioned_by normalisation
- Validator (meta.py): normalise single string input to list; restore keyword sentinels from JSON strings on deserialisation; skip column-count check for keywords, gated on clustered_by + databricks
- validate_definition (definition.py): skip keyword sentinels in the column-existence check, same gate
- Adapter (databricks.py): emit CLUSTER BY AUTO / CLUSTER BY NONE without a tuple wrapper; raise ValueError on unexpected bare Var
- Tests: parser round-trips, Python API (exp.Var and plain string), backtick-quoted columns, render_definition, JSON round-trip, non-Databricks rejection, mixed-list behaviour, adapter SQL emission

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: EhabEasee <ehab.elbadrawi@easee.com>

EhabEasee wants to merge 1 commit into SQLMesh:main from EhabEasee:feat/clustered-by-auto-none
