[spark] Support SQL-defined UDFs for Spark 4 #8299

Open
Zouxxyy wants to merge 4 commits into apache:master from Zouxxyy:xinyu/paimon-sql-udf

Zouxxyy commented Jun 20, 2026

Contributor

Purpose

Support SQL-defined scalar UDFs (CREATE FUNCTION ... RETURN) on Paimon catalog for Spark 4.

Intercept CreateUserDefinedFunction at the parser stage and persist the SQL
function in the Paimon catalog as a SQLFunctionDefinition; at query time rebuild a
Spark SQLFunctionExpression so Spark's own ResolveSQLFunctions / EliminateSQLFunctionNode
inline the body. No Spark changes required.

Scalar functions with an expression body or a single-column query body are
supported; table functions are rejected for now. All Spark-4-only code
(CreateUserDefinedFunction / SQLFunction / SQLFunctionExpression) is confined to
paimon-spark4-common behind SparkShim; the spark3 shim is a no-op so
paimon-spark-common still compiles against Spark 3.
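
For illustration, the supported and rejected forms could look like the statements below. This is a sketch, not taken from the PR's tests: the catalog name `paimon`, the database `db`, the table `orders`, and all function names are assumptions.

```sql
-- Scalar function with an expression body (supported).
CREATE FUNCTION paimon.db.area(w DOUBLE, h DOUBLE)
RETURNS DOUBLE
RETURN w * h;

-- Scalar function with a single-column query body (supported).
CREATE FUNCTION paimon.db.max_price()
RETURNS DOUBLE
RETURN SELECT MAX(price) FROM paimon.db.orders;

-- The functions are then called like any other scalar function.
SELECT paimon.db.area(2.0, 3.0), paimon.db.max_price();

-- Table function: rejected for now, per the description above.
-- CREATE FUNCTION paimon.db.cheap_orders(limit_price DOUBLE)
-- RETURNS TABLE (id BIGINT, price DOUBLE)
-- RETURN SELECT id, price FROM paimon.db.orders WHERE price < limit_price;
```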

Key changes:

  • new (spark4-common): SQLFunctionConverter, RewritePaimonSQLFunctionCommands, CreatePaimonSQLFunctionCommand
  • common: extract PaimonFunctionLookup; recognize SQLFunctionDefinition in the function-reference rewrite; route SQL functions through SparkShim in PaimonV1FunctionRegistry; append shim parser rules
  • rename: V1FunctionConverter → FileFunctionConverter + FunctionIdentifierConverter (shared)

Tests

  • PaimonSQLFunctionTest (expression body, query body, OR REPLACE / IF NOT EXISTS, qualified names, DROP, table-function rejection) in spark-4.0 and spark-4.1
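
The lifecycle cases listed above correspond roughly to the following statements. They are a hedged sketch rather than the actual test code, and the catalog `paimon`, database `db`, and function name `add_one` are hypothetical.

```sql
-- Create only if the function does not exist yet.
CREATE FUNCTION IF NOT EXISTS paimon.db.add_one(x INT)
RETURNS INT
RETURN x + 1;

-- Overwrite an existing definition.
CREATE OR REPLACE FUNCTION paimon.db.add_one(x INT)
RETURNS INT
RETURN x + 1;

-- Resolve the function through a partially qualified name.
USE paimon;
SELECT db.add_one(41);

-- Remove the function from the catalog.
DROP FUNCTION paimon.db.add_one;
```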

…on Paimon catalog for Spark 4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JingsongLi

Contributor

Cool!

Zouxxyy and others added 3 commits June 20, 2026 11:24
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JingsongLi

Contributor

Looks good to me.
