Upsert fails after update_schema().union_by_name() due to schema mismatch #3105

@Saamu192

Apache Iceberg version

0.10.0

Please describe the bug 🐞

When performing an upsert after adding a new column via update_schema().union_by_name(), the operation fails with a ValueError indicating that the schema field names don't match.

To reproduce:

from pyiceberg.catalog import load_catalog
import polars as pl

catalog = load_catalog("default", **{"type": "in-memory"})

df = pl.DataFrame(
    [
        {"id": 1, "name": "Alice", "age": 30, "city": "São Paulo"},
        {"id": 2, "name": "Bob", "age": 25, "city": "Rio de Janeiro"},
        {"id": 3, "name": "Carol", "age": 35, "city": "Belo Horizonte"},
        {"id": 4, "name": "David", "age": 28, "city": "Curitiba"},
    ]
)

arrow = df.to_arrow()

catalog.create_namespace_if_not_exists("default")
catalog.create_table_if_not_exists("default.my_table", arrow.schema)
table = catalog.load_table("default.my_table")

try:
    table.append(arrow)
    
    # Add a new column
    arrow = df.with_columns(ping=pl.lit("pong")).to_arrow()
    
    # Update schema to include the new column
    with table.update_schema() as update_schema:
        update_schema.union_by_name(arrow.schema)
        table = table.refresh()
    
    # This fails with ValueError
    table.upsert(arrow, ["id"])
finally:
    catalog.drop_table("default.my_table")

Error:
ValueError: Target schema's field names are not matching the table's field names: ['id', 'name', 'age', 'city', 'ping'], ['id', 'name', 'age', 'city']

Stack trace:

  File "pyiceberg/table/__init__.py", line 1343, in upsert
    return tx.upsert(
  File "pyiceberg/table/__init__.py", line 825, in upsert
    rows_to_update = upsert_util.get_rows_to_update(df, rows, join_cols)
  File "pyiceberg/table/upsert_util.py", line 92, in get_rows_to_update
    source_table.cast(target_table.schema)
  File "pyarrow/table.pxi", line 4721, in pyarrow.lib.Table.cast
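
The trace ends in pyarrow's Table.cast, and the message lists two sets of field names, which suggests the cast is rejected by a name comparison before any data is converted. The following is a minimal pure-Python sketch of such a comparison, assuming it is a strict, ordered equality check on the names. `check_field_names` is a hypothetical helper written for illustration; it is not part of pyarrow or pyiceberg. It reads the first list in the message as the DataFrame passed to upsert and the second as the rows read back from the table.

```python
# Illustrative stand-in only: `check_field_names` is a hypothetical helper,
# not a pyarrow or pyiceberg function.
def check_field_names(table_names, target_names):
    """Raise ValueError unless both name lists are identical, order included."""
    if table_names != target_names:
        raise ValueError(
            "Target schema's field names are not matching "
            f"the table's field names: {table_names!r}, {target_names!r}"
        )


# Names of the DataFrame passed to upsert (includes the new column).
source_names = ["id", "name", "age", "city", "ping"]
# Names of the rows read back from the table (assumed: new column missing).
rows_names = ["id", "name", "age", "city"]

try:
    check_field_names(source_names, rows_names)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the rows scanned during the upsert still carry the pre-update column set even though the table schema already includes "ping".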

Expected:
The upsert operation should succeed after the schema has been updated to include the new column.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
