Type Coercion fails for List with inner type struct which has large/view types #14154

Open
ion-elgreco opened this issue Jan 16, 2025 · 3 comments
Labels
bug Something isn't working

Comments


ion-elgreco commented Jan 16, 2025

Describe the bug

A LargeList(Struct({"foo": LargeUtf8})) cannot be coerced to List(Struct({"foo": Utf8})). However, coercion works fine for LargeList(LargeUtf8) -> List(Utf8) and for Struct({"foo": LargeUtf8}) -> Struct({"foo": Utf8}).

To Reproduce

import polars as pl
from deltalake import DeltaTable

tmp_path = "test_table__"
df = pl.DataFrame({"foo": [1], "bar": [[{"foo": "!"}]]})
df.write_delta(tmp_path, mode="overwrite", overwrite_schema=True)

DeltaTable(tmp_path).merge(
    df.to_arrow(compat_level=1),
    predicate="s.foo = t.foo",
    source_alias="s",
    target_alias="t",
    large_dtypes=None,
).when_matched_update_all().execute()

This raises:

DeltaError: Generic DeltaTable error: type_coercion
caused by
Error during planning: Failed to coerce then ([LargeList(Field { name: "item", data_type: Struct([Field { name: "foo", data_type: Utf8View, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "element", data_type: Struct([Field { name: "foo", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]) and else (None) to common types in CASE WHEN expression

Expected behavior

DataFusion should be able to coerce between large/view and regular Arrow types inside deeply nested types.

Additional context

Luckily, we can still downcast in Python using large_dtypes=False, but DataFusion should be able to coerce any deeply nested dtype.

@ion-elgreco ion-elgreco added the bug Something isn't working label Jan 16, 2025
@kosiew
Contributor

kosiew commented Jan 17, 2025

I tested this in a fork of the deltalake repo but could not reproduce the error:

DeltaLake Version: 0.23.0
Polars Version: 1.20.0

@ion-elgreco
Author

@kosiew from what I can see in the commit, you are testing against an older version of deltalake: https://github.com/delta-io/delta-rs/blob/d8080b13f5724aa09fb268b17f507dfd8559255f/python/pyproject.toml

That commit uses DataFusion v43; we are now on DataFusion v44.

@kosiew
Contributor

kosiew commented Jan 17, 2025

Good catch @ion-elgreco ☝!
I managed to reproduce the error after fetching the latest main from the upstream.

I looked into the delta-rs repo because I had earlier tested the coercion in DataFusion v44 but could not trigger the error:

https://github.com/kosiew/datafusion/blob/f845a233cb90758b42c0ab2d3ea1c1e02da36aa3/datafusion/optimizer/src/analyzer/type_coercion.rs#L2140-L2199
