python - How to outer joinmerge two frames with polars while updating left with right values? - Stack Overflow

IT技术

更新时间：2025-03-060

admin管理员组
文章数量:1279124

So I got two csv which I load as polars frames:

left:

track_name,type,yield,group
8CEB45v1,corn,0.146957,A
A188v2,corn,0.86308,A
B73v6,corn,0.326076,A
CI6621v1,sweetcorn,0.0357792,A
CML103v1,sweetcorn,0.510464,A

right:

track_name,type,yield,group
8CEB45v1,corn,0.999,A
B1234,pepper,1,B
B1235,pepper,2,B

my code so far:

import polars as pl

left = pl.read_csv("left.csv")
right = pl.read_csv("right.csv")

matching_columns = list(set(left.columns) & set(right.columns))  # I do this since I want to join sometimes frame which does not have a 100 % column match. In that case I want to simply add the new columns to the outer frame.

outer = left.join(
            right,
            on=matching_columns,
            how="outer",
            coalesce=True,
            maintain_order="left",
        )
outer

my result:

shape: (8, 4)
┌────────────┬───────────┬───────────┬───────┐
│ track_name ┆ type      ┆ yield     ┆ group │
│ ---        ┆ ---       ┆ ---       ┆ ---   │
│ str        ┆ str       ┆ f64       ┆ str   │
╞════════════╪═══════════╪═══════════╪═══════╡
│ 8CEB45v1   ┆ corn      ┆ 0.146957  ┆ A     │
│ A188v2     ┆ corn      ┆ 0.86308   ┆ A     │
│ B73v6      ┆ corn      ┆ 0.326076  ┆ A     │
│ CI6621v1   ┆ sweetcorn ┆ 0.0357792 ┆ A     │
│ CML103v1   ┆ sweetcorn ┆ 0.510464  ┆ A     │
│ B1234      ┆ pepper    ┆ 1.0       ┆ B     │
│ B1235      ┆ pepper    ┆ 2.0       ┆ B     │
│ 8CEB45v1   ┆ corn      ┆ 0.999     ┆ A     │
└────────────┴───────────┴───────────┴───────┘

my desired output: (yield of 8CEB45v1 from right updates value of left)

shape: (7, 4)
┌────────────┬───────────┬───────────┬───────┐
│ track_name ┆ type      ┆ yield     ┆ group │
│ ---        ┆ ---       ┆ ---       ┆ ---   │
│ str        ┆ str       ┆ f64       ┆ str   │
╞════════════╪═══════════╪═══════════╪═══════╡
│ 8CEB45v1   ┆ corn      ┆ 0.999     ┆ A     │
│ A188v2     ┆ corn      ┆ 0.86308   ┆ A     │
│ B73v6      ┆ corn      ┆ 0.326076  ┆ A     │
│ CI6621v1   ┆ sweetcorn ┆ 0.0357792 ┆ A     │
│ CML103v1   ┆ sweetcorn ┆ 0.510464  ┆ A     │
│ B1234      ┆ pepper    ┆ 1.0       ┆ B     │
│ B1235      ┆ pepper    ┆ 2.0       ┆ B     │
└────────────┴───────────┴───────────┴───────┘

So I got two csv which I load as polars frames:

left:

track_name,type,yield,group
8CEB45v1,corn,0.146957,A
A188v2,corn,0.86308,A
B73v6,corn,0.326076,A
CI6621v1,sweetcorn,0.0357792,A
CML103v1,sweetcorn,0.510464,A

right:

track_name,type,yield,group
8CEB45v1,corn,0.999,A
B1234,pepper,1,B
B1235,pepper,2,B

my code so far:

import polars as pl

left = pl.read_csv("left.csv")
right = pl.read_csv("right.csv")

matching_columns = list(set(left.columns) & set(right.columns))  # I do this since I want to join sometimes frame which does not have a 100 % column match. In that case I want to simply add the new columns to the outer frame.

outer = left.join(
            right,
            on=matching_columns,
            how="outer",
            coalesce=True,
            maintain_order="left",
        )
outer

my result:

shape: (8, 4)
┌────────────┬───────────┬───────────┬───────┐
│ track_name ┆ type      ┆ yield     ┆ group │
│ ---        ┆ ---       ┆ ---       ┆ ---   │
│ str        ┆ str       ┆ f64       ┆ str   │
╞════════════╪═══════════╪═══════════╪═══════╡
│ 8CEB45v1   ┆ corn      ┆ 0.146957  ┆ A     │
│ A188v2     ┆ corn      ┆ 0.86308   ┆ A     │
│ B73v6      ┆ corn      ┆ 0.326076  ┆ A     │
│ CI6621v1   ┆ sweetcorn ┆ 0.0357792 ┆ A     │
│ CML103v1   ┆ sweetcorn ┆ 0.510464  ┆ A     │
│ B1234      ┆ pepper    ┆ 1.0       ┆ B     │
│ B1235      ┆ pepper    ┆ 2.0       ┆ B     │
│ 8CEB45v1   ┆ corn      ┆ 0.999     ┆ A     │
└────────────┴───────────┴───────────┴───────┘

my desired output: (yield of 8CEB45v1 from right updates value of left)

shape: (7, 4)
┌────────────┬───────────┬───────────┬───────┐
│ track_name ┆ type      ┆ yield     ┆ group │
│ ---        ┆ ---       ┆ ---       ┆ ---   │
│ str        ┆ str       ┆ f64       ┆ str   │
╞════════════╪═══════════╪═══════════╪═══════╡
│ 8CEB45v1   ┆ corn      ┆ 0.999     ┆ A     │
│ A188v2     ┆ corn      ┆ 0.86308   ┆ A     │
│ B73v6      ┆ corn      ┆ 0.326076  ┆ A     │
│ CI6621v1   ┆ sweetcorn ┆ 0.0357792 ┆ A     │
│ CML103v1   ┆ sweetcorn ┆ 0.510464  ┆ A     │
│ B1234      ┆ pepper    ┆ 1.0       ┆ B     │
│ B1235      ┆ pepper    ┆ 2.0       ┆ B     │
└────────────┴───────────┴───────────┴───────┘

Share Improve this question edited Feb 24 at 17:53 jqurious 21.6k4 gold badges20 silver badges39 bronze badges asked Feb 24 at 17:51 Pm740 4233 silver badges14 bronze badges

There is a request for a similar "outer update" github/pola-rs/polars/issues/17509 + their question stackoverflow/questions/78707931 - But in this case, how do we know what columns should be updated? What if 8CEB45v1 had multiple entries in left that match right? – jqurious Commented Feb 25 at 14:27

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

The thing you're doing wrong is including yield in matching_columns. You don't want to match by it, you want it as a value.

One idea to reconcile that would be

matching_columns = list(set(left.select(pl.col(pl.String)).columns) & 
                        set(right.select(pl.col(pl.String)).columns))

Alternatively, you could start with your way but then remove f64 columns. It really just depends on your data.

matching_columns = set(left.columns) & set(right.columns)
matching_columns -= (
set(left.select(matching_columns).select(pl.col(pl.Float64)).columns)
)

Once you have your matching_columns established, you can use the built in update:

left.update(right, on=matching_columns, how="full")

shape: (7, 4)
┌────────────┬───────────┬───────────┬───────┐
│ track_name ┆ type      ┆ yield     ┆ group │
│ ---        ┆ ---       ┆ ---       ┆ ---   │
│ str        ┆ str       ┆ f64       ┆ str   │
╞════════════╪═══════════╪═══════════╪═══════╡
│ 8CEB45v1   ┆ corn      ┆ 0.999     ┆ A     │
│ A188v2     ┆ corn      ┆ 0.86308   ┆ A     │
│ B73v6      ┆ corn      ┆ 0.326076  ┆ A     │
│ CI6621v1   ┆ sweetcorn ┆ 0.0357792 ┆ A     │
│ CML103v1   ┆ sweetcorn ┆ 0.510464  ┆ A     │
│ B1235      ┆ pepper    ┆ 2.0       ┆ B     │
│ B1234      ┆ pepper    ┆ 1.0       ┆ B     │
└────────────┴───────────┴───────────┴───────┘

~~Note: you don't have to convert the python set into a list, polars is happy to accept a set as input.~~ Polars will accept a set at runtime but the type checker will complain about because polars has Sequence as the annotation. If you're using join with on then this won't really matter. If you're using left_on and right_on then it might get you in trouble because sets don't maintain order and the order of each of those inputs is how they're used.

Response to comment.

If you have extra columns in right that you want the output to keep then you're better off doing the update yourself with a join and a coalesce. I say that because the update function is doing that under the hood but is dropping those extra columns. It's probably worth a feature request that update provide an option to keep extra columns in right.

Anyways, here's the code

New setup

left = pl.DataFrame(
    [
        pl.Series('track_name',['8CEB45v1','A188v2','B73v6','CI6621v1','CML103v1']),
        pl.Series('type',['corn','corn','corn','sweetcorn', 'sweetcorn']),
        pl.Series('yield', [0.146957,0.86308,0.326076,0.0357792,0.510464]),
        pl.Series('group',['A','A','A','A','A']),
    ]
)


right = pl.DataFrame(
    [
        pl.Series('track_name',['8CEB45v1','B1234','B1235']),
        pl.Series('type',['corn','pepper','pepper']),
        pl.Series('yield',[0.999,1.0,2.0]),
        pl.Series('group',['A','B','B']),
        pl.Series("fruit",["apple","banana","carrot"])
    ]
)

the work

common_columns = set(left.columns) & set(right.columns)

join_columns = common_columns - set(left.select(pl.col(pl.Float64)).columns)

update_columns = common_columns - join_columns

extra_right_columns = set(right.columns) - common_columns

(
    left
    .join(right, on=list(join_columns), how="full", coalesce=True)
    .select(
        *join_columns,
        *[pl.coalesce(f"{name}_right", name).alias(name)
        for name in update_columns],
        *extra_right_columns
    )
)
shape: (7, 5)
┌───────────┬───────┬────────────┬───────────┬────────┐
│ type      ┆ group ┆ track_name ┆ yield     ┆ fruit  │
│ ---       ┆ ---   ┆ ---        ┆ ---       ┆ ---    │
│ str       ┆ str   ┆ str        ┆ f64       ┆ str    │
╞═══════════╪═══════╪════════════╪═══════════╪════════╡
│ corn      ┆ A     ┆ 8CEB45v1   ┆ 0.999     ┆ apple  │
│ corn      ┆ A     ┆ A188v2     ┆ 0.86308   ┆ null   │
│ corn      ┆ A     ┆ B73v6      ┆ 0.326076  ┆ null   │
│ sweetcorn ┆ A     ┆ CI6621v1   ┆ 0.0357792 ┆ null   │
│ sweetcorn ┆ A     ┆ CML103v1   ┆ 0.510464  ┆ null   │
│ pepper    ┆ B     ┆ B1235      ┆ 2.0       ┆ carrot │
│ pepper    ┆ B     ┆ B1234      ┆ 1.0       ┆ banana │
└───────────┴───────┴────────────┴───────────┴────────┘

本文标签： pythonHow to outer joinmerge two frames with polars while updating left with right valuesStack Overflow

版权声明：本文标题：python - How to outer joinmerge two frames with polars while updating left with right values? - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1741249697a2365562.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

python - How to outer joinmerge two frames with polars while updating left with right values? - Stack Overflow

1 Answer 1

Response to comment.

New setup

the work

更多相关文章

python - How to outer joinmerge two frames with polars while updating left with right values? - Stack Overflow

发表评论

推荐文章

javascript - Change language to Bootstrap Datetimepicker - Stack Overflow

security - Understanding Keycloak&#39;s default client scope &quot;web-origins&quot; - Stack Overflow

javascript - How can I change the attrs of a custom object in JointJS? - Stack Overflow

javascript - Using spreadsheet values in html body of email with GmailApp.sendEmail - Stack Overflow

javascript - How do I detect if there are scrollbars on a browser window? - Stack Overflow

热门文章

javascript - Error handling in a Promise.all like flow in asyncawait syntax - Stack Overflow

javascript - Module not found: Error: You attempted to import babel-preset which falls outside of the project src directory - St

meta query - meta_query compare “=“ returns nothing when it seems it should

ios - SKTestSession promotional Offer not working - Stack Overflow

windows - Scheduled task not detecting dirty reboots - Stack Overflow

Terraform dynamic block to be executed with default values - Stack Overflow

c++ - How can I transparently process std::vector of T and std::vector of std::shared_ptr&lt;T&gt; in a template? - Stac

flutter - How to convert this complex Future returning function to a Stream? - Stack Overflow

WP Plugin Developers - Accessing my wordpress site

javascript - How to add two arrays in pairwise fashion - Stack Overflow

最新文章

Win7各正式版下载地址和SHA验证

怎么样把中文版的Windows7改成英文版的Windows7

Win7系统笔记本蓝牙打开指南：详细步骤助你轻松连接

win7开机弹计算机,win7开机弹出Windows Installer窗口的解决方法

windows7虚拟机安装vmtools方法

primeng - Error on adding Prime NG button module to my Angular 19 project (An unhandled exception occurred: fn is not a function

javascript - Returning the line of a searched text from a file using node.js - Stack Overflow

Pull MySQL data from multiple tables and merge into 1 PHP array

How does naming an anonymous function in JavaScript make a difference? - Stack Overflow

javascript - What is the the most efficient way of endering an MJPEG-Stream in browser - Stack Overflow

惠普OMEN 15-CE001TX 2EF91PA参数报价

苹果新款MacBook Pro 15英寸 i732GB1TBVega Pro 20参数报价

联想Y330A-PSE L参数报价

神舟战神Z7 D6 i7-12650H16GB512GBRTX4050旗舰版参数报价

神舟战神Z7 D6 i7-12650H16GB1TBRTX4050参数报价

security - Understanding Keycloak's default client scope "web-origins" - Stack Overflow

c++ - How can I transparently process std::vector of T and std::vector of std::shared_ptr<T> in a template? - Stac