pandas - Nearest matching for groups, when merge_asof fails because of warning that left frame is not properly sorted - Stack Ov

Updated: 2025-01-115

This should be simple using the pandas merge_asof function, but unfortunately it's not working because the function complains: ValueError: left keys must be sorted.

I want to assign each individual's correct seniority to their achievements over time. I have a data frame with the achievements of 3,500 people: more than 75,000 achievements in total, from 1970 to this year. The individuals progress in seniority over time.

I want to match the achievements to their seniority.

Below, under the heading ACHIEVEMENTDATA, is an example of data for two people. The relevant key columns are identifier and achievement_year; achieve_count is the number of achievements a person has in each achievement_year.

I want a seniority column added to dfa based on the seniority data over time in df (see PERSONALNDATA below), so that each row in dfa reflects the person's seniority in that achievement_year. I filled it in manually in the example below.

Note that some rows in df have duplicate identifiers per year (A in 2015 and 2019); in these cases, rely on the row with the highest value of seniority.

PERSONALNDATA (df)
identifier  seniority   year
A   2   2009
A   3   2015
A   3   2015
A   4   2019
A   4   2019
A   4   2023
B   2   2012
B   4   2024


ACHIEVEMENTDATA (dfa):
identifier  achievement_year    achieve_count   seniority
A   2003    2   
A   2004    3   
A   2005    1   
A   2006    3   
A   2007    1   
A   2008    1   
A   2010    2   2
A   2011    2   2
A   2012    2   2
A   2013    4   2
A   2014    8   2
A   2015    4   3
A   2016    4   3
A   2017    4   3
A   2018    7   3
A   2019    4   4
A   2020    12  4
A   2021    8   4
A   2022    5   4
A   2023    7   4
A   2024    5   4
B   2007    1   
B   2009    1   
B   2010    2   
B   2011    1   
B   2012    2   2
B   2013    1   2
B   2014    1   2
B   2017    3   2
B   2019    1   2
B   2020    2   2
B   2021    1   2
B   2023    2   2
B   2024    2   4

Asked by Martien Lubberink

2 Answers

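Both frames need to be sorted before the merge. Note that merge_asof requires the "on" keys (achievement_year on the left, year on the right) to be sorted across the whole frame, not just within each identifier, so sorting by ['identifier', 'year'] still raises "left keys must be sorted". Sort on the key column alone:

import pandas as pd

# Sort each frame by its merge key only
df = df.sort_values('year')
dfa = dfa.sort_values('achievement_year')

It looks like there are some duplicates in df too:

A   3   2015
A   3   2015
A   4   2019
A   4   2019

Drop them, keeping the row with the highest seniority per identifier and year, then restore the sort on year:

df = (
    df.sort_values(['identifier', 'year', 'seniority'])
      .drop_duplicates(subset=['identifier', 'year'], keep='last')
      .sort_values('year')
)

Then run the merge:

merged_df = pd.merge_asof(
    dfa,
    df,
    by='identifier',
    left_on='achievement_year',
    right_on='year',
    direction='backward'  # latest df row with year <= achievement_year
)

Add this to your code and run it; let me know if you get any errors.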
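As a self-contained check, the sketch below rebuilds df from the question together with a subset of the dfa rows, removes the duplicates, sorts each frame on its merge key alone, and then merges. The names df_clean and merged are illustrative, and the final re-sort by identifier is only an assumption about the desired output order.

```python
import pandas as pd

# df as given in the question (seniority over time, with duplicate rows).
df = pd.DataFrame({
    "identifier": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "seniority":  [2, 3, 3, 4, 4, 4, 2, 4],
    "year":       [2009, 2015, 2015, 2019, 2019, 2023, 2012, 2024],
})

# A subset of the dfa rows from the question, to keep the example short.
dfa = pd.DataFrame({
    "identifier": ["A"] * 5 + ["B"] * 4,
    "achievement_year": [2008, 2010, 2015, 2018, 2019, 2011, 2012, 2023, 2024],
    "achieve_count": [1, 2, 4, 7, 4, 1, 2, 2, 2],
})

# One row per identifier/year, keeping the highest seniority,
# then sorted on the merge key across the whole frame.
df_clean = (
    df.sort_values(["identifier", "year", "seniority"])
      .drop_duplicates(subset=["identifier", "year"], keep="last")
      .sort_values("year")
)

merged = pd.merge_asof(
    dfa.sort_values("achievement_year"),  # left key sorted globally
    df_clean,
    by="identifier",
    left_on="achievement_year",
    right_on="year",
    direction="backward",
)

# Restore the per-person ordering used in the question.
merged = merged.sort_values(["identifier", "achievement_year"]).reset_index(drop=True)
print(merged[["identifier", "achievement_year", "achieve_count", "seniority"]])
```

Rows dated before a person's first entry in df (A in 2008, B in 2011) come back with a missing seniority, which matches the blank cells in the manually filled table.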

See rule #5 in this link: Sort both dataframes according to the column listed for the ‘on’ parameter.

This is also a rather hidden rule given how necessary it is, especially if your data was sorted initially. Regardless, just as you watch out for None values before running a merge, make sure you sort both data frames according to the on column and all should be well. merge_asof actually shares this rule with merge_ordered(), so sort before that kind of merge as well. To find out more about merge_ordered() go here.

I had sorted the frames on both 'identifier' and 'year', not on the 'on' column alone.
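The difference between the two sort orders can be sketched as follows. The toy frames and the helper asof_seniority are hypothetical, made up for illustration and not taken from the question.

```python
import pandas as pd

left = pd.DataFrame({
    "identifier": ["A", "A", "B", "B"],
    "achievement_year": [2010, 2020, 2013, 2024],
})
right = pd.DataFrame({
    "identifier": ["A", "A", "B", "B"],
    "year": [2009, 2019, 2012, 2024],
    "seniority": [2, 4, 2, 4],
})

def asof_seniority(left_frame, right_frame):
    """Backward as-of merge of seniority onto achievements, per identifier."""
    return pd.merge_asof(
        left_frame, right_frame,
        by="identifier", left_on="achievement_year", right_on="year",
    )

# Sorted within each identifier only: the key column reads
# 2010, 2020, 2013, 2024, which is not monotonic.
grouped_left = left.sort_values(["identifier", "achievement_year"])
print(grouped_left["achievement_year"].is_monotonic_increasing)  # False

try:
    asof_seniority(grouped_left, right.sort_values(["identifier", "year"]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Sorted on the key column alone: the merge goes through.
result = asof_seniority(
    left.sort_values("achievement_year"),
    right.sort_values("year"),
)
print(result)
```

Checking is_monotonic_increasing on the key column before calling merge_asof is a cheap way to confirm the frame is in the order pandas expects.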
