
I have a list of phone numbers and another column with a 1 or 0; some numbers are duplicates. I can find the duplicates after reading the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv("Example_File")
duplicates = df[df.duplicated()]

I want to take that list of duplicates and see whether the 0 or 1 in column 2 matches across each duplicated number.

Example:

Col1          Col2
555-555-5555  1
555-555-5555  1
555-123-4567  0
555-123-4567  1
777-555-5555  0

I would like to get a list of the items where Col1 has a duplicate and Col2 does not match. Example: 555-123-4567 has both 0 and 1 in column 2.

I have tried comparing the two lists as DataFrames and got as far as exporting the duplicates in column 1 to a new DataFrame along with the corresponding column 2 values, but I can't work out how to find the final list of duplicated phone numbers where column 2 does not match for each phone number.

asked by Bcpeagle (new contributor); edited by Barmar
  • get the unique count of Col2 grouped by Col1 and then return the rows where the count>1. – Barmar Commented 21 mins ago
  • For df.duplicated take a look at the subset and keep parameters. – BeRT2me Commented 19 mins ago
  • Thank you, that is a good idea. I am looking to group by Col1 with the unique count. I will also check the subset and keep parameters. – Bcpeagle Commented 9 mins ago
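As the second comment suggests, `df.duplicated` takes `subset` and `keep` parameters. A minimal sketch on the sample data from the question (the DataFrame is built inline here for illustration, rather than read from `Example_File`):

```python
import pandas as pd

# Sample data from the question, constructed inline
df = pd.DataFrame({
    "Col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "Col2": [1, 1, 0, 1, 0],
})

# subset=["Col1"] checks for duplicates in Col1 only;
# keep=False marks every row of a duplicated number, not just the repeats
dupes = df[df.duplicated(subset=["Col1"], keep=False)]
print(dupes)
```

With `keep=False`, both rows of each duplicated number are kept, so 777-555-5555 (which appears only once) drops out.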

1 Answer


You can use `groupby` with `filter` and `unique`:

import pandas as pd

# read csv
df = pd.read_csv("Example_File")

# Keep only the groups where Col2 takes more than one distinct value for the same Col1
mismatched = df.groupby('Col1').filter(lambda x: x['Col2'].nunique() > 1)

# Get the unique Col1 values that have mismatched Col2 values
result = mismatched['Col1'].unique()
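Applied to the sample data from the question (built inline here instead of reading `Example_File`), this returns only the number whose Col2 values disagree:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "Col2": [1, 1, 0, 1, 0],
})

# Keep groups where Col2 has more than one distinct value
mismatched = df.groupby("Col1").filter(lambda g: g["Col2"].nunique() > 1)
result = mismatched["Col1"].unique()
print(result)  # ['555-123-4567']
```

555-555-5555 is duplicated but its Col2 values agree (both 1), so it is filtered out; only 555-123-4567 survives.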
