
I have a list of phone numbers and another column with a 1 or 0; some numbers are duplicates. I can find the duplicates after reading the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv("Example_File")
duplicates = df[df.duplicated()]

I want to take that list of duplicates and see whether the 0 or 1 in column 2 matches across each duplicated number.

Example:

Col1          Col2
555-555-5555  1
555-555-5555  1
555-123-4567  0
555-123-4567  1
777-555-5555  0

I would like to get a list of the items where Col1 has a duplicate and Col2 does not match. Example: 555-123-4567 has both 0 and 1 in column 2.

I have tried comparing the two lists as DataFrames and got as far as exporting the duplicates in column 1 to a new DataFrame along with the corresponding column 2 values, but I can't work out how to find the final list of duplicated phone numbers where column 2 does not match for each phone number.

asked by Bcpeagle (new contributor); edited by Barmar
  • get the unique count of Col2 grouped by Col1 and then return the rows where the count>1. – Barmar Commented 21 mins ago
  • For df.duplicated take a look at the subset and keep parameters. – BeRT2me Commented 19 mins ago
  • Thank you, that is a good idea. I am looking to group by Col1 with the unique count. I will also check the subset and keep parameters. – Bcpeagle Commented 9 mins ago
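As the second comment suggests, `df.duplicated` takes `subset` and `keep` parameters. A minimal sketch on the sample data from the question (the DataFrame is built inline here for illustration, rather than read from `Example_File`):

```python
import pandas as pd

# Sample data from the question, constructed inline
df = pd.DataFrame({
    "Col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "Col2": [1, 1, 0, 1, 0],
})

# subset=["Col1"] checks for duplicates in Col1 only;
# keep=False marks every row of a duplicated number, not just the repeats
dupes = df[df.duplicated(subset=["Col1"], keep=False)]
print(dupes)
```

With `keep=False`, both rows of each duplicated number are kept, so 777-555-5555 (which appears only once) drops out.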

1 Answer


You can use `groupby` with `filter` and `unique`:

import pandas as pd

# read csv
df = pd.read_csv("Example_File")

# Keep only the groups where Col2 takes more than one distinct value for the same Col1
mismatched = df.groupby('Col1').filter(lambda x: x['Col2'].nunique() > 1)

# Get the unique Col1 values that have mismatched Col2 values
result = mismatched['Col1'].unique()
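Applied to the sample data from the question (built inline here instead of reading `Example_File`), this returns only the number whose Col2 values disagree:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "Col2": [1, 1, 0, 1, 0],
})

# Keep groups where Col2 has more than one distinct value
mismatched = df.groupby("Col1").filter(lambda g: g["Col2"].nunique() > 1)
result = mismatched["Col1"].unique()
print(result)  # ['555-123-4567']
```

555-555-5555 is duplicated but its Col2 values agree (both 1), so it is filtered out; only 555-123-4567 survives.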
