I have a list of phone numbers and a second column containing a 1 or 0; some of the numbers are duplicates. I can find the duplicates by reading the CSV into a DataFrame:
import pandas as pd
df = pd.read_csv("Example_File")
duplicates = df[df.duplicated()]
I want to take that list of duplicates and check whether the 0 or 1 in column 2 matches for each duplicated number.
Example:
Col 1 | Col2 |
---|---|
555-555-5555 | 1 |
555-555-5555 | 1 |
555-123-4567 | 0 |
555-123-4567 | 1 |
777-555-5555 | 0 |
I would like to get a list of the items where col 1 has a duplicate and column 2 does not match. Example: 555-123-4567 has 0 and 1 in column 2.
I have tried comparing the two lists as DataFrames and got as far as exporting the duplicates in column 1 to a new df along with the corresponding column 2 values, but I can't work out how to get the final list of duplicated phone numbers whose column 2 values do not match.
Asked by Bcpeagle; edited by Barmar.
1 Answer
You can use groupby and unique:
import pandas as pd
# read csv (assumes the columns are named 'col1' and 'col2')
df = pd.read_csv("Example_File")
# Keep only the rows whose col1 value has more than one distinct col2 value
mismatched = df.groupby('col1').filter(lambda x: len(x['col2'].unique()) > 1)
# Get the unique col1 values with mismatched col2 values
result = mismatched['col1'].unique()
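To check the idea against the sample table from the question, here is a small self-contained sketch. It builds the data in memory rather than reading a file, and it assumes the columns are named col1 and col2 (the question's table labels them "Col 1" and "Col2", so adjust to your real headers). It uses nunique, which counts distinct values per group, as one possible variation on the filter approach above.

```python
import pandas as pd

# Sample rows from the question; the column names "col1"/"col2" are assumed.
df = pd.DataFrame({
    "col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "col2": [1, 1, 0, 1, 0],
})

# Count the distinct col2 values recorded for each phone number.
distinct_counts = df.groupby("col1")["col2"].nunique()

# Phone numbers that carry more than one distinct col2 value.
result = distinct_counts[distinct_counts > 1].index.tolist()
print(result)  # ['555-123-4567']
```

A number that appears only once always has exactly one distinct value, so it never shows up in the result and no separate duplicate check is needed.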
df.duplicated: take a look at the subset and keep parameters. – BeRT2me
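As a rough illustration of what this comment points at: called with no arguments, df.duplicated() compares whole rows and flags only the later occurrences, which is why the question's attempt never sees a number whose column 2 values differ. Passing subset restricts the comparison to chosen columns, and keep=False flags every occurrence. The sketch below assumes the column names col1 and col2 and uses the sample data from the question.

```python
import pandas as pd

# Sample rows from the question; the column names "col1"/"col2" are assumed.
df = pd.DataFrame({
    "col1": ["555-555-5555", "555-555-5555", "555-123-4567",
             "555-123-4567", "777-555-5555"],
    "col2": [1, 1, 0, 1, 0],
})

# Every row whose phone number occurs more than once
# (keep=False marks all occurrences, not just the later ones).
repeated = df[df.duplicated(subset="col1", keep=False)]

# Collapse exact row repeats; a number that still appears more than once
# must have conflicting col2 values.
pairs = repeated.drop_duplicates()
conflicting = pairs.loc[
    pairs.duplicated(subset="col1", keep=False), "col1"
].unique().tolist()
print(conflicting)  # ['555-123-4567']
```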