admin管理员组文章数量:1123708
I have a large table (original_df
) which includes 10+ columns and 1000+ rows. One of those columns (street_column
) lists street intersections. Another column (coordinates_column
) lists a combined latitude and longitude coordinate. The street_column
row values can have many duplicates.
My issue is that the corresponding coordinates_column
currently have many different values for the same street_column
value.
My goal is to remove variation in the coordinates_column
values to ensure each unique street_column
value always correlates with the same coordinates_column
value. I have created a second table (key_df
) that only includes 2 columns: one for unique street_column
values and the other for its unique coordinates_column
values.
I would like to replace all coordinates_column
values in the original_df
table with all coordinates_column
values in the key_df
table based off of the matching street_column
value.
I this is likely super basic, but I am brand new to R and don't know where to start. And I have had no luck finding an answer to this scenario. I am currently using Rstudio (posit).
original_df table
ride_id | street_column | coordinates_column |
001 | Ash St & 1st Ave | -100.123,98.123 |
002 | Ash St & 1st Ave | -100.100,98.123 |
003 | Brooke St & Rose Rd | 90.456,91.456 |
004 | Brooke St & Rose Rd | 90.400,91.987 |
005 | 9th Ave & Center St | 20.567,-100.654 |
006 | 9th Ave & Center St | 21.123,-100.654 |
007 | 9th Ave & Center St | 20.567,-101.100 |
key_df table
street_column | coordinates_column |
Ash St & 1st Ave | -100.123,98.123 |
Brooke St & Rose Rd | 90.456,91.456 |
9th Ave & Center St | 20.567,-100.654 |
desired change to original_df table
ride_id | street_column | coordinates_column |
001 | Ash St & 1st Ave | -100.123,98.123 |
002 | Ash St & 1st Ave | -100.123,98.123 |
003 | Brooke St & Rose Rd | 90.456,91.456 |
004 | Brooke St & Rose Rd | 90.456,91.456 |
005 | 9th Ave & Center St | 20.567,-100.654 |
006 | 9th Ave & Center St | 20.567,-100.654 |
007 | 9th Ave & Center St | 20.567,-100.654 |
original_df <-
structure(list(
ride_id = c("1", "2", "3", "4", "5", "6", "7"),
street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave",
"Brooke St & Rose Rd", "Brooke St & Rose Rd",
"9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "-100.100,98.123", "90.456,91.456",
"90.400,91.987", "20.567,-100.654", "21.123,-100.654", "20.567,-101.100")),
row.names = c(NA, -7L), class = "data.frame")
key_df <-
structure(list(
street_column = c("Ash St & 1st Ave", "Brooke St & Rose Rd", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "90.456,91.456", "20.567,-100.654")),
row.names = c(NA, -3L), class = "data.frame")
desired_df <-
structure(list(
ride_id = c("1", "2", "3", "4", "5", "6", "7"),
street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave",
"Brooke St & Rose Rd", "Brooke St & Rose Rd",
"9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "-100.123,98.123", "90.456,91.456",
"90.456,91.456", "20.567,-100.654", "20.567,-100.654", "20.567,-100.654")),
row.names = c(NA, -7L), class = "data.frame")
I have a large table (original_df
) which includes 10+ columns and 1000+ rows. One of those columns (street_column
) lists street intersections. Another column (coordinates_column
) lists a combined latitude and longitude coordinate. The street_column
row values can have many duplicates.
My issue is that the corresponding coordinates_column
currently have many different values for the same street_column
value.
My goal is to remove variation in the coordinates_column
values to ensure each unique street_column
value always correlates with the same coordinates_column
value. I have created a second table (key_df
) that only includes 2 columns: one for unique street_column
values and the other for its unique coordinates_column
values.
I would like to replace all coordinates_column
values in the original_df
table with all coordinates_column
values in the key_df
table based off of the matching street_column
value.
I this is likely super basic, but I am brand new to R and don't know where to start. And I have had no luck finding an answer to this scenario. I am currently using Rstudio (posit).
original_df table
ride_id | street_column | coordinates_column |
001 | Ash St & 1st Ave | -100.123,98.123 |
002 | Ash St & 1st Ave | -100.100,98.123 |
003 | Brooke St & Rose Rd | 90.456,91.456 |
004 | Brooke St & Rose Rd | 90.400,91.987 |
005 | 9th Ave & Center St | 20.567,-100.654 |
006 | 9th Ave & Center St | 21.123,-100.654 |
007 | 9th Ave & Center St | 20.567,-101.100 |
key_df table
street_column | coordinates_column |
Ash St & 1st Ave | -100.123,98.123 |
Brooke St & Rose Rd | 90.456,91.456 |
9th Ave & Center St | 20.567,-100.654 |
desired change to original_df table
ride_id | street_column | coordinates_column |
001 | Ash St & 1st Ave | -100.123,98.123 |
002 | Ash St & 1st Ave | -100.123,98.123 |
003 | Brooke St & Rose Rd | 90.456,91.456 |
004 | Brooke St & Rose Rd | 90.456,91.456 |
005 | 9th Ave & Center St | 20.567,-100.654 |
006 | 9th Ave & Center St | 20.567,-100.654 |
007 | 9th Ave & Center St | 20.567,-100.654 |
original_df <-
structure(list(
ride_id = c("1", "2", "3", "4", "5", "6", "7"),
street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave",
"Brooke St & Rose Rd", "Brooke St & Rose Rd",
"9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "-100.100,98.123", "90.456,91.456",
"90.400,91.987", "20.567,-100.654", "21.123,-100.654", "20.567,-101.100")),
row.names = c(NA, -7L), class = "data.frame")
key_df <-
structure(list(
street_column = c("Ash St & 1st Ave", "Brooke St & Rose Rd", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "90.456,91.456", "20.567,-100.654")),
row.names = c(NA, -3L), class = "data.frame")
desired_df <-
structure(list(
ride_id = c("1", "2", "3", "4", "5", "6", "7"),
street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave",
"Brooke St & Rose Rd", "Brooke St & Rose Rd",
"9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"),
coordinates_column = c("-100.123,98.123", "-100.123,98.123", "90.456,91.456",
"90.456,91.456", "20.567,-100.654", "20.567,-100.654", "20.567,-100.654")),
row.names = c(NA, -7L), class = "data.frame")
Share
Improve this question
edited yesterday
Rui Barradas
76.3k8 gold badges39 silver badges72 bronze badges
asked yesterday
Tyler WahlquistTyler Wahlquist
111 silver badge1 bronze badge
New contributor
Tyler Wahlquist is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
2 Answers
Reset to default 3match()
can be used to find the row number in key_df
which matches by street_column
for each row in original_df
:
original_df$coordinates_column <- key_df$coordinates_column[
match(original_df$street_column, key_df$street_column)
]
identical(original_df, desired_df)
#> [1] TRUE
Or, use dplyr to join original_df
with key_df
, then drop the original coordinates_column
:
library(dplyr)
original_df <- original_df |>
left_join(key_df, by = join_by(street_column), suffix = c(".x", "")) |>
select(-coordinates_column.x)
identical(original_df, desired_df)
#> [1] TRUE
Here is a way.
First get where in original_df
the columns of interest "street_column"
and "coordinates_column"
are. Then join the original without the coordinates and the key data sets.
Then, put the columns in their original order.
If you want the rows to be in the original order too, create an index to their original order and use it to sort the result.
i <- grep("coordinates_column", names(original_df))
j <- grep("street_column", names(original_df))
cols_order <- c(j, (1:ncol(original_df))[-j])
result <- merge(original_df[-i], key_df)[cols_order]
o <- order(original_df$ride_id, original_df$street_column)
result[order(o),]
#> ride_id street_column coordinates_column
#> 4 1 Ash St & 1st Ave -100.123,98.123
#> 5 2 Ash St & 1st Ave -100.123,98.123
#> 6 3 Brooke St & Rose Rd 90.456,91.456
#> 7 4 Brooke St & Rose Rd 90.456,91.456
#> 1 5 9th Ave & Center St 20.567,-100.654
#> 2 6 9th Ave & Center St 20.567,-100.654
#> 3 7 9th Ave & Center St 20.567,-100.654
Created on 2025-01-10 with reprex v2.1.1
本文标签:
版权声明:本文标题:join - How to replace multiple column values based on another table's column where each table has a common column value 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736590554a1945069.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论