admin管理员组

文章数量:1123708

I have a large table (original_df) which includes 10+ columns and 1000+ rows. One of those columns (street_column) lists street intersections. Another column (coordinates_column) lists a combined latitude and longitude coordinate. The street_column row values can have many duplicates.

My issue is that the corresponding coordinates_column currently have many different values for the same street_column value.

My goal is to remove variation in the coordinates_column values to ensure each unique street_column value always correlates with the same coordinates_column value. I have created a second table (key_df) that only includes 2 columns: one for unique street_column values and the other for its unique coordinates_column values.

I would like to replace all coordinates_column values in the original_df table with all coordinates_column values in the key_df table based off of the matching street_column value.

I this is likely super basic, but I am brand new to R and don't know where to start. And I have had no luck finding an answer to this scenario. I am currently using Rstudio (posit).

original_df table

ride_id | street_column             | coordinates_column  |
001     | Ash St & 1st Ave          | -100.123,98.123         |
002     | Ash St & 1st Ave          | -100.100,98.123         |
003     | Brooke St & Rose Rd    | 90.456,91.456         | 
004     | Brooke St & Rose Rd    | 90.400,91.987         |
005     | 9th Ave & Center St     | 20.567,-100.654       |
006     | 9th Ave & Center St     | 21.123,-100.654       |
007     | 9th Ave & Center St     | 20.567,-101.100       |

key_df table

street_column        | coordinates_column  |
Ash St & 1st Ave     | -100.123,98.123       |
Brooke St & Rose Rd  | 90.456,91.456       |
9th Ave & Center St  | 20.567,-100.654     |

desired change to original_df table

ride_id | street_column        | coordinates_column  |
001     | Ash St & 1st Ave     | -100.123,98.123         |
002     | Ash St & 1st Ave     | -100.123,98.123         |
003     | Brooke St & Rose Rd  | 90.456,91.456         |
004     | Brooke St & Rose Rd  | 90.456,91.456         |
005     | 9th Ave & Center St  | 20.567,-100.654       |
006     | 9th Ave & Center St  | 20.567,-100.654       |
007     | 9th Ave & Center St  | 20.567,-100.654       |

original_df <- 
  structure(list(
    ride_id = c("1", "2", "3", "4", "5", "6", "7"), 
    street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave", 
                      "Brooke St & Rose Rd", "Brooke St & Rose Rd", 
                      "9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "-100.100,98.123", "90.456,91.456", 
                           "90.400,91.987", "20.567,-100.654", "21.123,-100.654", "20.567,-101.100")), 
    row.names = c(NA, -7L), class = "data.frame")

key_df <-
  structure(list(
    street_column = c("Ash St & 1st Ave", "Brooke St & Rose Rd", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "90.456,91.456", "20.567,-100.654")), 
    row.names = c(NA, -3L), class = "data.frame")

desired_df <-
  structure(list(
    ride_id = c("1", "2", "3", "4", "5", "6", "7"), 
    street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave", 
                      "Brooke St & Rose Rd", "Brooke St & Rose Rd", 
                      "9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "-100.123,98.123", "90.456,91.456", 
                           "90.456,91.456", "20.567,-100.654", "20.567,-100.654", "20.567,-100.654")), 
    row.names = c(NA, -7L), class = "data.frame")

I have a large table (original_df) which includes 10+ columns and 1000+ rows. One of those columns (street_column) lists street intersections. Another column (coordinates_column) lists a combined latitude and longitude coordinate. The street_column row values can have many duplicates.

My issue is that the corresponding coordinates_column currently have many different values for the same street_column value.

My goal is to remove variation in the coordinates_column values to ensure each unique street_column value always correlates with the same coordinates_column value. I have created a second table (key_df) that only includes 2 columns: one for unique street_column values and the other for its unique coordinates_column values.

I would like to replace all coordinates_column values in the original_df table with all coordinates_column values in the key_df table based off of the matching street_column value.

I this is likely super basic, but I am brand new to R and don't know where to start. And I have had no luck finding an answer to this scenario. I am currently using Rstudio (posit).

original_df table

ride_id | street_column             | coordinates_column  |
001     | Ash St & 1st Ave          | -100.123,98.123         |
002     | Ash St & 1st Ave          | -100.100,98.123         |
003     | Brooke St & Rose Rd    | 90.456,91.456         | 
004     | Brooke St & Rose Rd    | 90.400,91.987         |
005     | 9th Ave & Center St     | 20.567,-100.654       |
006     | 9th Ave & Center St     | 21.123,-100.654       |
007     | 9th Ave & Center St     | 20.567,-101.100       |

key_df table

street_column        | coordinates_column  |
Ash St & 1st Ave     | -100.123,98.123       |
Brooke St & Rose Rd  | 90.456,91.456       |
9th Ave & Center St  | 20.567,-100.654     |

desired change to original_df table

ride_id | street_column        | coordinates_column  |
001     | Ash St & 1st Ave     | -100.123,98.123         |
002     | Ash St & 1st Ave     | -100.123,98.123         |
003     | Brooke St & Rose Rd  | 90.456,91.456         |
004     | Brooke St & Rose Rd  | 90.456,91.456         |
005     | 9th Ave & Center St  | 20.567,-100.654       |
006     | 9th Ave & Center St  | 20.567,-100.654       |
007     | 9th Ave & Center St  | 20.567,-100.654       |

original_df <- 
  structure(list(
    ride_id = c("1", "2", "3", "4", "5", "6", "7"), 
    street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave", 
                      "Brooke St & Rose Rd", "Brooke St & Rose Rd", 
                      "9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "-100.100,98.123", "90.456,91.456", 
                           "90.400,91.987", "20.567,-100.654", "21.123,-100.654", "20.567,-101.100")), 
    row.names = c(NA, -7L), class = "data.frame")

key_df <-
  structure(list(
    street_column = c("Ash St & 1st Ave", "Brooke St & Rose Rd", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "90.456,91.456", "20.567,-100.654")), 
    row.names = c(NA, -3L), class = "data.frame")

desired_df <-
  structure(list(
    ride_id = c("1", "2", "3", "4", "5", "6", "7"), 
    street_column = c("Ash St & 1st Ave", "Ash St & 1st Ave", 
                      "Brooke St & Rose Rd", "Brooke St & Rose Rd", 
                      "9th Ave & Center St", "9th Ave & Center St", "9th Ave & Center St"), 
    coordinates_column = c("-100.123,98.123", "-100.123,98.123", "90.456,91.456", 
                           "90.456,91.456", "20.567,-100.654", "20.567,-100.654", "20.567,-100.654")), 
    row.names = c(NA, -7L), class = "data.frame")
Share Improve this question edited yesterday Rui Barradas 76.3k8 gold badges39 silver badges72 bronze badges asked yesterday Tyler WahlquistTyler Wahlquist 111 silver badge1 bronze badge New contributor Tyler Wahlquist is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
Add a comment  | 

2 Answers 2

Reset to default 3

match() can be used to find the row number in key_df which matches by street_column for each row in original_df:

original_df$coordinates_column <- key_df$coordinates_column[
  match(original_df$street_column, key_df$street_column)
]

identical(original_df, desired_df)
#> [1] TRUE

Or, use dplyr to join original_df with key_df, then drop the original coordinates_column:

library(dplyr)

original_df <- original_df |> 
  left_join(key_df, by = join_by(street_column), suffix = c(".x", "")) |>
  select(-coordinates_column.x)

identical(original_df, desired_df)
#> [1] TRUE

Here is a way.
First get where in original_df the columns of interest "street_column" and "coordinates_column" are. Then join the original without the coordinates and the key data sets.
Then, put the columns in their original order.

If you want the rows to be in the original order too, create an index to their original order and use it to sort the result.

i <- grep("coordinates_column", names(original_df))
j <- grep("street_column", names(original_df))
cols_order <- c(j, (1:ncol(original_df))[-j])
result <- merge(original_df[-i], key_df)[cols_order]

o <- order(original_df$ride_id, original_df$street_column)
result[order(o),]
#>   ride_id       street_column coordinates_column
#> 4       1    Ash St & 1st Ave    -100.123,98.123
#> 5       2    Ash St & 1st Ave    -100.123,98.123
#> 6       3 Brooke St & Rose Rd      90.456,91.456
#> 7       4 Brooke St & Rose Rd      90.456,91.456
#> 1       5 9th Ave & Center St    20.567,-100.654
#> 2       6 9th Ave & Center St    20.567,-100.654
#> 3       7 9th Ave & Center St    20.567,-100.654

Created on 2025-01-10 with reprex v2.1.1

本文标签: