

I have my data set up as follows:

my_data <- data.frame(
  learner_code = 1:8,
  lsk = c(0, 10, 20, 30, 50, 15, 25, 40)
)

I want to run the following code to first create categories within the lsk variable based on the learner scores, and then create a table of the counts and proportion of learners in each category.

Here is the current code:

lsk_cat1 <- my_data %>%
  mutate(lsk = case_when (lsk == 0 ~ "0",
                      lsk >0 & lsk < 20 ~ "1-19",
                      lsk > 19 & lsk < 40 ~ "20-39",
                      lsk>= 40 ~ "40+" )) %>%
  group_by(lsk) %>% summarise(n = length(lsk),
                          proportion = round(length(lsk)/8*100, 1))

This gives me the following output:

n proportion
8        100

When it should give something like this:

 lsk     n proportion
1 0         1      12.5
2 1-19      3      37.5
3 20-39     3      37.5
4 40+       1      12.5

This worked perfectly for another dataset, but now it's being difficult. I would appreciate the help.

asked Nov 22, 2024 at 15:32 by Paige Cox

1 Answer


You could use cut() from base R to create the categories, then tabulate them.

  1. Base R
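# Bin the scores into half-open intervals [0,1), [1,20), [20,40), [40,Inf)
my_data$lsk |>
  cut(breaks=c(0, 1, 20, 40, Inf),
      labels=c("0", "1-19", "20-39", "40+"),
      right=FALSE) |>
  table() |>
  # Add the percentage of learners in each category
  transform(i = sprintf("%.1f%%", Freq / sum(Freq) * 100)) |>
  setNames(c("lsk", "n", "proportion"))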
    lsk n proportion
1     0 1      12.5%
2  1-19 2      25.0%
3 20-39 3      37.5%
4   40+ 2      25.0%
  2. dplyr
library(dplyr)

my_data |>
  group_by(lsk = cut(lsk,
                 breaks=c(0, 1, 20, 40, Inf),
                 labels=c("0", "1-19", "20-39", "40+"),
                 right=FALSE)) |>
  summarise(n = length(lsk)) |>
  mutate(proportion = scales::percent(prop.table(n), .1))
# A tibble: 4 × 3
  lsk       n proportion
  <fct> <int> <chr>     
1 0         1 12.5%     
2 1-19      2 25.0%     
3 20-39     3 37.5%     
4 40+       2 25.0%  



Data:

my_data = data.frame(learner_code = 1:8, lsk = c(0, 10, 20, 30, 50, 15, 25, 40))

or, as a tibble:

my_data = tibble::tibble(learner_code = 1:8, lsk = c(0, 10, 20, 30, 50, 15, 25, 40))
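
To see why the breaks c(0, 1, 20, 40, Inf) give the intended categories, here is a minimal base R sketch that shows which category each score lands in. It assumes the scores are non-negative whole numbers, as in the question's data. With right=FALSE every interval is closed on the left and open on the right, so the break at 1 isolates the zero scores. The names lsk_cat and counts are only illustrative.

```r
# Scores copied from the question's example data
lsk <- c(0, 10, 20, 30, 50, 15, 25, 40)

# right = FALSE gives the intervals [0, 1), [1, 20), [20, 40), [40, Inf)
lsk_cat <- cut(lsk,
               breaks = c(0, 1, 20, 40, Inf),
               labels = c("0", "1-19", "20-39", "40+"),
               right = FALSE)

# Show each score next to the category it was assigned
print(data.frame(lsk, lsk_cat))

# Counts per category, then percentages rounded to one decimal place
counts <- table(lsk_cat)
print(counts)
print(round(100 * prop.table(counts), 1))
```

Scores below 0 fall outside the breaks and come back as NA, and a fractional score such as 0.5 would be labelled "0", so the breaks would need adjusting if the data can contain such values.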
