I have data that looks like this:

expected_data

   resp_migration_status kmcluster percentage expected
1            Non-migrant         1       21.9     30.5
2            Non-migrant         2       30.1     27.4
3            Non-migrant         3       24.7     19.9
4            Non-migrant         4       23.3     22.3
5                Migrant         1       41.9     30.5
6                Migrant         2       22.6     27.4
7                Migrant         3       19.4     19.9
8                Migrant         4       16.1     22.3
9              Displaced         1       36.9     30.5
10             Displaced         2       26.2     27.4
11             Displaced         3       11.9     19.9
12             Displaced         4       25.0     22.3

I'd like to construct a bar graph that shows percentage by kmcluster within each resp_migration_status. I've done this successfully using this code:

ggplot(expected_data, aes(x = resp_migration_status, y = percentage, fill = kmcluster)) +
  geom_bar(stat = "identity", position = "dodge") +  # Use stat = "identity" for pre-computed values
  labs(
    title = "Percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage",
    fill = "Cluster"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +  # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Overlaid on this bar graph, I'd like a second set of bars with black outlines showing the expected percentage by kmcluster within each resp_migration_status. Essentially, it's a graphical representation of a chi-square test: it compares what the distribution of clusters would be for each migration type if cluster membership were independent of migration status with the 'actual' distribution, where some migration types are disproportionately in one cluster.

How do I overlay a very basic (black outlined) bar graph on the original graph to represent this? I have this code:

ggplot(expected_data, aes(x = resp_migration_status, y = expected, fill = kmcluster)) +
  geom_bar(stat = "identity", position = "dodge", color = "black", fill = NA) +  # Use stat = "identity" for pre-computed values, bars with black outlines
  labs(
    title = "Expected percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +  # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

But setting fill = NA inside geom_bar overrides the fill = kmcluster mapping in aes(), so the data is no longer divided across cluster types and the result looks like a strange stacked bar (see image).

So the first question is:

  1. How do I divide the data by migration type and cluster, without coloring in each bar and instead just outlining them in black?

Secondly:

  2. How do I overlay this bar graph on top of the original one?
asked Mar 19 at 14:51 by Kristen; edited Mar 19 at 14:56 by MrFlick
  • Apologies -- the data table looked perfect in the preview but didn't post as expected :( – Kristen Commented Mar 19 at 14:54
  • Rather than including data in a table, it's better to share it as a dput() so we can copy/paste it directly into R for testing. See how to create a reproducible example. – MrFlick Commented Mar 19 at 14:57

1 Answer

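To draw the second set of bars on top of the first, add a second geom_col layer and explicitly map kmcluster to the group aesthetic in it. Because fill = NA is set as a constant in that layer, the fill mapping no longer splits the data by cluster there, so without an explicit group the outlined bars are not dodged.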

library(ggplot2)

ggplot(expected_data, aes(
  x = resp_migration_status,
  y = percentage, fill = factor(kmcluster)
)) +
  geom_col(position = "dodge") +
  geom_col(aes(y = expected, group = factor(kmcluster)),
    color = "black", fill = NA, position = "dodge"
  ) +
  labs(
    title = "Percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage",
    fill = "Cluster"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) + # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

DATA

expected_data <- data.frame(
  resp_migration_status = c(
    "Non-migrant", "Non-migrant", "Non-migrant", "Non-migrant",
    "Migrant", "Migrant", "Migrant", "Migrant",
    "Displaced", "Displaced", "Displaced", "Displaced"
  ),
  kmcluster = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  percentage = c(
    21.9, 30.1, 24.7,
    23.3, 41.9, 22.6, 19.4, 16.1, 36.9, 26.2, 11.9, 25
  ),
  expected = c(
    30.5, 27.4, 19.9,
    22.3, 30.5, 27.4, 19.9, 22.3, 30.5, 27.4, 19.9, 22.3
  )
)
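
As a rough check of why the explicit group mapping matters, the sketch below builds the outline layer with and without it and counts how many distinct bar positions ggplot2 computes for that layer. The object names (base_plot, ungrouped, grouped, n_positions_ungrouped, n_positions_grouped) are illustrative and not part of the original answer. The data frame holds the same values as the DATA block above, built with rep(). layer_data() is the ggplot2 helper that returns the computed data for a given layer.

```r
library(ggplot2)

# Same values as the answer's DATA block, built compactly with rep().
expected_data <- data.frame(
  resp_migration_status = rep(c("Non-migrant", "Migrant", "Displaced"), each = 4),
  kmcluster = rep(1:4, times = 3),
  percentage = c(21.9, 30.1, 24.7, 23.3, 41.9, 22.6, 19.4, 16.1, 36.9, 26.2, 11.9, 25),
  expected = rep(c(30.5, 27.4, 19.9, 22.3), times = 3)
)

# Base plot: filled, dodged bars for the observed percentages.
base_plot <- ggplot(
  expected_data,
  aes(x = resp_migration_status, y = percentage, fill = factor(kmcluster))
) +
  geom_col(position = "dodge")

# Outline layer WITHOUT an explicit group: the constant fill = NA takes the
# place of the fill mapping in this layer, so kmcluster no longer defines groups.
ungrouped <- base_plot +
  geom_col(aes(y = expected), color = "black", fill = NA, position = "dodge")

# Outline layer WITH group = factor(kmcluster), as in the answer.
grouped <- base_plot +
  geom_col(aes(y = expected, group = factor(kmcluster)),
    color = "black", fill = NA, position = "dodge"
  )

# Count the distinct left edges (xmin) computed for the second layer:
# each distinct value is a separate horizontal bar position.
n_positions_ungrouped <- length(unique(layer_data(ungrouped, 2)$xmin))
n_positions_grouped <- length(unique(layer_data(grouped, 2)$xmin))

print(c(ungrouped = n_positions_ungrouped, grouped = n_positions_grouped))
```

If the grouped count equals the number of rows in expected_data while the ungrouped count is smaller, the outlined bars are only being dodged when group is mapped, which matches the overlapping bars described in the question.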

Tags: r · How can I layer an outlined bar graph on top of a colored bar graph in ggplot · Stack Overflow