I have a table like this:

Gene_name Sample_name Gene_fraction
IGHV1-11 sample_1 0.00057491
IGHV1-12 sample_2 0.0044843
IGHV1-15 sample_3 0.01253306
IGHV1-18 sample_4 0.00942854
IGHV1-19 sample_5 0.01747729
IGHV1-2 sample_6 0.00034495
IGHV1-11 sample_7 0.00103484
IGHV1-13 sample_8 0.01517765
IGHV1-16 sample_9 0.00758882
IGHV1-18 sample_10 0.00827872

How can I transform the table above into a table like the one below in R?

Sample_name IGHV1-11 IGHV1-12 IGHV1-15 IGHV1-18 IGHV1-19 IGHV1-2 IGHV1-13 IGHV1-16
sample_1WT 0.00057491 0.0044843 0 0 0 0 0 0
sample_2WT 0 0.0044843 0 0 0 0 0 0
sample_3WT 0 0 0.01253306 0 0 0 0 0
sample_4MT 0 0 0 0.00942854 0 0 0 0
sample_5WT 0 0 0 0 0.01747729 0 0 0
sample_6WT 0 0 0 0 0 0.00034495 0 0
sample_7MT 0.00103484 0 0 0 0 0 0 0
sample_8WT 0 0 0 0 0 0 0.01517765 0
sample_9MT 0 0 0 0 0 0 0 0.00758882
sample_10MT 0 0 0 0.00827872 0 0 0 0
sample_11MT 0 0 0 0 0.04679775 0 0 0

Should I iterate over each row and append the values into a new dataframe?

Thanks

asked Feb 4 at 19:16 by user5029313
  • What is the rule for appending either MT or WT to the sample_* values? Using the tidyr package, this will get you some of the way: df |> tidyr::pivot_wider(id_cols = Sample_name, names_from = "Gene_name", values_from = "Gene_fraction", values_fill = 0). Note that "-" is a special character, so you will either have to wrap your column names in backticks (`) when referring to them or, if possible, replace the "-" with underscores. – L Tyrone Commented Feb 4 at 19:34
  • This worked. Thank you. I added the WT and MT suffixes to the main df manually, so I don't need to add them programmatically. – user5029313 Commented Feb 4 at 20:34

2 Answers

You can simply use xtabs if you don't mind the result being a table rather than a data frame:

> t(xtabs(Gene_fraction ~ ., df))
           Gene_name
Sample_name   IGHV1-11   IGHV1-12   IGHV1-13   IGHV1-15   IGHV1-16   IGHV1-18
  sample_1  0.00057491 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  sample_10 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00827872
  sample_2  0.00000000 0.00448430 0.00000000 0.00000000 0.00000000 0.00000000
  sample_3  0.00000000 0.00000000 0.00000000 0.01253306 0.00000000 0.00000000
  sample_4  0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00942854
  sample_5  0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  sample_6  0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  sample_7  0.00103484 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  sample_8  0.00000000 0.00000000 0.01517765 0.00000000 0.00000000 0.00000000
  sample_9  0.00000000 0.00000000 0.00000000 0.00000000 0.00758882 0.00000000
           Gene_name
Sample_name   IGHV1-19    IGHV1-2
  sample_1  0.00000000 0.00000000
  sample_10 0.00000000 0.00000000
  sample_2  0.00000000 0.00000000
  sample_3  0.00000000 0.00000000
  sample_4  0.00000000 0.00000000
  sample_5  0.01747729 0.00000000
  sample_6  0.00000000 0.00034495
  sample_7  0.00000000 0.00000000
  sample_8  0.00000000 0.00000000
  sample_9  0.00000000 0.00000000
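
If a data frame is needed after all, the table can be converted afterwards. The following is a minimal sketch in base R, assuming the same df as in the test data; the names tab and wide are only illustrative and are not part of the original answer.

```r
# Test data, copied from the question
df <- data.frame(
  Gene_name = c("IGHV1-11", "IGHV1-12", "IGHV1-15", "IGHV1-18", "IGHV1-19",
                "IGHV1-2", "IGHV1-11", "IGHV1-13", "IGHV1-16", "IGHV1-18"),
  Sample_name = c("sample_1", "sample_2", "sample_3", "sample_4", "sample_5",
                  "sample_6", "sample_7", "sample_8", "sample_9", "sample_10"),
  Gene_fraction = c(0.00057491, 0.0044843, 0.01253306, 0.00942854, 0.01747729,
                    0.00034495, 0.00103484, 0.01517765, 0.00758882, 0.00827872)
)

# Cross-tabulate: samples in rows, genes in columns, absent pairs are 0
tab <- t(xtabs(Gene_fraction ~ ., df))

# as.data.frame.matrix() keeps the wide layout
# (plain as.data.frame() on a table would return long format again)
wide <- as.data.frame.matrix(tab)

# Move the sample names from the row names into a regular column;
# check.names = FALSE keeps the hyphenated gene names unchanged
wide <- data.frame(Sample_name = rownames(wide), wide,
                   check.names = FALSE, row.names = NULL)

str(wide)
```

Rows come out in the sorted order of the sample names (sample_1, sample_10, sample_2, ...), not in the order they appear in df.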

You can use tidyverse's pivot_wider, base R's reshape, or dcast from reshape2:

# reshape2
df3 <- reshape2::dcast(df, Sample_name ~ Gene_name, value.var = "Gene_fraction", fill = 0)

# tidyverse
library(tidyverse)

df1 <- df %>%
  pivot_wider(names_from = Gene_name, values_from = Gene_fraction, values_fill = 0)

# Base R
df2 <- reshape(df, idvar = "Sample_name", timevar = "Gene_name", direction = "wide") # pivot to wide format
colnames(df2) <- sub("^Gene_fraction\\.", "", colnames(df2)) # strip the "Gene_fraction." prefix (dot escaped so it matches literally)
df2[is.na(df2)] <- 0 # gene/sample combinations with no value become 0

Test data

df <- data.frame(
  Gene_name = c("IGHV1-11", "IGHV1-12", "IGHV1-15", "IGHV1-18", "IGHV1-19", 
                "IGHV1-2", "IGHV1-11", "IGHV1-13", "IGHV1-16", "IGHV1-18"),
  Sample_name = c("sample_1", "sample_2", "sample_3", "sample_4", "sample_5", 
                  "sample_6", "sample_7", "sample_8", "sample_9", "sample_10"),
  Gene_fraction = c(0.00057491, 0.0044843, 0.01253306, 0.00942854, 0.01747729, 
                    0.00034495, 0.00103484, 0.01517765, 0.00758882, 0.00827872)
)
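
As pointed out in the comments on the question, the hyphen in the gene names means the resulting columns are not syntactic R names. Here is a short sketch of the two ways around that, using only base R on the test data above; the object name wide is an assumption for illustration.

```r
# Test data, copied from the question
df <- data.frame(
  Gene_name = c("IGHV1-11", "IGHV1-12", "IGHV1-15", "IGHV1-18", "IGHV1-19",
                "IGHV1-2", "IGHV1-11", "IGHV1-13", "IGHV1-16", "IGHV1-18"),
  Sample_name = c("sample_1", "sample_2", "sample_3", "sample_4", "sample_5",
                  "sample_6", "sample_7", "sample_8", "sample_9", "sample_10"),
  Gene_fraction = c(0.00057491, 0.0044843, 0.01253306, 0.00942854, 0.01747729,
                    0.00034495, 0.00103484, 0.01517765, 0.00758882, 0.00827872)
)

# Pivot to wide format with base R, as in the answer above
wide <- reshape(df, idvar = "Sample_name", timevar = "Gene_name", direction = "wide")
names(wide) <- sub("^Gene_fraction\\.", "", names(wide))
wide[is.na(wide)] <- 0

# Option 1: keep the hyphens and quote the name when accessing a column
with_backticks <- wide$`IGHV1-11`
with_brackets  <- wide[["IGHV1-11"]]

# Option 2: replace the hyphens so the names become syntactic
names(wide) <- gsub("-", "_", names(wide), fixed = TRUE)
renamed <- wide$IGHV1_11
```

Which option is preferable depends on whether the original gene names have to be preserved for later steps such as plotting or joining.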
