automation - Averaging temporal series with fixed resolution in R - Stack Overflow

I have a number of chromatograms saved as .csv files in a folder that look something like

time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775, 0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176, 0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576, 0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976, 0.203176)

RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16,61.76, 61.54,61.53,61.7,62.05,62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32, 66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)

dd<- data.frame(time, RID)

What I need to do, and am struggling to work out how, is to "reduce" the resolution of the dataset by averaging the data in bins of a fixed width, e.g. 0.05, turning that data frame into something like

time	RID
0	average of the RID data between time 0 and 0.05
0.05	average of the RID data between time 0.05 and 0.1
0.1	average of the RID data between time 0.1 and 0.15

and so on.

The only thing I know that comes close to what I want is aggregate, but that would mean first creating a dummy time column with the time data truncated to the desired resolution. It feels like there should be a more readily available solution, especially because I have an entire folder of csv files and need to automate the whole process for future studies.

asked Mar 17 at 9:48 by Raffaello; edited Mar 17 at 9:58 by zx8754

If aggregate is not suitable, maybe use rollmean. – zx8754 Commented Mar 17 at 10:10

5 Answers

Create groups, then get mean per group:

aggregate(RID ~ cut(time, seq(0, 1, 0.05)), data = dd, mean)
#   cut(time, seq(0, 1, 0.05))      RID
# 1                   (0,0.05] 64.57000
# 2                 (0.05,0.1] 62.02714
# 3                 (0.1,0.15] 65.32857
# 4                 (0.15,0.2] 67.20143
# 5                 (0.2,0.25] 67.38000
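The question asks for a numeric time column (0, 0.05, 0.1, ...) rather than interval labels. A minimal sketch of one way to get that, assuming left-closed bins [0, 0.05), [0.05, 0.1), ... (cut() defaults to right-closed bins instead); the names bin_width, time_bin and res are only illustrative:

```r
# Sample data from the question
time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775,
          0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176,
          0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576,
          0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976,
          0.203176)
RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16, 61.76, 61.54, 61.53,
         61.7, 62.05, 62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32,
         66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)
dd <- data.frame(time, RID)

bin_width <- 0.05  # desired resolution (illustrative name)

# Replace each time by the start of its bin: floor(t / w) * w
dd$time_bin <- floor(dd$time / bin_width) * bin_width

# Mean RID per bin start; rename the grouping column to "time"
res <- aggregate(RID ~ time_bin, data = dd, FUN = mean)
names(res)[1] <- "time"
res
```

For this sample no reading falls exactly on a bin edge, so the means should agree with the cut() output above; only the labels differ.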

Use findInterval():

n = seq(0, max(dd$time), .05)
tapply(dd$RID, findInterval(dd$time, n), mean) |> setNames(n)

       0     0.05      0.1     0.15      0.2 
64.57000 62.02714 65.32857 67.20143 67.38000
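One caveat: setNames(n) assumes every bin holds at least one reading. If a bin is empty, tapply() returns fewer values than there are breaks and the names no longer correspond to the values. A possible workaround, sketched here on a hypothetical copy of the data with a gap (dd_gap is not part of the question), is to make the bin index a factor with one level per break so empty bins show up as NA:

```r
# Sample data from the question
time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775,
          0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176,
          0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576,
          0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976,
          0.203176)
RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16, 61.76, 61.54, 61.53,
         61.7, 62.05, 62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32,
         66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)
dd <- data.frame(time, RID)

# Hypothetical gap: drop every reading between 0.05 and 0.1
dd_gap <- dd[dd$time < 0.05 | dd$time >= 0.1, ]

n <- seq(0, max(dd_gap$time), 0.05)

# One factor level per break, so tapply() keeps empty bins (as NA)
idx <- factor(findInterval(dd_gap$time, n), levels = seq_along(n))
means <- tapply(dd_gap$RID, idx, mean)

res <- data.frame(time = n, RID = as.numeric(means))
res
```

The bin starting at 0.05 then appears with RID = NA instead of silently disappearing.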

EDIT

For a solution on a bunch of files

# pattern is a regular expression, so match a literal ".csv" at the end of the name
l = lapply(list.files(pattern='\\.csv$'), read.csv) 
# if possible merge to one data frame else 
m = sapply(l, \(i) max(i[['time']], na.rm=TRUE)) |> max()
# move on with a global m(aximum)

We could reduce the number of lapply.
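To show where the global maximum could lead, here is a rough sketch that bins every file against one shared set of breaks. The demo folder, the two tiny CSV files and the helper bin_one() are all made up for illustration; it also assumes every file has columns named time and RID and that no time is negative:

```r
# Hypothetical demo folder with two small CSV files
demo_dir <- file.path(tempdir(), "chromatograms_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(time = c(0.01, 0.03, 0.06, 0.12), RID = c(1, 3, 5, 7)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(time = c(0.02, 0.07, 0.09), RID = c(2, 4, 6)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Read every CSV file in the folder into a named list of data frames
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
l <- lapply(files, read.csv)
names(l) <- basename(files)

# Global maximum time across all files, and breaks shared by all of them
m <- max(sapply(l, function(d) max(d$time, na.rm = TRUE)))
breaks <- seq(0, m, 0.05)

# Illustrative helper: replace each time by its bin start, then average per bin
# (assumes time >= 0, otherwise findInterval() returns 0 and the indexing fails)
bin_one <- function(d, breaks) {
  d$time <- breaks[findInterval(d$time, breaks)]
  aggregate(RID ~ time, data = d, FUN = mean)
}

binned <- lapply(l, bin_one, breaks = breaks)
binned
```

Because all files share the same breaks, the resulting time values are directly comparable between files; bins with no readings are simply absent from a file's result.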

Probably you can try

> aggregate(RID ~ cbind(time = as.character(cut(time, seq(0, 0.5, 0.05)))), dd, mean)
        time      RID
1   (0,0.05] 64.57000
2 (0.05,0.1] 62.02714
3 (0.1,0.15] 65.32857
4 (0.15,0.2] 67.20143
5 (0.2,0.25] 67.38000

or

> with(dd, by(RID, list(time = cut(time, seq(0, 0.5, 0.05))), mean))
time: (0,0.05]
[1] 64.57
------------------------------------------------------------ 
time: (0.05,0.1]
[1] 62.02714
------------------------------------------------------------
time: (0.1,0.15]
[1] 65.32857
------------------------------------------------------------
time: (0.15,0.2]
[1] 67.20143
------------------------------------------------------------
time: (0.2,0.25]
[1] 67.38
------------------------------------------------------------
time: (0.25,0.3]
[1] NA
------------------------------------------------------------
time: (0.3,0.35]
[1] NA
------------------------------------------------------------
time: (0.35,0.4]
[1] NA
------------------------------------------------------------
time: (0.4,0.45]
[1] NA
------------------------------------------------------------
time: (0.45,0.5]
[1] NA

For a single file

# Define bin width
bin_width <- 0.05

# Create bins using cut()
dd$bin <- cut(dd$time, breaks = seq(0, ceiling(max(dd$time)/bin_width)*bin_width, bin_width), 
              include.lowest = TRUE, right = FALSE)

# Calculate means for each bin
result <- aggregate(RID ~ bin, data = dd, mean)

# Extract the lower bound of each bin as the new time
result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
result <- result[, c("time", "RID")]

For multiple files

# Set working directory to your folder (or specify full path)
setwd("path/to/your/folder")

# Define bin width
bin_width <- 0.05

# List all CSV files (pattern is a regular expression, not a glob)
files <- list.files(pattern = "\\.csv$")

# Process each file
for (file in files) {
  # Read the CSV
  dd <- read.csv(file)
  
  # Create bins
  dd$bin <- cut(dd$time, breaks = seq(0, ceiling(max(dd$time)/bin_width)*bin_width, bin_width), 
                include.lowest = TRUE, right = FALSE)
  
  # Calculate means
  result <- aggregate(RID ~ bin, data = dd, mean)
  result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
  result <- result[, c("time", "RID")]
  
  # Save the result (e.g., append "_reduced" to the filename)
  # Note: escape the dot and anchor at the end so only the extension is replaced
  output_file <- sub("\\.csv$", "_reduced.csv", file)
  write.csv(result, output_file, row.names = FALSE)
  
  cat("Processed:", file, "\n")
}

timeplyr now has fixed-width time intervals which can help here

library(dplyr)
library(timeplyr)

dd |> 
  mutate(intv = time_cut_width(time, 0.05, from = 0)) |> 
  summarise(mean = mean(RID), .by = intv)
#> # A tibble: 5 × 2
#>   intv         mean
#>   <tm_ntrvl>  <dbl>
#> 1 [0, 0.05)    64.6
#> 2 [0.05, 0.1)  62.0
#> 3 [0.1, 0.15)  65.3
#> 4 [0.15, 0.2)  67.2
#> 5 [0.2, 0.25)  67.4

Created on 2025-03-17 with reprex v2.1.1
