I’m analysing the impact of a covariate on a dependent variable using aov to assess how much variation can be reduced by including said covariate.
However, when I run the function, it automatically uses all available cores on my shared HPC node (~64 cores, as shown in the attached screenshot), which affects other users on the node.
Here is the function along with some data to reproduce the behavior:
library(dplyr)    # select(), filter(), pull(), bind_rows(), %>%
library(stringr)  # str_interp()
library(tibble)   # remove_rownames()

explained_variance_aov <- function(pc, data, covar) {
  # Create the formula dynamically based on the PC variable
  formula <- as.formula(str_interp("${pc} ~ ${covar}"))
  # Fit the model on the two relevant columns, dropping rows with missing values
  model <- aov(formula, data = data %>% select(c(!!sym(pc), !!sym(covar))) %>% tidyr::drop_na())
  # Extract the sums of squares
  anova_summary <- model %>% broom::tidy()
  SSB <- anova_summary %>%
    filter(term == covar) %>%
    pull(sumsq) # Between-groups sum of squares
  SST <- anova_summary %>%
    pull(sumsq) %>%
    sum() # Total sum of squares (covariate + residuals)
  R_squared <- (SSB / SST) * 100
  # Return the percentage of variance explained as a one-row data frame
  return(data.frame(
    PC = pc,
    explained_var = R_squared
  ) %>% remove_rownames())
}
n <- 60000
n_pcs <- 3000
data <- as.data.frame(matrix(rnorm(n * n_pcs, mean = 0, sd = 1), nrow = n, ncol = n_pcs))
colnames(data) <- paste0("PC", 1:n_pcs)
data$covar <- factor(sample(c("Group1", "Group2", "Group3"), n, replace = TRUE))
# Example usage of the function for multiple PCs
results <- lapply(paste0("PC", seq_len(n_pcs)), function(pc) {
  explained_variance_aov(pc, data, "covar")
}) %>% bind_rows()
# Print results for the first few PCs
print(head(results))
I tried changing several thread-related settings, but nothing helped:
Sys.setenv(OMP_NUM_THREADS = 1)
Sys.setenv(MKL_NUM_THREADS = 1)
Sys.setenv(OPENBLAS_NUM_THREADS = 1)
options(mc.cores = 1)
Any idea why this is happening and how to only use a single core for the analysis? I don't want to throttle the worker node for all other users.
Asked Nov 21, 2024 at 13:24 by nhaus; edited Nov 21, 2024 at 16:09 by steliosbl.
2 Answers
Some of those settings may need to be applied before the R session starts, i.e. calling Sys.setenv() from within an R session might not work as expected. Check out the RhpcBLASctl package, which offers:
blas_set_num_threads(threads)
omp_set_num_threads(threads)
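A minimal sketch of how these two calls might be used at the top of the script, before any model is fitted. It assumes the RhpcBLASctl package is installed and that the extra threads really come from BLAS or OpenMP, which has not been confirmed for this case. The helper name `limit_threads` and the `requireNamespace()` guard are illustrative additions, not part of the package.

```r
# Hypothetical helper: cap BLAS and OpenMP threads for the current R session.
# Returns TRUE (invisibly) if the limits were applied, FALSE otherwise.
limit_threads <- function(threads = 1L) {
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    warning("RhpcBLASctl is not installed; thread limits were not changed")
    return(invisible(FALSE))
  }
  RhpcBLASctl::blas_set_num_threads(threads)
  RhpcBLASctl::omp_set_num_threads(threads)
  invisible(TRUE)
}

limited <- limit_threads(1L)

# Report what the session sees after the change
if (limited) {
  cat("BLAS threads:       ", RhpcBLASctl::blas_get_num_procs(), "\n")
  cat("OpenMP max threads: ", RhpcBLASctl::omp_get_max_threads(), "\n")
}
```

If the reported values are already 1 and the node is still saturated, the threads are probably being created somewhere other than BLAS or OpenMP.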
If BLAS threading still cannot be controlled and the multithreading persists, consider using explicit parallel processing, which gives direct control over the number of worker processes. For example:
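library(parallel)
cl <- makeCluster(1) # Single worker process
# Load the packages the function relies on in the worker
clusterEvalQ(cl, {
  library(dplyr)
  library(stringr)
  library(tibble)
})
clusterExport(cl, c("explained_variance_aov", "data"))
results <- parLapply(cl, paste0("PC", 1:3000), function(pc) {
  explained_variance_aov(pc, data, "covar")
})
stopCluster(cl)
results <- do.call(rbind, results)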
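To narrow down whether a multithreaded BLAS is involved at all, it may help to inspect what the running session is linked against and which thread-related environment variables it actually sees. This is a rough diagnostic sketch using base R only; it assumes a reasonably recent R version (one where `extSoftVersion()` reports a `BLAS` entry and `La_library()` is available), and the variable names are illustrative.

```r
# Which BLAS/LAPACK libraries is this R session linked against?
blas_path   <- unname(extSoftVersion()["BLAS"])
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Thread-related environment variables as seen by the running session
# (NA means the variable is not set)
thread_vars <- c("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")
print(Sys.getenv(thread_vars, unset = NA))
```

A path pointing at OpenBLAS or MKL suggests a threaded BLAS is in use; R's reference BLAS is single-threaded, in which case the extra cores are likely being used by something else.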