I’m analysing the impact of a covariate on a dependent variable using aov to assess how much variation can be reduced by including said covariate.

However, when I run the function, it automatically uses all available cores on my shared HPC node (~64 cores, as shown in the attached screenshot), which affects other users on the node.

Here is the function along with some data to reproduce the behavior:

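library(dplyr)    # %>%, select(), filter(), pull(), bind_rows(), sym()
library(stringr)  # str_interp()
library(tibble)   # remove_rownames()

explained_variance_aov <- function(pc, data, covar) {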

    # Create the formula dynamically based on the PC variable
    formula <- as.formula(str_interp("${pc} ~ ${covar}"))

    # Fit the model
    model <- aov(formula, data = data %>% select(c(!!sym(pc), !!sym(covar))) %>% tidyr::drop_na())

    # Extract the sum of squares
    anova_summary <- model %>% broom::tidy()
    SSB <- anova_summary %>%
        filter(term == covar) %>%
        pull(sumsq) # Between-groups sum of squares
    SST <- anova_summary %>%
        pull(sumsq) %>%
        sum()

    R_squared <- (SSB / SST) * 100

    # Return the percentage of variance explained as a one-row data frame
    return(data.frame(
        PC = pc,
        explained_var = R_squared
    ) %>% remove_rownames())
}

n <- 60000  
n_pcs <- 3000  

data <- as.data.frame(matrix(rnorm(n * n_pcs, mean = 0, sd = 1), nrow = n, ncol = n_pcs))
colnames(data) <- paste0("PC", 1:n_pcs)
data$covar <- factor(sample(c("Group1", "Group2", "Group3"), n, replace = TRUE))


# Example usage of the function for multiple PCs
results <- lapply(paste0("PC", 1:3000), function(pc) {
    explained_variance_aov(pc, data, "covar")
}) %>% bind_rows()

# Print results for the first few PCs
print(head(results))

I tried changing different default settings, but nothing helped...

Sys.setenv(OMP_NUM_THREADS = 1)  
Sys.setenv(MKL_NUM_THREADS = 1)  
Sys.setenv(OPENBLAS_NUM_THREADS = 1)  
options(mc.cores = 1)

Any idea why this is happening and how to only use a single core for the analysis? I don't want to throttle the worker node for all other users.

asked Nov 21, 2024 at 13:24 by nhaus; edited Nov 21, 2024 at 16:09 by steliosbl

2 Answers

Some of those settings may need to be applied before the R session starts, i.e. calling Sys.setenv() from within a running R session might not work as expected. Check out the RhpcBLASctl package, which lets you set the thread count at run time:

blas_set_num_threads(threads)
omp_set_num_threads(threads)

If the BLAS thread count still cannot be limited this way, consider running the work in an explicit cluster with a single worker process, so that the number of R processes is fixed by you. For example:

library(parallel)

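cl <- makeCluster(1)  # Single worker process

# Load the packages the function relies on inside the worker
invisible(clusterEvalQ(cl, {
    library(dplyr)
    library(stringr)
    library(tibble)
}))

# Copy the function and the data to the worker
clusterExport(cl, c("explained_variance_aov", "data"))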

results <- parLapply(cl, paste0("PC", 1:3000), function(pc) {
    explained_variance_aov(pc, data, "covar")
})

stopCluster(cl)

results <- do.call(rbind, results)
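The two suggestions can be combined. A worker started by makeCluster() is a fresh R session that inherits the parent's environment, so variables set with Sys.setenv() before the cluster is created are in place when the worker starts. The sketch below only checks that the variables arrive in the worker. Whether they actually cap the thread count depends on which BLAS/OpenMP library R is linked against, which is an assumption here.

```r
library(parallel)

# Set the thread-limiting variables in the parent session first
Sys.setenv(
    OMP_NUM_THREADS = "1",
    MKL_NUM_THREADS = "1",
    OPENBLAS_NUM_THREADS = "1"
)

# The worker is a new R process and inherits the environment above
cl <- makeCluster(1)

# Ask the worker which values it sees
worker_env <- clusterEvalQ(cl, Sys.getenv(
    c("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")
))

stopCluster(cl)

print(worker_env[[1]])
```

On a scheduler-managed node, exporting the same variables in the job script before R is launched achieves the same thing for the main session.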

Tags: multiprocessing | R stats aov uses all available cores instead of just one | Stack Overflow