I am trying to run terra::as.matrix on a fairly large raster stack (70 GB) and immediately get a std::bad_alloc memory error. I have seen similar questions (e.g., #562 on the terra GitHub repo, and elsewhere here) with various potential solutions, but I am uncertain which approach is most appropriate for my situation. I have no experience splitting rasters into chunks and processing each piece separately, and am hoping terra can be configured to handle the issue.

I tried changing terraOptions(), e.g., terraOptions(memfrac=0.9) and terraOptions(steps=55).

The raster requires more RAM to process than I have; I boosted the memory fraction from the default 0.6 to 0.9. See:

mem_info(x)

------------------------
Memory (GB) 
------------------------
check threshold : 1 (memmin)
available       : 17.45
allowed (90%)   : 15.7
needed (n=1)    : 136.66
------------------------
proc in memory  : FALSE
nr chunks       : 10
------------------------

The SpatRaster object comprises 8 layers: two hold coordinates, and the other six hold the values of the constituent raster layers.

class       : SpatRaster 
dimensions  : 42700, 53693, 8  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 228888.5, 765818.5, 4807436, 5234436  (xmin, xmax, ymin, ymax)
coord. ref. : NAD_1983_CSRS_v6_UTM_Zone_20N 
source      : spat_6d442d83216d_27972.tif 
names       :       xCoord,       yCoord,       chm,      ndvi,       hwba,     swba, ... 
min values  : 5.247202e-07, 5.247202e-07, 0.0000000, 0.0000000, 0.00000000, 0.000000, ... 
max values  : 1.104672e-02, 8.504140e-03, 0.1438089, 0.1625947, 0.07315826, 0.522824, ... 

The issue is related to "Error std::bad_alloc using `terra::extract` on large stack and many points" and "operation on a very large raster in terra causes std::bad_alloc", but I couldn't find a solution there that solves my problem.

Here's a reprex of my code:

library(terra)

terraOptions(memfrac=0.9)
#terraOptions(steps = 55)

Trans.env.table <- terra::as.matrix(x)

mem_info(x)

I do see that proc in memory is reported as FALSE.

The code chunk I'm looking to adapt is as follows:

# transRasts = a raster stack of GDM-transformed layers
# put the values from the transformed layers in a table for easy analysis
Trans.env.table <- as.matrix(transRasts)
col.longs <- xFromCol(transRasts)
row.lats <- yFromRow(transRasts)
Cell_Long <- rep(col.longs, times = nrow(transRasts))
Cell_Lat <- rep(row.lats, each = ncol(transRasts))
Trans.env.table <- cbind(Cell_Long, Cell_Lat, Trans.env.table)
Trans.env.table <- Trans.env.table[complete.cases(Trans.env.table), ]

# specify the number of random samples of grid cells to use in the clustering procedure
n.sub <- 500
# specify the number of community types to derive
n.cat <- 100

# Take a random sample of grid cells from the transformed environment data
sub.Trans.env <- Trans.env.table[sample(nrow(Trans.env.table), n.sub), ]

# Loop through and determine the predicted dissimilarity between each
# pair of cells in the random set
sub.dissimilarity <- matrix(0, n.sub, n.sub)
colnames(sub.dissimilarity) <- 1:n.sub
rownames(sub.dissimilarity) <- 1:n.sub
for (i.col in 1:(n.sub - 1)) {
  for (i.row in (i.col + 1):n.sub) {
    ecol.dist <- sum(abs(sub.Trans.env[i.col, 3:ncol(sub.Trans.env)] -
                           sub.Trans.env[i.row, 3:ncol(sub.Trans.env)]))
    sub.dissimilarity[i.row, i.col] <- 1 - exp(-1 * (gdmRastMod$intercept + ecol.dist))
    sub.dissimilarity[i.col, i.row] <- sub.dissimilarity[i.row, i.col]
  } # end for i.row
} # end for i.col

# Apply hierarchical clustering to the subsample dissimilarity matrix
sub.dissimilarity <- as.dist(sub.dissimilarity)
class.results <- hclust(sub.dissimilarity, method = "ward.D")
class.membership <- cutree(class.results, k = n.cat)

# Run through all grid cells and allocate each to the class of the
# most similar cell in the training set (takes ~5 mins with 500 samples)
cell.class <- rep(1, length = nrow(Trans.env.table))
for (i.cell in 1:nrow(Trans.env.table)) {
  max.similarity <- 0
  i.cell.class <- 1
  for (i.sub in 1:n.sub) {
    ecol.dist <- sum(abs(Trans.env.table[i.cell, 3:ncol(Trans.env.table)] -
                           sub.Trans.env[i.sub, 3:ncol(sub.Trans.env)]))
    similarity <- exp(-1 * (gdmRastMod$intercept + ecol.dist))
    if (similarity > max.similarity) {
      max.similarity <- similarity
      i.cell.class <- class.membership[i.sub]
    } # end if
  } # end for i.sub
  cell.class[i.cell] <- i.cell.class
} # end for i.cell

# Convert the results to a raster
gdm.class.ras <- raster(transRasts, layer = 1)
gdm.class.ras <- rasterize(Trans.env.table[, 1:2],
                           gdm.class.ras,
                           field = cell.class)
# Plot the community classes
plot(gdm.class.ras)


1 Answer


mem_info(x) shows that you have 17.5 GB of memory available, but that you need 137 GB to read the entire file into memory (the file is smaller on disk because of compression). So you cannot do that.

The memory needed is computed as the number of cells times the number of layers times 8 bytes (for a double precision numeric value), converted to GB:

42700 * 53693 * 8 * 8 / 2^30
#[1] 136.655
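
If you do need to visit every cell, you can process the raster in blocks of rows and write the result out as you go, so that only one block is ever in memory. Here is a minimal sketch of that pattern using terra's readValues/writeValues; classify_block is a hypothetical stand-in for your per-cell computation, and the output filename is an assumption:

library(terra)

# hypothetical per-block function: takes a (cells x layers) matrix and
# returns one value per cell; rows containing NA stay NA
classify_block <- function(v) {
  out <- rep(NA_real_, nrow(v))
  ok <- complete.cases(v)
  # ... replace with your nearest-training-cell classification ...
  out[ok] <- 1
  out
}

template <- rast(x, nlyrs = 1)  # single-layer output with the same geometry as x
b <- writeStart(template, "gdm_class.tif", overwrite = TRUE)
readStart(x)
for (i in 1:b$n) {
  # read one block of rows as a (cells x layers) matrix
  v <- readValues(x, row = b$row[i], nrows = b$nrows[i],
                  col = 1, ncols = ncol(x), mat = TRUE)
  writeValues(template, classify_block(v), b$row[i], b$nrows[i])
}
readStop(x)
template <- writeStop(template)

writeStart chooses the block sizes for you (b$n blocks of b$nrows[i] rows each), so each chunk respects the memory limits that mem_info reports.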

Perhaps the more important question is why you think you need as.matrix at all.
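
For example, the clustering step only needs a random sample of cells, and terra::spatSample can draw that sample straight from the file without ever building the full matrix. A sketch, assuming x is your SpatRaster and 500 is the sample size from your workflow:

library(terra)

# sample 500 cells directly from disk; xy = TRUE prepends the coordinates,
# na.rm = TRUE drops cells that are NA in any layer
sub.Trans.env <- spatSample(x, size = 500, method = "random",
                            na.rm = TRUE, xy = TRUE)
sub.Trans.env <- as.matrix(sub.Trans.env)  # small now: only 500 rows

With mostly-NA rasters you can get fewer than size cells back; recent terra versions have an exhaustive=TRUE argument for that case.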
