admin管理员组

文章数量:1208153

Using the small example below, I'm seeking to aggregate (sum) the values in the 3D dat_arr array using two other arrays to guide the grouping. The first index of dat_arr is related to time. The second and third indices are related to spatial (X, Y) locations. How can I sum values in dat_arr such that the temporal binning is guided by the contents of tim_idx (same length as the first dimension of dat_arr) and the spatial binning uses zon_arr (has same dimensions as the last two indices of dat_arr)?

import numpy as np
import matplotlib.pyplot as plt

zon_arr = np.zeros((3,5))
tim_idx = np.array([0,0,1,1,2,2,3,3])

# set up arbitrary zones
zon_arr[1, :3] = 1
zon_arr[1, 3:] = 2

# plt.imshow(zon_arr)
# plt.show()

# generate arbitrary array with data
# first index = time; last 2 indices represent X-Y pts in space
# last two indices must have same dims as zon_arr
np.random.seed(100)
dat_arr = np.random.rand(8, 3, 5)

So the output I'm expecting would give me the sum of values contained in dat_arr for each unique value in tim_idx and zon_arr. In other words, I would expect output that has 3 value (corresponding to 3 zones) for each of the 4 unique time values in tim_idx?

Using the small example below, I'm seeking to aggregate (sum) the values in the 3D dat_arr array using two other arrays to guide the grouping. The first index of dat_arr is related to time. The second and third indices are related to spatial (X, Y) locations. How can I sum values in dat_arr such that the temporal binning is guided by the contents of tim_idx (same length as the first dimension of dat_arr) and the spatial binning uses zon_arr (has same dimensions as the last two indices of dat_arr)?

import numpy as np
import matplotlib.pyplot as plt

zon_arr = np.zeros((3,5))
tim_idx = np.array([0,0,1,1,2,2,3,3])

# set up arbitrary zones
zon_arr[1, :3] = 1
zon_arr[1, 3:] = 2

# plt.imshow(zon_arr)
# plt.show()

# generate arbitrary array with data
# first index = time; last 2 indices represent X-Y pts in space
# last two indices must have same dims as zon_arr
np.random.seed(100)
dat_arr = np.random.rand(8, 3, 5)

So the output I'm expecting would give me the sum of values contained in dat_arr for each unique value in tim_idx and zon_arr. In other words, I would expect output that has 3 value (corresponding to 3 zones) for each of the 4 unique time values in tim_idx?

Share Improve this question edited Jan 21 at 20:17 user2256085 asked Jan 20 at 16:43 user2256085user2256085 4694 silver badges17 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 2 +50

First, let's compute this with a loop to get a sense of the potential output:

sums = {}
# for each combination of coordinates
for i in range(len(tim_idx)):
    for j in range(zon_arr.shape[0]):
        for k in range(zon_arr.shape[1]):
            # add the value to the (time, zone) key combination
            key = (tim_idx[i], zon_arr[j, k])
            sums[key] = sums.get(key, 0) + dat_arr[i, j, k]

which gives us:

{(0, 0): 8.204124414317336,
 (0, 1): 3.8075543426771645,
 (0, 2): 1.2233223229754382,
 (1, 0): 7.920231812858928,
 (1, 1): 4.150642040307019,
 (1, 2): 2.4211020016615836,
 (2, 0): 10.363684964675313,
 (2, 1): 3.06163710842573,
 (2, 2): 1.9547272492467518,
 (3, 0): 10.841595367423158,
 (3, 1): 2.6617183569891893,
 (3, 2): 2.0222766813453674}

Now we can leverage numpy indexing to perform the same thing in a vectorial way. meshgrid to generate the indexer, unique to get the unique combinations, and bincount to compute the sum per group:

# create the indexer from the combination of time/zone
i, j = np.meshgrid(tim_idx, zon_arr, indexing='ij')
coord = np.c_[i.ravel(), j.ravel()]
# alternatively
# coord = np.c_[np.repeat(tim_idx, zon_arr.size),
#               np.tile(zon_arr.flat, len(tim_idx))]

# identify the unique combinations for later aggregation
keys, idx = np.unique(coord, return_inverse=True, axis=0)

# compute the counts per key
sums = np.bincount(idx, dat_arr.ravel())

Output:

# keys
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [2, 0],
       [2, 1],
       [2, 2],
       [3, 0],
       [3, 1],
       [3, 2]])

# sums
array([ 8.20412441,  3.80755434,  1.22332232,  7.92023181,  4.15064204,
        2.421102  , 10.36368496,  3.06163711,  1.95472725, 10.84159537,
        2.66171836,  2.02227668])

本文标签: pythonaggregate 3D array using zone and time index arraysStack Overflow