Given an index split list T of length M + 1, where the first element is 0 and the last element is N, generate an array D of length N such that D[T[i]:T[i+1]] = i.

For example, given T = [0, 2, 5, 7], return D = [0, 0, 1, 1, 1, 2, 2].

I'm trying to avoid a for loop, but the best I can do is:

import numpy as np

def expand_split_list(split_list):
    # Build one constant block per segment, then join the blocks
    return np.concatenate(
        [
            np.full(split_list[i + 1] - split_list[i], i)
            for i in range(len(split_list) - 1)
        ]
    )

Is there a built-in function for that purpose?

Asked Mar 12 at 9:58 by Leo

3 Answers

You could combine diff, arange, and repeat:

n = np.diff(T)
out = np.repeat(np.arange(len(n)), n)

As a one-liner (Python ≥ 3.8):

out = np.repeat(np.arange(len(n:=np.diff(T))), n)

Another option is to assign ones to an array of zeros at the interior split points, then take the cumulative sum:

out = np.zeros(T[-1], dtype=int)
out[T[1:-1]] = 1
out = np.cumsum(out)

Output:

array([0, 0, 1, 1, 1, 2, 2])
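
As a sanity check, here is a minimal sketch that wraps the two approaches above in functions and compares them. The names expand_repeat and expand_cumsum are hypothetical, chosen only for illustration. It also shows that the two differ when T contains a repeated split point (an empty segment), a case the question does not rule out.

```python
import numpy as np

def expand_repeat(T):
    # Segment lengths, then repeat each segment index by its length.
    n = np.diff(T)
    return np.repeat(np.arange(len(n)), n)

def expand_cumsum(T):
    # Mark the start of every segment except the first, then accumulate.
    out = np.zeros(T[-1], dtype=int)
    out[T[1:-1]] = 1
    return np.cumsum(out)

T = [0, 2, 5, 7]
print(expand_repeat(T))  # [0 0 1 1 1 2 2]
print(expand_cumsum(T))  # [0 0 1 1 1 2 2]

# With an empty segment (T[1] == T[2]) the results differ:
T_empty = [0, 2, 2, 5]
print(expand_repeat(T_empty))  # [0 0 2 2 2]
print(expand_cumsum(T_empty))  # [0 0 1 1 1]
```

The repeat version skips index 1 because that segment has length zero, which matches the definition D[T[i]:T[i+1]] = i. The cumsum version writes the same position twice and so loses one increment.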

A NumPy option is np.searchsorted:

np.searchsorted(T, np.arange(max(T)), side='right')-1

which gives

array([0, 0, 1, 1, 1, 2, 2])
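
To see why side='right' matters here, the sketch below runs both values of side on the example T. With side='right', a position equal to a split point is placed after it, so it lands in the segment that starts there. With side='left', it would be assigned to the previous segment instead.

```python
import numpy as np

T = [0, 2, 5, 7]
positions = np.arange(T[-1])

# Insertion index of each position in T, shifted down to a segment index.
right = np.searchsorted(T, positions, side='right') - 1
left = np.searchsorted(T, positions, side='left') - 1

print(right)  # [0 0 1 1 1 2 2]
print(left)   # [-1  0  0  1  1  1  2]
```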

Another option, admittedly clumsier, is itertools.accumulate, if you don't want to load NumPy:

from itertools import accumulate
list(accumulate([1 if i in T else 0 for i in range(max(T))], initial=-1))[1:]

and you will obtain a list

[0, 0, 1, 1, 1, 2, 2]

If you want to leverage broadcasting, a different (though not the fastest) NumPy approach is np.meshgrid:

def expand_split_list(T):
    # Create a grid of position indices: one row per position, one column per segment start
    grid, _ = np.meshgrid(np.arange(T[-1]), T[:-1], indexing="ij")
    # Boolean mask of the segment starts at or before each position; the row sum minus 1 is the group index
    return (grid >= T[:-1]).sum(axis=1) - 1
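
The same comparison can also be written with broadcasting alone, without building the grid explicitly. This is a sketch of an equivalent formulation rather than the answerer's code, and the function name expand_broadcast is hypothetical. Like the meshgrid version, it materializes an N × M boolean array, so memory grows with the product of the two sizes.

```python
import numpy as np

def expand_broadcast(T):
    T = np.asarray(T)
    positions = np.arange(T[-1])[:, None]  # shape (N, 1)
    starts = T[:-1][None, :]               # shape (1, M)
    # Count the segment starts at or before each position, minus one.
    return (positions >= starts).sum(axis=1) - 1

print(expand_broadcast([0, 2, 5, 7]))  # [0 0 1 1 1 2 2]
```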

Another NumPy approach is np.digitize, if you want a direct binning approach; it is slightly slower than np.searchsorted() because it also checks that the bins are monotonic:

np.digitize(np.arange(T[-1]), bins=T) - 1
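
Relative speed depends on N, M, and the NumPy version, so it is worth measuring on your own data. The following is a rough timing sketch using timeit. The random split list, its size, and the dictionary of candidates are assumptions made for illustration, and no particular ranking is implied.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data: 1,000 sorted, distinct interior split points in (0, N).
N = 100_000
interior = np.sort(rng.choice(np.arange(1, N), size=1_000, replace=False))
T = np.concatenate(([0], interior, [N]))

candidates = {
    "repeat": lambda: np.repeat(np.arange(len(T) - 1), np.diff(T)),
    "searchsorted": lambda: np.searchsorted(T, np.arange(T[-1]), side="right") - 1,
    "digitize": lambda: np.digitize(np.arange(T[-1]), bins=T) - 1,
}

# All candidates should agree before they are timed.
reference = candidates["repeat"]()
for name, fn in candidates.items():
    assert np.array_equal(fn(), reference), name

repeats = 20
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{name:>12}: {seconds / repeats * 1e3:.3f} ms per call")
```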

If you're working with pandas, pd.cut() is another easy way to segment values:

pd.cut(range(T[-1]), bins=T, labels=False, right=False).tolist()

For a pure Python approach, you can use bisect_right(), which performs binary search over T for each element:

from bisect import bisect_right

def expand_split_list(T):
    return [bisect_right(T, i) - 1 for i in range(T[-1])]
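
Calling bisect_right for every position costs O(N log M) overall. If that matters, a linear-time pure Python alternative is to repeat each segment index by its segment length. The sketch below uses itertools for this; it is a different technique from the binary search above, and the function name expand_split_list_chain is hypothetical.

```python
from itertools import chain, repeat

def expand_split_list_chain(T):
    # Pair consecutive split points, then emit index i (end - start) times.
    segments = zip(T, T[1:])
    return list(chain.from_iterable(
        repeat(i, end - start) for i, (start, end) in enumerate(segments)
    ))

print(expand_split_list_chain([0, 2, 5, 7]))  # [0, 0, 1, 1, 1, 2, 2]
```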

Tags: python. Source: "Fast way to expand split list into index list", Stack Overflow