admin管理员组

文章数量:1192555

I have sorted start indices (included) and end indices (excluded) of intervals (obtained by using seachsorted), for instance:

import numpy as np

# Both arrays are of same size, and sorted.
# Size of arrays is number of intervals.
# Intervals do not overlap.

# interval indices:                0  1  2  3  4  5
interval_start_idxs    = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])

An empty interval is identified when

interval_start_idxs[interval_idx] == interval_end_excl_idxs[interval_idx]-1

I would like to identify the starts and ends of each region where intervals are empty. A region is made with one or several intervals sharing the same start indices and end excluded indices.

With previous data, expected result would then be:

empty_interval_starts     = [1, 4] # start is included
empty_intervals_ends_excl = [4, 5] # end is excluded

This result is to be understood as:

  • intervals from index 1 to 3, these intervals are a same region of empty intervals
  • and interval at index 4 is a separate region on its own

I have sorted start indices (included) and end indices (excluded) of intervals (obtained by using seachsorted), for instance:

import numpy as np

# Both arrays are of same size, and sorted.
# Size of arrays is number of intervals.
# Intervals do not overlap.

# interval indices:                0  1  2  3  4  5
interval_start_idxs    = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])

An empty interval is identified when

interval_start_idxs[interval_idx] == interval_end_excl_idxs[interval_idx]-1

I would like to identify the starts and ends of each region where intervals are empty. A region is made with one or several intervals sharing the same start indices and end excluded indices.

With previous data, expected result would then be:

empty_interval_starts     = [1, 4] # start is included
empty_intervals_ends_excl = [4, 5] # end is excluded

This result is to be understood as:

  • intervals from index 1 to 3, these intervals are a same region of empty intervals
  • and interval at index 4 is a separate region on its own
Share Improve this question edited Jan 24 at 12:57 pierre_j asked Jan 24 at 8:34 pierre_jpierre_j 9832 gold badges15 silver badges33 bronze badges 3
  • 1 Considering the start index is included and the other is excluded, this is a weird definition of "empty" intervals. It is usually start == end. I expect a "3 to 4" range to be a range including 3 and excluding 4 as you seems to indicate just before in the sentence "start indices (included) and end indices (excluded)". Besides, can you explain more precisely how you end up to the final result for the example? – Jérôme Richard Commented Jan 24 at 8:59
  • Hi Jérôme Richard, I am not sure to understand your question. The final result is the one I am looking for. Because intervals 1, 2, 3 are empty, and they share the same start (included & end (excluded), they are a same region, noted in the expected results from indices 1 to 4 (4 is excluded). If there is a single empty interval (like interval at index 4), then it is a region in itself and its end (excluded as well) is 5. – pierre_j Commented Jan 24 at 12:53
  • 1 Jérôme Richard, I have tried to clarify this in the main question: - identifying clearly what interval indices are in the input data, and adding explanations about the results. – pierre_j Commented Jan 24 at 12:59
Add a comment  | 

1 Answer 1

Reset to default 2
import numpy as np

interval_start_idxs    = np.array([0, 3, 3, 3, 6, 7])
interval_end_excl_idxs = np.array([2, 4, 4, 4, 7, 9])

is_region_start = np.r_[True, np.diff(interval_start_idxs) != 0]
is_region_end = np.roll(is_region_start, -1)
is_empty = (interval_start_idxs == interval_end_excl_idxs - 1)

empty_interval_starts = np.nonzero(is_region_start & is_empty)[0]
empty_interval_ends_excl = np.nonzero(is_region_end & is_empty)[0] + 1

Explanation:

  • is_region_start marks the starts of all potential regions, i.e. indices where the current index differs from its predecessor
  • the index of the end of a potential region is right before the start of a new region, which is why we roll back all markers in is_region_start by one to get is_region_end; the rollover in the roll-back from index 0 to index -1 works in our favor here: the marker, previously at index 0, which is always True, used to mark the start of the first potential region in is_region_start and now marks the end of the last potential region in is_region_end
  • is_empty marks all indices that are actually empty, according to your definition
  • empty_interval_starts is the combination of two criteria: start of a potential region and actually being empty (since np.nonzero() returns tuples, we need to extract the first element, …[0], to get to the actual array of indices)
  • empty_interval_ends_excl, likewise, is the combination of two criteria: end of a potential region and actually being empty; however, since empty_interval_ends_excl should be exclusive, we need to add 1 to get the final result

At present, the results (empty_interval_starts and empty_interval_ends_excl) are Numpy arrays. If you prefer them as lists, as written in the question, you might want to convert them with empty_interval_starts.tolist() and empty_interval_ends_excl.tolist(), respectively.

本文标签: pythonHow to get start indices of regions of empty intervalsStack Overflow