admin管理员组

文章数量:1405516

I'm developing a .NET library related to MIDI (music). For some of my API I need to find notes at the specified time. For example:

       |
==============
 ===   |
  ===  | ===
   === |  ===
      ===
       |

Here the top and the bottom notes should be selected. The problem is to find a way to search for notes intersecting the specified time quickly. In real life there are a lot of notes and iterating over all of them is obviously a bad approach. I need something better than O(n) if possible.

In fact we have a set of line segments (a note has the start time and the end time) and a point. We need to select segments that contain that point.

Right now notes in my code are already ordered by start times by storing them in a red-black tree. All algorithms I can think about have O(n) complexity.

Is there a way to solve the task efficiently? Maybe for O(logN).

I'm developing a .NET library related to MIDI (music). For some of my API I need to find notes at the specified time. For example:

       |
==============
 ===   |
  ===  | ===
   === |  ===
      ===
       |

Here the top and the bottom notes should be selected. The problem is to find a way to search for notes intersecting the specified time quickly. In real life there are a lot of notes and iterating over all of them is obviously a bad approach. I need something better than O(n) if possible.

In fact we have a set of line segments (a note has the start time and the end time) and a point. We need to select segments that contain that point.

Right now notes in my code are already ordered by start times by storing them in a red-black tree. All algorithms I can think about have O(n) complexity.

Is there a way to solve the task efficiently? Maybe for O(logN).

Share asked Mar 7 at 16:44 MaximMaxim 2,1441 gold badge21 silver badges27 bronze badges 2
  • 3 Interval trees seems suited for that. Alternatively, maybe Segment trees should also be. Note that the optimal complexity should be O(w) where w is the number of output items and in the worst case w=n. – Jérôme Richard Commented Mar 7 at 17:02
  • Do a binary search through your start time ordered line segments. – ravenspoint Commented Mar 7 at 17:06
Add a comment  | 

3 Answers 3

Reset to default 1

Without changing the way you store notes, there is no solution better than O(n) in the worst case, but one option to help make the code faster would be to save the length of the longest note. As you search through the tree, if the start time of a node is before the specified time you are looking for, check if the start time of that node + the longest note length is still before the specified time, and if it is, you can ignore that note and all notes that start before it.

This solution is still O(n), but in practice, it will cut out a significant portion of notes, as you won't consider any nodes that are too early to possibly overlap the specified time.

The efficient way to do this is to find the note ranges that contain the specified point by doing a binary search through the sorted note ranges.

The wrinkle is that, because more than one range can contain the point, you need to place the binary search inside a loop. Each time a range is found, remove that range and search again until all ranges containing the point have been found.

Here is some code that implements this:

#include <string>
#include <iostream>
#include <vector>
#include <algorithm>

typedef std::pair<int, int> range_t;
typedef std::vector<range_t> vRange_t;

vRange_t theRanges;
vRange_t theWork;
vRange_t vFound;

void display(const vRange_t vr)
{
    for (const auto &r : vr)
        std::cout << r.first << "-" << r.second << ", ";
    std::cout << "\n";
}

void generate(int count)
{
    for (int k = 0; k < count; k++)
    {
        int start = rand() % (count * 10);
        theRanges.emplace_back(
            start, start + 1 + rand() % 10);
    }
    std::sort(
        theRanges.begin(), theRanges.end(),
        [](const range_t &a, const range_t &b)
        {
            return a.first < b.first;
        });

    display(theRanges);
}
bool isInRange(int point, int split)
{
    //std::cout << "checking  " << theWork[split].first << "-" << theWork[split].second << "\n";
    if (
        theWork[split].first <= point &&
        point <= theWork[split].second)
    {
        vFound.push_back(theWork[split]);
        theWork.erase(theWork.begin() + split);
        return true;
    }

    return false;
}

int binarySearch(int low, int high, int point)
{
    while (low <= high)
    {
        int mid = low + (high - low) / 2;

        // Check if is present at mid
        if (isInRange(point, mid))
            return true;

        // If  greater, ignore left half
        if (theWork[mid].first < point)
            low = mid + 1;

        // If smaller, ignore right half
        else
            high = mid - 1;
    }

    // If we reach here, then not found
    return false;
}

void find(int point)
{
    theWork = theRanges;

    while (true)
    {
        if (!binarySearch(0, theWork.size() - 1, point))
            break;
    }

    std::cout << "The point " << point << " is in these ranges:\n";
    display(vFound);
}

main()
{
    generate(10);
    find(62);

    return 0;
}

the output is

5-11, 27-34, 34-35, 41-49, 61-63, 62-67, 69-74, 78-87, 81-89, 95-98, 
The point 62 is in these ranges:
61-63, 62-67,

Performance Tests

note count milliseconds
10,000 0.5
100,000 17
200,000 42

The question is missing a performance requirement. So I used this:

Most musical pieces are less than 5 minutes, but lets say an hour ( e.g. Holst's The Planets ). The very shortest note might be 0.1 second long. How many notes can be heard simultaneously? Let's say 5.

 60 * 60 * 10 * 5 = 180,000. 

So, I propose a performance requirement of max 200,000 notes.

Conclusion: Processing Holst's The Planets would require less than a tenth of a second run time.

Complete application code for the performance test at https://codeberg./JamesBremner/InRange/src/branch/main/src/main.cpp

I've implemented an interval tree and it's indeed extremely fast for searching intervals by a point. My main programming language is C# so implemented both an interval tree (let's call it A) and the way suggested by @ravenspoint in the answer (let's call it B), and created benchmarks to see what method is faster.

If shortly, starting from ~1000 elements an interval tree becomes much much faster. Short info:

  1. Benchmarks were executed for N = 1, 10, 100, 1000, 10000 and 100000 intervals. Intervals list is just a collection of fixed-length ranges with fixed gap between them, like this:
    |---|  |---|  |---|  |---| ... |---|
    
  2. For every N three benchmarks were created: search by point falling into the first interval (F), search by point falling into the middle interval (M) and search by point falling into the last interval (L):
    |-F-| ... |---|  |-M-|  |---| ... |-L-|
    
  3. For N < 1000 the B method slightly faster (~ x1.5).
  4. For bigger Ns the A method becomes drastically faster: x2-x4 for N = 1000, x30 for N = 10000 and x1000+ for N = 100000.

Results for N = 100000:

Method Position Mean time (ns) Memory (B)
B First 306,130.45 800191
A First 245.42 184
B Middle 305,530.21 800250
A Middle 309.46 184
B Last 305,110.85 800249
A Last 734.05 184

Also memory usage is obviously high for the B method.

Benchmarks were executed via BenchmarkDotNet library. For memory analyzation please read The new MemoryDiagnoser is now better than ever article.

I've implemented an interval tree via red-black tree augmentation. Please see my code here:

  • Red-black tree: https://github/melanchall/drywetmidi/tree/todo/DryWetMidi/Common/RedBlackTree
  • Interval tree: https://github/melanchall/drywetmidi/tree/todo/DryWetMidi/Common/IntervalTree

Here C# code for the B approach:

internal sealed class SoSearcher<T>
    where T : IInterval<int>
{
    private List<T> _intervals;

    public SoSearcher(params T[] data)
    {
        _intervals = data.OrderBy(d => d.Start).ToList();
    }

    public bool IsInRange(List<T> workingCopy, List<T> result, int point, int split)
    {
        if (workingCopy[split].Start <= point && point <= workingCopy[split].End)
        {
            result.Add(workingCopy[split]);
            workingCopy.RemoveAt(split);
            return true;
        }

        return false;
    }

    public bool BinarySearch(List<T> workingCopy, List<T> result, int low, int high, int point)
    {
        while (low <= high)
        {
            int mid = low + (high - low) / 2;

            if (IsInRange(workingCopy, result, point, mid))
                return true;

            if (workingCopy[mid].Start < point)
                low = mid + 1;
            else
                high = mid - 1;
        }

        return false;
    }

    public ICollection<T> Find(int point)
    {
        var workingCopy = _intervals.ToList();
        var result = new List<T>();

        while (true)
        {
            if (!BinarySearch(workingCopy, result, 0, workingCopy.Count - 1, point))
                break;
        }

        return result;
    }
}

本文标签: algorithmEfficient way to find line segments containing a pointStack Overflow