admin管理员组文章数量:1405516
I'm developing a .NET library related to MIDI (music). For some of my API I need to find notes at the specified time. For example:
|
==============
=== |
=== | ===
=== | ===
===
|
Here the top and the bottom notes should be selected. The problem is to find a way to search for notes intersecting the specified time quickly. In real life there are a lot of notes and iterating over all of them is obviously a bad approach. I need something better than O(n)
if possible.
In fact we have a set of line segments (a note has the start time and the end time) and a point. We need to select segments that contain that point.
Right now notes in my code are already ordered by start times by storing them in a red-black tree. All algorithms I can think about have O(n)
complexity.
Is there a way to solve the task efficiently? Maybe for O(logN)
.
I'm developing a .NET library related to MIDI (music). For some of my API I need to find notes at the specified time. For example:
|
==============
=== |
=== | ===
=== | ===
===
|
Here the top and the bottom notes should be selected. The problem is to find a way to search for notes intersecting the specified time quickly. In real life there are a lot of notes and iterating over all of them is obviously a bad approach. I need something better than O(n)
if possible.
In fact we have a set of line segments (a note has the start time and the end time) and a point. We need to select segments that contain that point.
Right now notes in my code are already ordered by start times by storing them in a red-black tree. All algorithms I can think about have O(n)
complexity.
Is there a way to solve the task efficiently? Maybe for O(logN)
.
3 Answers
Reset to default 1Without changing the way you store notes, there is no solution better than O(n)
in the worst case, but one option to help make the code faster would be to save the length of the longest note. As you search through the tree, if the start time of a node is before the specified time you are looking for, check if the start time of that node + the longest note length is still before the specified time, and if it is, you can ignore that note and all notes that start before it.
This solution is still O(n)
, but in practice, it will cut out a significant portion of notes, as you won't consider any nodes that are too early to possibly overlap the specified time.
The efficient way to do this is to find the note ranges that contain the specified point by doing a binary search through the sorted note ranges.
The wrinkle is that, because more than one range can contain the point, you need to place the binary search inside a loop. Each time a range is found, remove that range and search again until all ranges containing the point have been found.
Here is some code that implements this:
#include <string>
#include <iostream>
#include <vector>
#include <algorithm>
typedef std::pair<int, int> range_t;
typedef std::vector<range_t> vRange_t;
vRange_t theRanges;
vRange_t theWork;
vRange_t vFound;
void display(const vRange_t vr)
{
for (const auto &r : vr)
std::cout << r.first << "-" << r.second << ", ";
std::cout << "\n";
}
void generate(int count)
{
for (int k = 0; k < count; k++)
{
int start = rand() % (count * 10);
theRanges.emplace_back(
start, start + 1 + rand() % 10);
}
std::sort(
theRanges.begin(), theRanges.end(),
[](const range_t &a, const range_t &b)
{
return a.first < b.first;
});
display(theRanges);
}
bool isInRange(int point, int split)
{
//std::cout << "checking " << theWork[split].first << "-" << theWork[split].second << "\n";
if (
theWork[split].first <= point &&
point <= theWork[split].second)
{
vFound.push_back(theWork[split]);
theWork.erase(theWork.begin() + split);
return true;
}
return false;
}
int binarySearch(int low, int high, int point)
{
while (low <= high)
{
int mid = low + (high - low) / 2;
// Check if is present at mid
if (isInRange(point, mid))
return true;
// If greater, ignore left half
if (theWork[mid].first < point)
low = mid + 1;
// If smaller, ignore right half
else
high = mid - 1;
}
// If we reach here, then not found
return false;
}
void find(int point)
{
theWork = theRanges;
while (true)
{
if (!binarySearch(0, theWork.size() - 1, point))
break;
}
std::cout << "The point " << point << " is in these ranges:\n";
display(vFound);
}
main()
{
generate(10);
find(62);
return 0;
}
the output is
5-11, 27-34, 34-35, 41-49, 61-63, 62-67, 69-74, 78-87, 81-89, 95-98,
The point 62 is in these ranges:
61-63, 62-67,
Performance Tests
note count | milliseconds |
---|---|
10,000 | 0.5 |
100,000 | 17 |
200,000 | 42 |
The question is missing a performance requirement. So I used this:
Most musical pieces are less than 5 minutes, but lets say an hour ( e.g. Holst's The Planets ). The very shortest note might be 0.1 second long. How many notes can be heard simultaneously? Let's say 5.
60 * 60 * 10 * 5 = 180,000.
So, I propose a performance requirement of max 200,000 notes.
Conclusion: Processing Holst's The Planets would require less than a tenth of a second run time.
Complete application code for the performance test at https://codeberg./JamesBremner/InRange/src/branch/main/src/main.cpp
I've implemented an interval tree and it's indeed extremely fast for searching intervals by a point. My main programming language is C# so implemented both an interval tree (let's call it A) and the way suggested by @ravenspoint in the answer (let's call it B), and created benchmarks to see what method is faster.
If shortly, starting from ~1000 elements an interval tree becomes much much faster. Short info:
- Benchmarks were executed for
N
=1
,10
,100
,1000
,10000
and100000
intervals. Intervals list is just a collection of fixed-length ranges with fixed gap between them, like this:|---| |---| |---| |---| ... |---|
- For every
N
three benchmarks were created: search by point falling into the first interval (F), search by point falling into the middle interval (M) and search by point falling into the last interval (L):|-F-| ... |---| |-M-| |---| ... |-L-|
- For
N
<1000
the B method slightly faster (~ x1.5). - For bigger
N
s the A method becomes drastically faster: x2-x4 forN
=1000
, x30 forN
=10000
and x1000+ forN
=100000
.
Results for N
= 100000
:
Method | Position | Mean time (ns) | Memory (B) |
---|---|---|---|
B | First | 306,130.45 | 800191 |
A | First | 245.42 | 184 |
B | Middle | 305,530.21 | 800250 |
A | Middle | 309.46 | 184 |
B | Last | 305,110.85 | 800249 |
A | Last | 734.05 | 184 |
Also memory usage is obviously high for the B method.
Benchmarks were executed via BenchmarkDotNet library. For memory analyzation please read The new MemoryDiagnoser is now better than ever article.
I've implemented an interval tree via red-black tree augmentation. Please see my code here:
- Red-black tree: https://github/melanchall/drywetmidi/tree/todo/DryWetMidi/Common/RedBlackTree
- Interval tree: https://github/melanchall/drywetmidi/tree/todo/DryWetMidi/Common/IntervalTree
Here C# code for the B approach:
internal sealed class SoSearcher<T>
where T : IInterval<int>
{
private List<T> _intervals;
public SoSearcher(params T[] data)
{
_intervals = data.OrderBy(d => d.Start).ToList();
}
public bool IsInRange(List<T> workingCopy, List<T> result, int point, int split)
{
if (workingCopy[split].Start <= point && point <= workingCopy[split].End)
{
result.Add(workingCopy[split]);
workingCopy.RemoveAt(split);
return true;
}
return false;
}
public bool BinarySearch(List<T> workingCopy, List<T> result, int low, int high, int point)
{
while (low <= high)
{
int mid = low + (high - low) / 2;
if (IsInRange(workingCopy, result, point, mid))
return true;
if (workingCopy[mid].Start < point)
low = mid + 1;
else
high = mid - 1;
}
return false;
}
public ICollection<T> Find(int point)
{
var workingCopy = _intervals.ToList();
var result = new List<T>();
while (true)
{
if (!BinarySearch(workingCopy, result, 0, workingCopy.Count - 1, point))
break;
}
return result;
}
}
本文标签: algorithmEfficient way to find line segments containing a pointStack Overflow
版权声明:本文标题:algorithm - Efficient way to find line segments containing a point - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1744917391a2632077.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
O(w)
wherew
is the number of output items and in the worst casew=n
. – Jérôme Richard Commented Mar 7 at 17:02