I have a 200k lines list of number ranges like start_position,stop position.
The list includes all kinds of overlaps in addition to nonoverlapping ones.
the list looks like this
- [3,5]
- [10,30]
- [15,25]
- [5,15]
- [25,35]
- ...
I need to find the ranges that a given number fall in. And will repeat it for 100k numbers.
For example if 18 is the given number with the list above then the function should return
[10,30]
[15,25]
I am doing it in a overly complicated way using bisect, can anybody give a clue on how to do it in a faster way.
Thanks
It seems a range coverage problem. Since you have a large amount queries to process, you need to give the result as quickly as possible. There is an algorithm relating to this problem, you can take a look at the Segment Tree.
The idea is to build a Segment Tree first based on the intervals given, and then for each query, you can do as fast as log(N)
, where N
is the number of intervals.
To get all possible k
intervals, first find the parent node covering all sub intervals with log(n)
, and then traverse its children to retrieve all k intervals. So the time complexity of retrieving all intervals for a given number is bounded by log(N) + O(k)
, where k << n
.
This algorithm could be deteriorated to as slow as O(N)
when all intervals contains the given number. In this case, since the size of output is N
, there does not exist any better solution. Because the complexity of the algorithm must be at least at the same order or higher than the output size.
Hope it helps.