Question 1

I have been wondering if there is some kind of data-structure or clever way to use a dictionary (O(1) lookup) to return a value if there are given values for defined ranges that do not overlap. So far I have been thinking this could be done if the ranges have some constant difference (0-2, 2-4, 4-6, etc.) or a binary-search could be done to do this in O(log(n)) time.

So, for example given a dictionary,

d = {[0.0 - 0.1): "a",[0.1 - 0.3): "b",[0.3 - 0.55): "c",[0.55 - 0.7): "d",[0.7 - 1.0): "e"}

it should return,

d[0.05] 
>>> "a"
d[0.8]
>>> "e"
d[0.9]
>>> "e"
d[random.random()] # this should also work

Is there anyway to achieve something like this? Thanks for any responses or answers on this.

Question 2

You can have O(1) lookup time if you accept a low(er) resolution of range boundaries and sacrifice memory for lookup speed.

A dictionary can do a lookup in O(1) average time because there is a simple arithmetic relationship between key and location in a fixed-size data structure (hash(key) % tablesize, for the average case). Your ranges are effectively of a variable size with floating-point boundaries, so there is no fixed tablesize to map a search value to.

Unless, that is, you limit the absolute lower and upper boundaries of the ranges, and let range boundaries fall on a fixed step size. Your example uses values from 0.0 through to 1.0, and the ranges can be quantized to 0.05 steps. That can be turned into a fixed table:

import math
from collections import MutableMapping# empty slot marker
_EMPTY = object()class RangeMap(MutableMapping):"""Map points to values, and find values for points in O(1) constant timeThe map requires a fixed minimum lower and maximum upper bound forthe ranges. Range boundaries are quantized to a fixed step size. Gapsare permitted, when setting overlapping ranges last range set wins."""def __init__(self, map=None, lower=0.0, upper=1.0, step=0.05):self._mag = 10 ** -round(math.log10(step) - 1)  # shift to integersself._lower, self._upper = round(lower * self._mag), round(upper * self._mag)self._step = round(step * self._mag)self._steps = (self._upper - self._lower) // self._stepself._table = [_EMPTY] * self._stepsself._len = 0if map is not None:self.update(map)def __len__(self):return self._lendef _map_range(self, r):low, high = rstart = round(low * self._mag) // self._stepstop = round(high * self._mag) // self._stepif not self._lower <= start < stop <= self._upper:raise IndexError('Range outside of map boundaries')return range(start - self._lower, stop - self._lower)def __setitem__(self, r, value):for i in self._map_range(r):self._len += int(self._table[i] is _EMPTY)self._table[i] = valuedef __delitem__(self, r):for i in self._map_range(r):self._len -= int(self._table[i] is not _EMPTY)self._table[i] = _EMPTYdef _point_to_index(self, point):point = round(point * self._mag)if not self._lower <= point <= self._upper:raise IndexError('Point outside of map boundaries')return (point - self._lower) // self._stepdef __getitem__(self, point_or_range):if isinstance(point_or_range, tuple):low, high = point_or_ranger = self._map_range(point_or_range)# all points in the range must point to the same valuevalue = self._table[r[0]]if value is _EMPTY or any(self._table[i] != value for i in r):raise IndexError('Not a range for a single value')else:value = self._table[self._point_to_index(point_or_range)]if value is _EMPTY:raise IndexError('Point not in map')return valuedef __iter__(self):low = Nonevalue = _EMPTYfor i, v in enumerate(self._table):pos = (self._lower + (i * self._step)) / self._magif v is _EMPTY:if low is not None:yield (low, pos)low = Noneelif v != value:if low is not None:yield (low, pos)low = posvalue = vif low is not None:yield (low, self._upper / self._mag)

The above implements the full mapping interface, and accepts both points and ranges (as a tuple modelling a [start, stop) interval) when indexing or testing for containment (supporting ranges made it easier to reuse the default keys, values and items dictionary view implementations, which all work from the __iter__ implementation).

Demo:

>>> d = RangeMap({
...     (0.0, 0.1): "a",
...     (0.1, 0.3): "b",
...     (0.3, 0.55): "c",
...     (0.55, 0.7): "d",
...     (0.7, 1.0): "e",
... })
>>> print(*d.items(), sep='\n')
((0.0, 0.1), 'a')
((0.1, 0.3), 'b')
((0.3, 0.55), 'c')
((0.55, 0.7), 'd')
((0.7, 1.0), 'e')
>>> d[0.05]
'a'
>>> d[0.8]
'e'
>>> d[0.9]
'e'
>>> import random
>>> d[random.random()]
'c'
>>> d[random.random()]
'a'

If you can't limit the step size and boundaries so readily, then your next best option is to use some kind of binary search algorithm; you keep the ranges in sorted order and pick a point in the middle of the data structure; based on your search key being higher or lower than that mid point you continue the search in either half of the data structure until you find a match.

If your ranges cover the full interval from lowest to highest boundary, then you can use the bisect module for this; just store either the lower or upper boundaries of each range in one list, the corresponding values in another, and use bisection to map a position in the first list to a result in the second.

If your ranges have gaps, then you either need to keep a third list with the other boundary and first validate that the point falls in the range, or use an interval tree. For non-overlapping ranges a simple binary tree would do, but there are more specialised implementations that support overlapping ranges too. There is a intervaltree project on PyPI that supports full interval tree operations.

A bisect-based mapping that matches behaviour to the fixed-table implementation would look like:

from bisect import bisect_left
from collections.abc import MutableMappingclass RangeBisection(MutableMapping):"""Map ranges to valuesLookups are done in O(logN) time. There are no limits set on the upper orlower bounds of the ranges, but ranges must not overlap."""def __init__(self, map=None):self._upper = []self._lower = []self._values = []if map is not None:self.update(map)def __len__(self):return len(self._values)def __getitem__(self, point_or_range):if isinstance(point_or_range, tuple):low, high = point_or_rangei = bisect_left(self._upper, high)point = lowelse:point = point_or_rangei = bisect_left(self._upper, point)if i >= len(self._values) or self._lower[i] > point:raise IndexError(point_or_range)return self._values[i]def __setitem__(self, r, value):lower, upper = ri = bisect_left(self._upper, upper)if i < len(self._values) and self._lower[i] < upper:raise IndexError('No overlaps permitted')self._upper.insert(i, upper)self._lower.insert(i, lower)self._values.insert(i, value)def __delitem__(self, r):lower, upper = ri = bisect_left(self._upper, upper)if self._upper[i] != upper or self._lower[i] != lower:raise IndexError('Range not in map')del self._upper[i]del self._lower[i]del self._values[i]def __iter__(self):yield from zip(self._lower, self._upper)

Use range as a key value in a dictionary, most efficient way?

Related Q&A

How to replace all instances of a sub-sequence in a list in Python?

How to initialise a 2D array in Python?

numpy: broadcast multiplication over one common axis of two 2d arrays

Convert integer to binary in python and compare the bits

python, confused in decorate and closure

Python Regex - checking for a capital letter with a lowercase after

Read hierarchical (tree-like) XML into a pandas dataframe, preserving hierarchy

Reference counting using PyDict_SetItemString

How to use statsmodels.tsa.seasonal.seasonal_decompose with a pandas dataframe

UnidentifiedImageError: cannot identify image file