Use range as a key value in a dictionary, most efficient way?

2024/9/21 16:40:30

I have been wondering if there is some kind of data-structure or clever way to use a dictionary (O(1) lookup) to return a value if there are given values for defined ranges that do not overlap. So far I have been thinking this could be done if the ranges have some constant difference (0-2, 2-4, 4-6, etc.) or a binary-search could be done to do this in O(log(n)) time.

So, for example given a dictionary,

d = {[0.0 - 0.1): "a",[0.1 - 0.3): "b",[0.3 - 0.55): "c",[0.55 - 0.7): "d",[0.7 - 1.0): "e"}

it should return,

d[0.05] 
>>> "a"
d[0.8]
>>> "e"
d[0.9]
>>> "e"
d[random.random()] # this should also work

Is there anyway to achieve something like this? Thanks for any responses or answers on this.

Answer

You can have O(1) lookup time if you accept a low(er) resolution of range boundaries and sacrifice memory for lookup speed.

A dictionary can do a lookup in O(1) average time because there is a simple arithmetic relationship between key and location in a fixed-size data structure (hash(key) % tablesize, for the average case). Your ranges are effectively of a variable size with floating-point boundaries, so there is no fixed tablesize to map a search value to.

Unless, that is, you limit the absolute lower and upper boundaries of the ranges, and let range boundaries fall on a fixed step size. Your example uses values from 0.0 through to 1.0, and the ranges can be quantized to 0.05 steps. That can be turned into a fixed table:

import math
from collections import MutableMapping# empty slot marker
_EMPTY = object()class RangeMap(MutableMapping):"""Map points to values, and find values for points in O(1) constant timeThe map requires a fixed minimum lower and maximum upper bound forthe ranges. Range boundaries are quantized to a fixed step size. Gapsare permitted, when setting overlapping ranges last range set wins."""def __init__(self, map=None, lower=0.0, upper=1.0, step=0.05):self._mag = 10 ** -round(math.log10(step) - 1)  # shift to integersself._lower, self._upper = round(lower * self._mag), round(upper * self._mag)self._step = round(step * self._mag)self._steps = (self._upper - self._lower) // self._stepself._table = [_EMPTY] * self._stepsself._len = 0if map is not None:self.update(map)def __len__(self):return self._lendef _map_range(self, r):low, high = rstart = round(low * self._mag) // self._stepstop = round(high * self._mag) // self._stepif not self._lower <= start < stop <= self._upper:raise IndexError('Range outside of map boundaries')return range(start - self._lower, stop - self._lower)def __setitem__(self, r, value):for i in self._map_range(r):self._len += int(self._table[i] is _EMPTY)self._table[i] = valuedef __delitem__(self, r):for i in self._map_range(r):self._len -= int(self._table[i] is not _EMPTY)self._table[i] = _EMPTYdef _point_to_index(self, point):point = round(point * self._mag)if not self._lower <= point <= self._upper:raise IndexError('Point outside of map boundaries')return (point - self._lower) // self._stepdef __getitem__(self, point_or_range):if isinstance(point_or_range, tuple):low, high = point_or_ranger = self._map_range(point_or_range)# all points in the range must point to the same valuevalue = self._table[r[0]]if value is _EMPTY or any(self._table[i] != value for i in r):raise IndexError('Not a range for a single value')else:value = self._table[self._point_to_index(point_or_range)]if value is _EMPTY:raise IndexError('Point not in map')return valuedef __iter__(self):low = Nonevalue = _EMPTYfor i, v in enumerate(self._table):pos = (self._lower + (i * self._step)) / self._magif v is _EMPTY:if low is not None:yield (low, pos)low = Noneelif v != value:if low is not None:yield (low, pos)low = posvalue = vif low is not None:yield (low, self._upper / self._mag)

The above implements the full mapping interface, and accepts both points and ranges (as a tuple modelling a [start, stop) interval) when indexing or testing for containment (supporting ranges made it easier to reuse the default keys, values and items dictionary view implementations, which all work from the __iter__ implementation).

Demo:

>>> d = RangeMap({
...     (0.0, 0.1): "a",
...     (0.1, 0.3): "b",
...     (0.3, 0.55): "c",
...     (0.55, 0.7): "d",
...     (0.7, 1.0): "e",
... })
>>> print(*d.items(), sep='\n')
((0.0, 0.1), 'a')
((0.1, 0.3), 'b')
((0.3, 0.55), 'c')
((0.55, 0.7), 'd')
((0.7, 1.0), 'e')
>>> d[0.05]
'a'
>>> d[0.8]
'e'
>>> d[0.9]
'e'
>>> import random
>>> d[random.random()]
'c'
>>> d[random.random()]
'a'

If you can't limit the step size and boundaries so readily, then your next best option is to use some kind of binary search algorithm; you keep the ranges in sorted order and pick a point in the middle of the data structure; based on your search key being higher or lower than that mid point you continue the search in either half of the data structure until you find a match.

If your ranges cover the full interval from lowest to highest boundary, then you can use the bisect module for this; just store either the lower or upper boundaries of each range in one list, the corresponding values in another, and use bisection to map a position in the first list to a result in the second.

If your ranges have gaps, then you either need to keep a third list with the other boundary and first validate that the point falls in the range, or use an interval tree. For non-overlapping ranges a simple binary tree would do, but there are more specialised implementations that support overlapping ranges too. There is a intervaltree project on PyPI that supports full interval tree operations.

A bisect-based mapping that matches behaviour to the fixed-table implementation would look like:

from bisect import bisect_left
from collections.abc import MutableMappingclass RangeBisection(MutableMapping):"""Map ranges to valuesLookups are done in O(logN) time. There are no limits set on the upper orlower bounds of the ranges, but ranges must not overlap."""def __init__(self, map=None):self._upper = []self._lower = []self._values = []if map is not None:self.update(map)def __len__(self):return len(self._values)def __getitem__(self, point_or_range):if isinstance(point_or_range, tuple):low, high = point_or_rangei = bisect_left(self._upper, high)point = lowelse:point = point_or_rangei = bisect_left(self._upper, point)if i >= len(self._values) or self._lower[i] > point:raise IndexError(point_or_range)return self._values[i]def __setitem__(self, r, value):lower, upper = ri = bisect_left(self._upper, upper)if i < len(self._values) and self._lower[i] < upper:raise IndexError('No overlaps permitted')self._upper.insert(i, upper)self._lower.insert(i, lower)self._values.insert(i, value)def __delitem__(self, r):lower, upper = ri = bisect_left(self._upper, upper)if self._upper[i] != upper or self._lower[i] != lower:raise IndexError('Range not in map')del self._upper[i]del self._lower[i]del self._values[i]def __iter__(self):yield from zip(self._lower, self._upper)
https://en.xdnf.cn/q/72037.html

Related Q&A

How to replace all instances of a sub-sequence in a list in Python?

I currently use this code:""" Replace all occurrences of subsequence a with b in list l """ def replace_subsequence(l,a,b):for i in range(len(l)):if(l[i:i+len(a)] == a):l…

How to initialise a 2D array in Python?

Ive been given the pseudo-code:for i= 1 to 3for j = 1 to 3board [i] [j] = 0next jnext iHow would I create this in python?(The idea is to create a 3 by 3 array with all of the elements set to 0 using a…

numpy: broadcast multiplication over one common axis of two 2d arrays

Im looking for a way to element-wise multiply two 2d arrays of shape (a, b) and (b, c), respectively. Over the b axis, which the two arrays have in common.For instance, an example of what Id like to br…

Convert integer to binary in python and compare the bits

How to convert a int n into binary and test each bit of the resulting binary number?I have just got the following after a lot of googling:def check_bit_positions(n, p1, p2):print int(str(n),2)However …

python, confused in decorate and closure

I have some test code:def num(num):def deco(func):def wrap(*args, **kwargs):inputed_num = numreturn func(*args, **kwargs)return wrapreturn deco@num(5) def test(a):return a + inputed_numprint test(1)whe…

Python Regex - checking for a capital letter with a lowercase after

I am trying to check for a capital letter that has a lowercase letter coming directly after it. The trick is that there is going to be a bunch of garbage capital letters and number coming directly befo…

Read hierarchical (tree-like) XML into a pandas dataframe, preserving hierarchy

I have a XML document that contains a hierarchical, tree-like structure, see the example below.The document contains several <Message> tags (I only copied one of them for convenience).Each <Me…

Reference counting using PyDict_SetItemString

Im wondering how memory management/reference counting works when a new value is set into an existing field within a PyDict (within a C extension).For instance, assume a dictionary is created and popula…

How to use statsmodels.tsa.seasonal.seasonal_decompose with a pandas dataframe

from statsmodels.tsa.seasonal import seasonal_decomposedef seasonal_decomp(df, model="additive"):seasonal_df = Noneseasonal_df = seasonal_decompose(df, model=additive)return seasonal_dfseason…

UnidentifiedImageError: cannot identify image file

Hello I am training a model with TensorFlow and Keras, and the dataset was downloaded from https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765 This is a zip folder that I split in the …