Python WeakKeyDictionary for unhashable types

2024/9/29 15:18:34

As raised in CPython issue 88306, Python's weakref.WeakKeyDictionary fails for unhashable types. According to the discussion in that issue, this is an unnecessary restriction: using the ids of the keys instead of their hashes would work just fine. In this special case, ids are unique identifiers for the keys in the WeakKeyDictionary, because an entry is removed automatically as soon as the original object is deleted, so a stale id can never be matched by a new object that happens to reuse it. It is important to be aware that using ids instead of hashes is only feasible in this very special case.
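To make the failure concrete, here is a minimal sketch (Node is a hypothetical example class). A dataclass with the default eq=True and frozen=False has __hash__ set to None, so its instances can be weakly referenced but cannot be used as WeakKeyDictionary keys, because the dictionary hashes each key through its weak reference. Built-in containers such as list are a separate case: they cannot be weakly referenced at all.

```python
import weakref
from dataclasses import dataclass
from weakref import WeakKeyDictionary


@dataclass
class Node:
    # Hypothetical example class. Because the dataclass defines __eq__
    # (eq=True is the default) and is not frozen, __hash__ is set to None.
    label: str


n = Node("root")

# Weak references to the instance work fine ...
assert weakref.ref(n)() is n

# ... but WeakKeyDictionary hashes its keys (via the weakref), so this fails.
registry = WeakKeyDictionary()
hash_error = None
try:
    registry[n] = "metadata"
except TypeError as exc:
    hash_error = exc
print("WeakKeyDictionary:", hash_error)

# Built-in containers cannot be weakly referenced in the first place.
ref_error = None
try:
    weakref.ref([])
except TypeError as exc:
    ref_error = exc
print("list:", ref_error)
```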

We can tweak weakref.WeakKeyDictionary (see gist) to achieve the desired behaviour. In summary, this implementation wraps the weakref keys as follows:

import typing
import weakref

import typing_extensions


class _IdKey:
    def __init__(self, key):
        self._id = id(key)

    def __hash__(self):
        return self._id

    def __eq__(self, other: typing_extensions.Self):
        return self._id == other._id

    def __repr__(self):
        return f"<_IdKey(_id={self._id})>"


class _IdWeakRef(_IdKey):
    def __init__(self, key, remove: typing.Callable[[typing.Any], None]):
        super().__init__(key)
        # hold weak ref to avoid garbage collection of the remove callback
        self._ref = weakref.ref(key, lambda _: remove(self))

    def __call__(self):
        # used in weakref.WeakKeyDictionary.__copy__
        return self._ref()

    def __repr__(self):
        return f"<_IdWeakRef(_id={self._id},{self._ref})>"


class WeakKeyIdDictionary(weakref.WeakKeyDictionary):
    """Overrides all methods that access the dictionary by key."""

    ...

See the gist for the full class: https://gist.github.com/barmettl/b198f0cf6c22047df77483e8aa28f408
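Keying by id() changes the lookup semantics from equality to identity: two equal but distinct objects get separate entries. The sketch below uses IdKey, a hypothetical stand-in that mirrors _IdKey without the typing_extensions annotation, inside a plain dict. It is only safe here because both objects stay alive for the whole example; once an object dies its id can be reused, which is exactly why the weak-key setting is required.

```python
from dataclasses import dataclass


class IdKey:
    """Hypothetical stand-in for _IdKey: hashes and compares by id() of the key."""

    def __init__(self, key):
        self._id = id(key)

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        if not isinstance(other, IdKey):
            return NotImplemented
        return self._id == other._id


@dataclass
class Node:  # hypothetical example class (unhashable: eq=True, not frozen)
    label: str


a = Node("x")
b = Node("x")  # equal to a, but a distinct object with a different id()

# Both objects stay alive throughout, so their ids cannot be reused.
table = {IdKey(a): "value for a"}
print(a == b)             # True: value equality
print(IdKey(a) in table)  # True: same object, same id
print(IdKey(b) in table)  # False: equal object, different id
```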

However, this depends on implementation details of weakref.WeakKeyDictionary (Python 3.10 here) and is likely to break in future (or even past) versions of Python. Alternatively, one could write an entirely new class from scratch.

It is also possible to implement a custom __hash__ method for every class involved, but this won't work for classes from external code and will give unreliable hashes for use cases beyond weakref.WeakKeyDictionary. We could also monkey-patch __hash__, but this is not possible for built-in classes in particular, and it will have unintended effects in other parts of the code.
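A short sketch of both problems, using a hypothetical Point class. An identity-based __hash__ bolted onto a class with value-based __eq__ violates the rule that equal objects must have equal hashes, so ordinary sets and dicts stop finding equal keys. And CPython refuses to set attributes on built-in types such as list, so __hash__ cannot be patched there.

```python
class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        # Value-based equality, as a typical external class might define it.
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        # Hypothetical workaround: identity hash added only so instances can be weak keys.
        return id(self)


p, q = Point(1), Point(1)
print(p == q)    # True
print(q in {p})  # False: equal objects with different hashes break set/dict lookups

# Monkey-patching a built-in type is rejected outright.
patch_error = None
try:
    list.__hash__ = lambda self: id(self)
except TypeError as exc:
    patch_error = exc
print(type(patch_error).__name__)  # TypeError
```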

Hence the question: how should one store unhashable items as keys in a WeakKeyDictionary?

Answer

There is a way which does not rely on knowing the internals of WeakKeyDictionary:

from weakref import WeakKeyDictionary, WeakValueDictionary


class Id:
    def __init__(self, key):
        self._id = id(key)

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        return self._id == other._id


class WeakUnhashableKeyDictionary:
    def __init__(self, *args, **kwargs):
        # TODO Do something to initialize given args and kwargs.
        # Underscored names avoid shadowing keys()/values() methods added later.
        self._keys = WeakValueDictionary()
        self._values = WeakKeyDictionary()

    def __getitem__(self, key):
        return self._values[Id(key)]

    def __setitem__(self, key, value):
        _id = Id(key)
        # NOTE This works because self._keys keeps _id alive iff key is alive,
        # and self._values keeps value alive iff _id is alive. Transitivity. QED.
        # Because key is only stored as a value, it does not need to be hashable.
        self._keys[_id] = key
        self._values[_id] = value

    def __delitem__(self, key):
        _id = Id(key)
        # Delete from self._values first: dropping the self._keys entry releases the
        # last strong reference to the stored Id, whose weakref callback then removes
        # the self._values entry, so deleting it afterwards would raise KeyError.
        del self._values[_id]
        del self._keys[_id]

    # etc. other methods should be relatively simple to implement.
    # TODO Might require some locks or care in the ordering of operations to work threaded.
    # TODO Add clean error handling.

This is just a generalization of my answer to a method caching problem.
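The TODO in the answer leaves the remaining methods open. One way to fill them in (a sketch, not part of the original answer) is to derive from collections.abc.MutableMapping, which provides get, pop, setdefault, keys, items and the rest once __getitem__, __setitem__, __delitem__, __iter__ and __len__ exist. The internal dictionaries are named _keys and _values so they do not shadow the inherited keys() and values() methods, and __delitem__ removes the _values entry first, because deleting the _keys entry already evicts it through the weakref callback. Config is a hypothetical example class; locking for threaded use is still not handled.

```python
import gc
from collections.abc import MutableMapping
from dataclasses import dataclass
from weakref import WeakKeyDictionary, WeakValueDictionary


class Id:
    """Identity wrapper: hashes and compares by id() of the wrapped key."""

    __slots__ = ("_id", "__weakref__")  # __weakref__ so WeakKeyDictionary can reference it

    def __init__(self, key):
        self._id = id(key)

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        if not isinstance(other, Id):
            return NotImplemented
        return self._id == other._id


class WeakUnhashableKeyDictionary(MutableMapping):
    def __init__(self):
        self._keys = WeakValueDictionary()  # Id -> key, key held weakly
        self._values = WeakKeyDictionary()  # Id -> value, Id held weakly

    def __setitem__(self, key, value):
        _id = Id(key)
        self._keys[_id] = key      # entry (and the stored Id) dies with key
        self._values[_id] = value  # entry (and value) dies with the stored Id

    def __getitem__(self, key):
        return self._values[Id(key)]

    def __delitem__(self, key):
        _id = Id(key)
        del self._values[_id]  # first: dropping the _keys entry would evict this one
        del self._keys[_id]

    def __iter__(self):
        # WeakValueDictionary.values() yields only keys that are still alive.
        yield from self._keys.values()

    def __len__(self):
        return len(self._keys)


@dataclass
class Config:  # hypothetical example class; unhashable because eq=True and not frozen
    name: str


cache = WeakUnhashableKeyDictionary()
a, b = Config("x"), Config("x")  # equal but distinct objects
cache[a] = "result for a"
print(a in cache, b in cache)    # True False (lookup is by identity)
print(cache.get(b, "missing"))   # missing

del a
gc.collect()  # CPython already freed `a` by refcounting; this is a safeguard
print(len(cache))                # 0
```

One caveat of this design: the inherited Mapping.__eq__ builds plain dicts from items(), so comparing two of these mappings with == raises TypeError for unhashable keys and would need its own override if comparisons matter.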

https://en.xdnf.cn/q/71195.html
