Convert integer to a random but deterministically repeatable choice

2024/9/28 19:19:08

How do I convert an unsigned integer (representing a user ID) to a random looking but actually a deterministically repeatable choice? The choice must be selected with equal probability (irrespective of the distribution of the the input integers). For example, if I have 3 choices, i.e. [0, 1, 2], the user ID 123 may always be randomly assigned choice 2, whereas the user ID 234 may always be assigned choice 1.

Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function and modulo unless there is a better way. Here is what I have:

>>> num_choices = 3
>>> id_num = 123
>>> int(hashlib.sha256(str(id_num).encode()).hexdigest(), 16) % num_choices
2

I'm using the latest stable Python 3. Please note that this question is similar but not exactly identical to the related question to convert a string to random but deterministically repeatable uniform probability.

Answer

Using hash and modulo

import hashlibdef id_to_choice(id_num, num_choices):id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')id_hash = hashlib.sha512(id_bytes)id_hash_int = int.from_bytes(id_hash.digest(), 'big')  # Uses explicit byteorder for system-agnostic reproducibilitychoice = id_hash_int % num_choices  # Use with small num_choices onlyreturn choice>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
1

Notes:

  • The built-in hash method must not be used because it can preserve the input's distribution, e.g. with hash(123). Alternatively, it can return values that differ when Python is restarted, e.g. with hash('123').

  • For converting an int to bytes, bytes(id_num) works but is grossly inefficient as it returns an array of null bytes, and so it must not be used. Using int.to_bytes is better. Using str(id_num).encode() works but wastes a few bytes.

  • Admittedly, using modulo doesn't offer exactly uniform probability,[1][2] but this shouldn't bias much for this application because id_hash_int is expected to be very large and num_choices is assumed to be small.

Using random

The random module can be used with id_num as its seed, while addressing concerns surrounding both thread safety and continuity. Using randrange in this manner is comparable to and simpler than hashing the seed and taking modulo.

With this approach, not only is cross-language reproducibility a concern, but reproducibility across multiple future versions of Python could also be a concern. It is therefore not recommended.

import randomdef id_to_choice(id_num, num_choices):localrandom = random.Random(id_num)choice = localrandom.randrange(num_choices)return choice>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
2
https://en.xdnf.cn/q/71302.html

Related Q&A

Using python opencv to load image from zip

I am able to successfully load an image from a zip:with zipfile.ZipFile(test.zip, r) as zfile:data = zfile.read(test.jpg)# how to open this using imread or imdecode?The question is: how can I open thi…

DeprecationWarning: Function when moving app (removed titlebar) - PySide6

I get when I move the App this Warning: C:\Qt\Login_Test\main.py:48: DeprecationWarning: Function: globalPos() const is marked as deprecated, please check the documentation for more information.self.dr…

Architecture solution for Python Web application

Were setting up a Python REST web application. Right now, were using WSGI, but we might do some changes to that in the future (using Twisted, for example, to improve on scalability or some other featu…

Capture image for processing

Im using Python with PIL and SciPy. i want to capture an image from a webcam then process it further using numpy and Scipy. Can somebody please help me out with the code.Here is the code there is a pre…

Loading Magnet LINK using Rasterbar libtorrent in Python

How would one load a Magnet link via rasterbar libtorrent python binding?

Python currying with any number of variables

I am trying to use currying to make a simple functional add in Python. I found this curry decorator here.def curry(func): def curried(*args, **kwargs):if len(args) + len(kwargs) >= func.__code__…

Python - Display rows with repeated values in csv files

I have a .csv file with several columns, one of them filled with random numbers and I want to find duplicated values there. In case there are - strange case, but its what I want to check after all -, I…

Defining __getattr__ and __getitem__ on a function has no effect

Disclaimer This is just an exercise in meta-programming, it has no practical purpose.Ive assigned __getitem__ and __getattr__ methods on a function object, but there is no effect...def foo():print &quo…

thread._local object has no attribute

I was trying to change the logging format by adding a context filter. My Format is like thisFORMAT = "%(asctime)s %(VAL)s %(message)s"This is the class I use to set the VAL in the format. cla…

Pytorch batch matrix vector outer product

I am trying to generate a vector-matrix outer product (tensor) using PyTorch. Assuming the vector v has size p and the matrix M has size qXr, the result of the product should be pXqXr.Example:#size: 2 …