Better solution for Python threading.Event semi-busy waiting

2024/10/15 5:17:18

I'm using a pretty standard threading.Event: the main thread gets to a point where it's in a loop that runs:

event.wait(60)

The other thread blocks on a request until a reply is available and then calls:

event.set()

I would expect the main thread to sit in a single select for the full 60 seconds, but this is not the case. From the Python 2.7 source, Lib/threading.py:

# Balancing act:  We can't afford a pure busy loop, so we
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive.  The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 20 times per second (or the timeout time remaining).
endtime = _time() + timeout
delay = 0.0005 # 500 us -> initial delay of 1 ms
while True:
    gotit = waiter.acquire(0)
    if gotit:
        break
    remaining = endtime - _time()
    if remaining <= 0:
        break
    delay = min(delay * 2, remaining, .05)
    _sleep(delay)

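What we get is a polling loop: each pass tries a non-blocking acquire and then sleeps via a select syscall. The sleep starts at 1 ms and doubles up to a cap of 50 ms, so for most of the timeout the thread wakes about 20 times per second. This causes noticeable load on the machine.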
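To put rough numbers on that loop, here is a small sketch that replays its delay arithmetic without actually sleeping. The function name `polling_schedule` and its parameters are made up for illustration, and the model assumes the event is never set and that `acquire()` and the sleep itself cost no time, so treat the count as an estimate only.

```python
def polling_schedule(timeout, initial_delay=0.0005, max_delay=0.05):
    """Return the sleep durations the 2.7 wait loop would use.

    Assumption: the event is never set, and each acquire()/sleep() call
    takes exactly the requested time and nothing more.
    """
    delays = []
    remaining = timeout
    delay = initial_delay
    while remaining > 0:
        # Same arithmetic as threading.py: double the delay, but never
        # exceed the time remaining or the 50 ms cap.
        delay = min(delay * 2, remaining, max_delay)
        delays.append(delay)
        remaining -= delay
    return delays


delays = polling_schedule(60)
print("wakeups over a 60 s timeout:", len(delays))
print("first sleeps:", delays[:7])
```

Under these assumptions a 60 second wait costs on the order of 1,200 wakeups, nearly all of them at the 50 ms cap, rather than one blocking call.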

Can someone please explain why a balancing act is involved, and why this is different from a thread waiting on a file descriptor?

Second, is there a better way to implement a mostly sleeping main thread without such a tight loop?

Answer

I recently got hit by the same problem, and I also tracked it down to this exact block of code in the threading module.

It sucks.

The solution is either to override the relevant parts of the threading module, or to migrate to Python 3, where this part of the implementation has been fixed (since 3.2, Lock.acquire accepts a timeout).

In my case, migrating to Python 3 would have been a huge effort, so I chose the former. What I did was:

  1. I created a quick .so file (using Cython) with an interface to pthread. It exposes Python functions which invoke the corresponding pthread_mutex_* functions, and links against libpthread. The function most relevant to this task is pthread_mutex_timedlock.
  2. I created a new threading2 module (and replaced all import threading lines in my codebase with import threading2). In threading2, I redefined all the relevant classes from threading (Lock, Condition, Event), and also the ones from Queue which I use a lot (Queue and PriorityQueue). The Lock class was completely reimplemented using pthread_mutex_* functions, but the rest were much easier: I simply subclassed the original (e.g. threading.Event) and overrode __init__ to create my new Lock type. The rest just worked.

The implementation of the new Lock type was very similar to the original implementation in threading, but I based the new implementation of acquire on the code I found in Python 3's threading module (which, naturally, is much simpler than the above-mentioned "balancing act" block). This part was fairly easy.
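For reference, the Python 3 style of waiting looks roughly like the sketch below. It is not the threading2 code from this answer (which is not shown), and the helper name `wait_on_lock` is made up. It assumes Python 3.2 or later, where `Lock.acquire` takes a timeout and blocks in the lock primitive itself instead of polling.

```python
import threading
import time


def wait_on_lock(waiter, timeout=None):
    """Block until `waiter` is acquired or `timeout` seconds elapse.

    Hypothetical helper; needs Python 3.2+ for the timeout argument.
    Returns True if the lock was acquired, False on timeout.
    """
    if timeout is None:
        waiter.acquire()
        return True
    if timeout > 0:
        # A single blocking call: no sleep/retry loop in Python code.
        return waiter.acquire(True, timeout)
    return waiter.acquire(False)


waiter = threading.Lock()
waiter.acquire()  # hold the lock so the next acquire has to wait

# Another thread releases the lock after 0.1 s, standing in for event.set().
threading.Timer(0.1, waiter.release).start()

start = time.monotonic()
gotit = wait_on_lock(waiter, timeout=5)
elapsed = time.monotonic() - start
print("acquired:", gotit, "after", round(elapsed, 2), "s")
```

The waiting thread returns shortly after the release rather than at the end of the 5 second timeout, and it does not wake up in between.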

(Btw, the result in my case was a 30% speedup of my massively multithreaded process. Even more than I expected.)
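If a compiled extension is not an option, the question's own hint about waiting on a file descriptor suggests a pure-Python alternative. The sketch below is not what this answer did: `PipeEvent` is a hypothetical class that backs the event with a pipe so that `wait` is one blocking `select` call. It assumes a POSIX system (on Windows, `select` only accepts sockets), and on Python 2 a signal arriving during `select` can raise an error that the caller would need to handle.

```python
import os
import select
import threading


class PipeEvent(object):
    """Hypothetical event backed by a pipe (POSIX only, a sketch)."""

    def __init__(self):
        self._r, self._w = os.pipe()
        self._lock = threading.Lock()
        self._flag = False

    def set(self):
        with self._lock:
            if not self._flag:
                self._flag = True
                os.write(self._w, b"x")  # read end becomes readable

    def is_set(self):
        return self._flag

    def clear(self):
        with self._lock:
            if self._flag:
                os.read(self._r, 1)  # drain the wake-up byte
                self._flag = False

    def wait(self, timeout=None):
        # One blocking select() call; the thread sleeps in the kernel
        # until the pipe is readable or the timeout expires.
        readable, _, _ = select.select([self._r], [], [], timeout)
        return bool(readable)

    def close(self):
        os.close(self._r)
        os.close(self._w)


event = PipeEvent()
threading.Timer(0.1, event.set).start()  # stands in for the worker thread
result = event.wait(60)
print("event set:", result)
```

The byte stays in the pipe until `clear` is called, so every waiter sees the event as set, which mirrors how `threading.Event` behaves. Each `PipeEvent` uses two file descriptors, so `close` should be called when the event is no longer needed.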

