celery .delay hangs (recent, not an auth problem)

2024/9/20 9:41:40

I am running Celery 2.2.4/djCelery 2.2.4, using RabbitMQ 2.1.1 as a backend. I recently brought online two new celery servers -- I had been running 2 workers across two machines with a total of ~18 threads and on my new souped up boxes (36g RAM + dual hyper-threaded quad-core), I am running 10 workers with 8 threads each, for a total of 180 threads -- my tasks are all pretty small so this should be fine.

The nodes have been running fine for the last few days, but today I noticed that .delaay() is hanging. When I interrupt it, I see a traceback that points here:

File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 324, in delayreturn self.apply_async(args, kwargs)
File "/home/django/deployed/releases/20110608183345/virtual-env/lib/python2.5/site-packages/celery/task/base.py", line 449, in apply_asyncpublish.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/kombu/compat.py", line 108, in closeself.backend.close()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/channel.py", line 194, in close(20, 41),    # Channel.close_ok
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in waitself.channel_id, allowed_methods)
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_methodself.method_reader.read_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 212, in read_methodself._next_method()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/method_framing.py", line 127, in _next_methodframe_type, channel, payload = self.source.read_frame()
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 109, in read_frameframe_type, channel, size = unpack('>BHI', self._read(7))
File "/home/django/deployed/virtual-env/lib/python2.5/site-packages/amqplib/client_0_8/transport.py", line 200, in _reads = self.sock.recv(65536)

I've checked the Rabbit logs, and I see it the process trying to connect as:

=INFO REPORT==== 12-Jun-2011::22:58:12 ===
accepted TCP connection on 0.0.0.0:5672 from x.x.x.x:48569

I have my Celery log level set to INFO, but I don't see anything particularly interesting in the Celery logs EXCEPT that 2 of the workers can't connect to the broker:

[2011-06-12 22:41:08,033: ERROR/MainProcess] Consumer: Connection to broker lost. Trying to re-establish connection...

All of the other nodes are able to connect without issue.

I know that there was a posting ( RabbitMQ / Celery with Django hangs on delay/ready/etc - No useful log info ) last year of a similar nature, but I'm pretty certain that this is different. Could it be that the sheer number of workers is creating some sort of a race condition in amqplib -- I found this thread which seems to indicate that amqplib is not thread-safe, not sure if this matters for Celery.

EDIT: I've tried celeryctl purge on both nodes -- on one it succeeds, but on the other it fails with the following AMQP error:

AMQPConnectionException(reply_code, reply_text, (class_id, method_id))amqplib.client_0_8.exceptions.AMQPConnectionException: (530, u"NOT_ALLOWED - cannot redeclare exchange 'XXXXX' in vhost 'XXXXX' with different type, durable or autodelete   value", (40, 10), 'Channel.exchange_declare')

On both nodes, inspect stats hangs with the "can't close connection" traceback above. I'm at a loss here.

EDIT2: I was able to delete the offending exchange using exchange.delete from camqadm and now the second node hangs too :(.

EDIT3: One thing that also recently changed is that I added an additional vhost to rabbitmq, which my staging node connects to.

Answer

Hopefully this will save somebody a lot of time...though it certainly does not save me any embarrassment:

/var was full on the server that was running rabbit. With all of the nodes that I added, rabbit was doing a lot more logging and it filled up /var -- I couldn't write to /var/lib/rabbitmq, and so no messages were going through.

https://en.xdnf.cn/q/72515.html

Related Q&A

Python ttk.combobox force post/open

I am trying to extend the ttk combobox class to allow autosuggestion. the code I have far works well, but I would like to get it to show the dropdown once some text has been entered without removing fo…

How to pull from the remote using dulwich?

How to do something like git pull in python dulwich library.

convert python programme to windows executable

i m trying to create windows executable from python program which has GUI . i m using following scriptfrom distutils.core import setup import py2exesetup(console=[gui.py]) it gives following errorWarni…

Must explicitly set engine if not passing in buffer or path for io in Panda

When running the following Python Panda code:xl = pd.ExcelFile(dataFileUrl)sheets = xl.sheet_namesdata = xl.parse(sheets[0])colheaders = list(data)I receive the ValueError:Must ex…

How to access the keys or values of Python GDB Value

I have a struct in GDB and want to run a script which examines this struct. In Python GDB you can easily access the struct via(gdb) python mystruct = gdb.parse_and_eval("mystruct")Now I got t…

How can I fire a Traits static event notification on a List?

I am working through the traits presentation from PyCon 2010. At about 2:30:45 the presenter starts covering trait event notifications, which allow (among other things) the ability to automatically ca…

Calculate moving average in numpy array with NaNs

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:import numpy as npdef moving_average(a,n=5):ret = np.cumsum(a,dtype=float)ret[n:] = ret[n:]-r…

Python: numpy.insert NaN value

Im trying to insert NaN values to specific indices of a numpy array. I keep getting this error:TypeError: Cannot cast array data from dtype(float64) to dtype(int64) according to the rule safeWhen tryin…

Identify external workbook links using openpyxl

I am trying to identify all cells that contain external workbook references, using openpyxl in Python 3.4. But I am failing. My first try consisted of:def find_external_value(cell): # identifies an e…

3D-Stacked 2D histograms

I have a bunch of 2D histograms (square 2D numpy arrays) that I want to stack in 3D like so:(Image from: Cardenas, Alfredo E., et al. "Unassisted transport of N-acetyl-L-tryptophanamide through me…