Python Socket Receive/Send Multi-threading

2024/10/2 14:31:17

I am writing a Python program where in the main thread I am continuously (in a loop) receiving data through a TCP socket, using the recv function. In a callback function, I am sending data through the same socket, using the sendall function. What triggers the callback is irrelevant. I've set my socket to blocking.

My question is, is this safe to do? My understanding is that a callback function is called on a separate thread (not the main thread). Is the Python socket object thread-safe? From my research, I've been getting conflicting answers.

Answer

Sockets in Python are not thread safe.

You're trying to solve a few problems at once:

  1. Sockets are not thread-safe.
  2. recv is blocking and blocks the main thread.
  3. sendall is being used from a different thread.

You can solve these either by using asyncio or by doing what asyncio does internally: use select.select together with a socketpair, and use a queue for the data to be sent.

import select
import socket
import queue

# Any data put into this queue will be sent over main_socket
send_queue = queue.Queue()

# Any data sent to ssock shows up on rsock
rsock, ssock = socket.socketpair()

main_socket = socket.socket()
# Create the connection with main_socket, fill this up with your code

# Your callback, running on a different thread
def different_thread(data):
    # Put the data to send inside the queue
    send_queue.put(data)
    # Wake up the main thread by sending a byte to ssock, which shows up on rsock
    ssock.send(b"\x00")

# Start the callback thread, fill this up with your code

running = True
while running:
    # select.select returns when either main_socket or rsock has data to read
    rlist, _, _ = select.select([main_socket, rsock], [], [])
    for ready_socket in rlist:
        if ready_socket is main_socket:
            data = main_socket.recv(1024)
            if not data:
                # An empty result means the peer closed the connection
                running = False
                break
            # Do stuff with data, fill this up with your code
        else:
            # ready_socket is rsock
            rsock.recv(1)  # Dump the ready mark
            # Send one queued item; each wake-up byte matches one put()
            main_socket.sendall(send_queue.get())

We use several constructs here, and you will have to fill in the placeholders with your own code. As for the explanation:

We first create send_queue, a queue of data to send. Then we create a pair of connected sockets with socketpair(). We need this later to wake up the main thread: we don't want a blocking recv() to prevent us from writing to the socket.

Then, we connect the main_socket and start the callback thread. Now here's the magic:

In the main thread, we use select.select to find out whether rsock or main_socket has data to read. As soon as one of them does, select.select returns and the main thread wakes up.

After adding data to the queue, the callback wakes up the main thread by sending a byte to ssock. That byte arrives on rsock and makes it readable, so select.select returns.
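A tiny standalone check of this wake-up mechanism, with no TCP connection involved. It is only an illustration; is_readable() is a hypothetical helper, not part of the answer's code:

```python
import select
import socket

# Any data sent to ssock shows up on rsock
rsock, ssock = socket.socketpair()


def is_readable(sock: socket.socket, timeout: float) -> bool:
    """Hypothetical helper: True if select() reports sock as readable."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable


before = is_readable(rsock, 0)    # zero timeout: poll without blocking
ssock.send(b"\x00")               # the wake-up byte the callback thread sends
after = is_readable(rsock, 1.0)   # select() now reports rsock as readable
rsock.recv(1)                     # dump the ready mark
drained = is_readable(rsock, 0)   # nothing left, so select() would block again

rsock.close()
ssock.close()
print(before, after, drained)     # prints: False True False
```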

To fully understand this, read the documentation for select.select(), socket.socketpair() and queue.Queue().
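Below is a self-contained version of the pattern described above that can be run as-is. It is a sketch under some assumptions: a second socketpair stands in for the real TCP connection, its other end (peer) simply echoes everything back, and the names run_demo, callback and echo_peer are hypothetical:

```python
import queue
import select
import socket
import threading


def run_demo() -> bytes:
    # Assumption: a socketpair stands in for the real TCP connection;
    # `peer` plays the remote side and echoes everything back.
    main_socket, peer = socket.socketpair()
    send_queue: "queue.Queue[bytes]" = queue.Queue()
    rsock, ssock = socket.socketpair()

    def callback(data: bytes) -> None:
        # Runs on the callback thread; it never touches main_socket.
        send_queue.put(data)
        ssock.send(b"\x00")  # wake up select() in the main thread

    def callback_thread() -> None:
        for chunk in (b"ping-1", b"ping-2", b"stop"):
            callback(chunk)

    def echo_peer() -> None:
        # Echo until main_socket is closed (recv() then returns b"").
        while data := peer.recv(1024):
            peer.sendall(data)

    threads = [threading.Thread(target=callback_thread),
               threading.Thread(target=echo_peer)]
    for t in threads:
        t.start()

    # Main thread: the only place that reads from or writes to main_socket.
    received = b""
    while not received.endswith(b"stop"):
        rlist, _, _ = select.select([main_socket, rsock], [], [])
        for ready_socket in rlist:
            if ready_socket is main_socket:
                data = main_socket.recv(1024)
                if not data:  # the peer closed the connection
                    raise ConnectionError("peer closed the connection")
                received += data
            else:
                rsock.recv(1)  # dump the ready mark
                main_socket.sendall(send_queue.get())

    main_socket.close()  # lets echo_peer's recv() return b""
    for t in threads:
        t.join()
    for s in (peer, rsock, ssock):
        s.close()
    return received


print(run_demo())  # b'ping-1ping-2stop'
```

Since every put() is followed by exactly one wake-up byte, send_queue.get() never blocks after rsock.recv(1).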


@tobias.mcnulty asked a good question in the comments: why should we use a Queue instead of sending all the data through the socketpair?

You can send the data through the socketpair as well, which has its benefits, but using a queue might be preferable for several reasons:

  1. Sending data over a socket is an expensive operation. It requires a syscall, copies the data into and out of kernel buffers, and goes through the kernel's socket machinery (on Windows, socketpair() is even emulated over a loopback TCP connection). Using a Queue guarantees a single syscall, for the one-byte signal, and no more (apart from the queue's internal lock, which is pretty cheap). Sending large data through the socketpair results in multiple syscalls. As a tip, you could also use a collections.deque, whose append() and popleft() are thread-safe in CPython. That way, the only syscall left is the socketpair signal.
  2. Architecture-wise, using a queue gives you finer-grained control later on. For example, you can put any Python object into the queue and convert it to bytes in the main loop. This lets the main loop be a little smarter and can help you create a simpler interface.
  3. You don't have size limits. This can be a bug or a feature. A socket's buffer is limited by the OS, and I believe enlarging it is not exactly encouraged, so the socketpair acts as a natural throttle on how much data can be pending. That might be a benefit, but the application may wish to control it on its own. Relying on the "natural" throttle will cause the calling thread to hang once the buffer is full.
  4. Just as large data needs multiple recv() syscalls on the socketpair, it also needs multiple passes through select(). Like TCP, a stream socketpair has no message boundaries. You'll either have to create artificial ones, set the socket to non-blocking and deal with asynchronous sockets, or treat it as a stream and keep passing through select() calls, which might be expensive depending on your OS.
  5. Support for multiple threads on the same socketpair. Sending a single signalling byte over the socket from multiple threads is fine, and is exactly how asyncio works. Sending more than that may cause data from different threads to be interleaved or to arrive in the wrong order.
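The collections.deque tip from the first point could look roughly like this. It is a minimal sketch, not the answer's own code: the main loop only watches rsock and records what it would pass to main_socket.sendall(), and the names callback, produce and main_loop are hypothetical:

```python
import collections
import select
import socket
import threading

# deque.append() and deque.popleft() are thread-safe, so the deque replaces
# queue.Queue here without an extra lock.
send_deque: "collections.deque[bytes]" = collections.deque()
rsock, ssock = socket.socketpair()


def callback(data: bytes) -> None:
    # Runs on the callback thread.
    send_deque.append(data)
    ssock.send(b"\x00")  # one wake-up byte per queued item


def produce() -> None:
    for message in (b"a", b"b", b"c"):
        callback(message)


def main_loop(expected: int) -> list:
    sent = []
    while len(sent) < expected:
        rlist, _, _ = select.select([rsock], [], [])
        if rsock in rlist:
            rsock.recv(1)  # dump the ready mark
            # Stand-in for main_socket.sendall(send_deque.popleft())
            sent.append(send_deque.popleft())
    return sent


producer = threading.Thread(target=produce)
producer.start()
result = main_loop(3)
producer.join()
rsock.close()
ssock.close()
print(result)  # [b'a', b'b', b'c']
```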

All in all, transferring the data back and forth between the kernel and userspace is possible and will work, but I personally do not recommend it.
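For comparison, here is a minimal sketch of the asyncio route mentioned at the start of the answer, assuming Python 3.8 or later. The local echo server only stands in for the real peer. The key call is loop.call_soon_threadsafe(), which lets another thread schedule writer.write() on the event loop thread instead of touching the socket itself:

```python
import asyncio
import threading


async def main() -> bytes:
    # Assumption: a throwaway local echo server stands in for the real peer.
    async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        while data := await reader.read(1024):
            writer.write(data)
            await writer.drain()
        writer.close()

    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    loop = asyncio.get_running_loop()

    def callback_thread() -> None:
        # This thread never touches the socket: it asks the event loop
        # thread to perform each write via call_soon_threadsafe().
        for message in (b"hello", b"world"):
            loop.call_soon_threadsafe(writer.write, message)

    thread = threading.Thread(target=callback_thread)
    thread.start()

    # Meanwhile the event loop thread keeps receiving.
    received = b""
    while len(received) < len(b"helloworld"):
        chunk = await reader.read(1024)
        if not chunk:
            break
        received += chunk

    thread.join()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return received


print(asyncio.run(main()))  # b'helloworld'
```

Because every socket operation happens on the event loop thread, the thread-safety question goes away; the callback thread only schedules work.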

https://en.xdnf.cn/q/70847.html
