Question 1

Is there any way to embed Python, allow callbacks from Python to C++, allow the Python code to spawn threads, and still avoid deadlocks?

The problem is this:

To call into Python, I need to hold the GIL. Typically, I do this by getting the main thread state when I first create the interpreter, and then using PyEval_RestoreThread() to take the GIL and swap in the thread state before I call into Python.
When called from Python, I may need to access resources that are protected by a separate critical section in my host. This means that Python will hold the GIL (potentially on a different thread than the one I initially called in on), and then attempt to acquire my host lock.
When calling into Python, I may need to hold the same locks, because I may be iterating over some collection of objects, for example.

The problem is that even if I hold the GIL when I call into Python, Python may give it up, give it to another thread, and then have that thread call into my host, expecting to take the host locks. Meanwhile, the host may take the host locks, then the GIL, and call into Python. Deadlock ensues.

The problem here is that Python relinquishes the GIL to another thread while I've called into it. That's what it's expected to do, but it makes it impossible to sequence locking -- even if I first take GIL, then take my own lock, then call Python, Python will call into my system from another thread, expecting to take my own lock (because it un-sequenced the GIL by releasing it).
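To make the inversion concrete, here is a minimal sketch in pure Python. It does not touch the real GIL: `gil` and `host_lock` are ordinary `threading.Lock` objects that I'm using as stand-ins, and the function and thread names are made up for illustration. Each side takes its first lock, then tries the other with a timeout so that the script reports the inversion instead of hanging.

```python
import threading


def demonstrate_inversion(timeout=0.2):
    """Show the lock-order inversion using two plain locks as stand-ins."""
    gil = threading.Lock()        # stand-in for the interpreter lock, NOT the real GIL
    host_lock = threading.Lock()  # stand-in for the host's critical section

    # Barriers make the interleaving deterministic: both threads hold their
    # first lock before either tries the second, and neither releases until
    # both attempts have finished.
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def python_side():
        # A Python thread that was handed the "GIL" and calls back into the host.
        with gil:
            both_hold_first.wait()
            got = host_lock.acquire(timeout=timeout)
            results["python_got_host_lock"] = got
            both_tried_second.wait()
            if got:
                host_lock.release()

    def host_side():
        # A host thread that holds its own lock and then wants to call Python.
        with host_lock:
            both_hold_first.wait()
            got = gil.acquire(timeout=timeout)
            results["host_got_gil"] = got
            both_tried_second.wait()
            if got:
                gil.release()

    threads = [threading.Thread(target=python_side),
               threading.Thread(target=host_side)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Both acquisitions time out; without the timeouts both threads would block forever.
    print(demonstrate_inversion())
```

With blocking acquires in place of the timeouts, neither thread could ever proceed, which is the situation described above.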

I can't really make the rest of my system use the GIL for all possible locks in the system -- and that wouldn't even work right, because Python may still release it to another thread.

I can't really guarantee that my host doesn't hold any locks when entering Python, either, because I'm not in control of all the code in the host.

So, is it just the case that this can't be done?

Answer

"When calling into Python, I may need to hold the same locks, because I may be iterating over some collection of objects, for example."

This often indicates that a single process with multiple threads isn't appropriate. Perhaps this is a situation where multiple processes -- each with a specific object from the collection -- make more sense.

Independent processes -- each with its own pool of threads -- may be easier to manage.
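As a rough sketch of what that could look like (not a drop-in design for an embedding host), the parent below starts one child interpreter per object, and each child runs its own small thread pool. Everything here is hypothetical: the `WORKER_SOURCE` script, the `handle` function, and the use of strings as the "objects" are only for illustration. Since each child is a separate process, it has its own GIL and shares no locks with the parent.

```python
import subprocess
import sys

# Hypothetical worker script. It runs in its own interpreter, so its GIL and
# its threads are completely separate from the parent's.
WORKER_SOURCE = r"""
import sys
from concurrent.futures import ThreadPoolExecutor

item = sys.argv[1]  # the one object this process owns

def handle(part):
    # Placeholder for real per-object work.
    return part.upper()

# Each process gets its own pool of threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    print("".join(pool.map(handle, item)))
"""


def process_collection(items):
    """Run one child process per item and collect each child's output."""
    # Start every child first so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, item],
            stdout=subprocess.PIPE,
            text=True,
        )
        for item in items
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker exited with status {proc.returncode}")
        results.append(out.strip())
    return results


if __name__ == "__main__":
    print(process_collection(["abc", "de"]))  # prints ['ABC', 'DE']
```

The trade-off is that objects have to be passed to the children explicitly (here via the command line) rather than shared in memory, so this fits best when each object can be handed off as a whole.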

Python embedding with threads -- avoiding deadlocks?
