asyncio - how many coroutines?

2024/9/8 9:04:13

I have been struggling for a few days with a Python application that looks for one or more files in a folder, iterates through each file and each record in it, and creates objects to be persisted to a JanusGraph database. The particular OGM that I am using requires that transactions with the database be done asynchronously using asyncio. I have read a lot of blogs and posts about asyncio, and I think I understand the concepts of async, await, tasks, etc. In my application I have defined several functions that handle different parts of the processing:

  • Retrieves the list of all available files
  • Selects one file for processing
  • Iterates through the selected file and reads a line/record for processing
  • Receives the record, parses it, and calls several other functions that are responsible for creating the Model objects before they are persisted to the database. For instance, I have different functions that create User, Session, Browser, DeviceUsed, Server, etc.

I understand (and I may be wrong) that the big advantage of using asyncio is in situations where a function call will block, usually on I/O, a database transaction, network latency, etc.

So my question is whether I need to convert all my functions into coroutines and schedule them to run through the event loop, or just the ones that would block, like committing a transaction to the database. I tried the latter approach to begin with and had all sorts of problems.


"So my question is if I need to convert all my functions into coroutines and schedule to run through the event loop, or just the ones that would block..."

You might need to convert most of them, but the conversion should be largely mechanical, boiling down to changing def to async def, and adding await when calling other coroutines.
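As a rough illustration of how mechanical that conversion is, here is a minimal sketch. The names save_record and process_line are hypothetical, and asyncio.sleep() only stands in for whatever awaitable call the OGM actually provides:

```python
import asyncio


async def save_record(record):
    # Hypothetical stand-in for an asynchronous database commit;
    # asyncio.sleep() simulates the I/O wait.
    await asyncio.sleep(0.01)
    return f"saved {record['user']}"


# Before the conversion this was a plain function:
#     def process_line(line): ... return save_record(record)
# After: "def" becomes "async def" and the coroutine call gains "await".
async def process_line(line):
    user, browser = line.strip().split(",")
    record = {"user": user, "browser": browser}
    return await save_record(record)


result = asyncio.run(process_line("alice,firefox\n"))
print(result)
```

The body of process_line is otherwise unchanged; only the definition and the call into the other coroutine differ.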

Obviously, you cannot avoid converting the ones that actually block, either by switching to the appropriate asyncio API or by using loop.run_in_executor() for those that don't have one. (DNS resolution used to be an outstanding example of the latter.)
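For a blocking call that has no asyncio equivalent, wrapping it with loop.run_in_executor() might look like the sketch below. blocking_read is a made-up placeholder, with time.sleep() simulating the blocking work, and passing None as the executor selects the loop's default thread pool:

```python
import asyncio
import time


def blocking_read(path):
    # Hypothetical blocking function with no async API;
    # time.sleep() simulates slow disk or network access.
    time.sleep(0.05)
    return f"contents of {path}"


async def read_file(path):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool so the
    # event loop stays free to run other coroutines meanwhile.
    return await loop.run_in_executor(None, blocking_read, path)


async def main():
    # Both reads proceed in worker threads at the same time.
    return await asyncio.gather(read_file("a.log"), read_file("b.log"))


contents = asyncio.run(main())
print(contents)
```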

But then you also need to convert their callers, because calling a coroutine from a blocking function is not useful unless the function implements event-loop-like functionality. On the other hand, when a coroutine is called from another coroutine, everything works because suspends are automatically propagated to the top of the chain. Once the whole call chain consists of coroutines, the top-level ones are fed to the event loop using loop.create_task() or loop.run_until_complete().
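A sketch of such a chain, loosely modeled on the file/record pipeline from the question, could look like this. All function names and the sample data are invented for illustration. asyncio.run() is the convenience wrapper available since Python 3.7 that creates a loop and runs the given coroutine to completion, and asyncio.create_task() schedules a coroutine on the running loop:

```python
import asyncio


async def persist(obj):
    # Hypothetical database write; the only place that really waits.
    await asyncio.sleep(0.01)
    return obj


async def handle_record(record):
    # Caller of a coroutine, so it is a coroutine too.
    return await persist(record.upper())


async def process_file(name, records):
    # Suspends inside persist() propagate up through every await here.
    return name, [await handle_record(r) for r in records]


async def main():
    files = {"a.csv": ["u1", "u2"], "b.csv": ["u3"]}  # made-up sample input
    # Top-level coroutines are handed to the event loop as tasks.
    tasks = [asyncio.create_task(process_file(n, r)) for n, r in files.items()]
    return dict(await asyncio.gather(*tasks))


summary = asyncio.run(main())
print(summary)
```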

Of course, convenience functions that neither block nor call blocking functions can safely remain non-async, and are invoked by either sync or async code without any difference.
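For example, a pure parsing helper can stay a plain function and be called the same way from either side. In this sketch parse_line and handle_line are hypothetical names, and asyncio.sleep(0) stands in for an awaited database write:

```python
import asyncio


def parse_line(line):
    # Pure computation: never blocks, so it stays a regular function.
    user, browser = line.strip().split(",")
    return {"user": user, "browser": browser}


async def handle_line(line):
    record = parse_line(line)  # ordinary call, no await needed
    await asyncio.sleep(0)     # stand-in for an awaited database write
    return record


sync_result = parse_line("bob,chrome")                  # from sync code
async_result = asyncio.run(handle_line("bob,chrome"))   # from async code
print(sync_result == async_result)
```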

The above applies to asyncio, which implements stackless coroutines. A different approach is used by greenlet, whose tasks encapsulate the call stack, which allows them to be switched at arbitrary places in code that uses normal function calls. Greenlets are a bit more heavyweight and less portable than coroutines, though, so I'd first try converting to asyncio.
