Question 1

We're setting up a Python REST web application. Right now, we're using WSGI, but we might do some changes to that in the future (using Twisted, for example, to improve on scalability or some other feature). I would really like some help regarding what is considered a good architecture for a Web application in Python.

In general our app serves dynamic content, processes a moderate to high level of data from clients, performs pretty high-demand database, network and filesystem calls and should be "easily" scalable (quotes here because if a solution is great but somewhat tough to configure for scalability, it would definitely be thought of as good). We would probably like to evolve this into a highly parallel application in the mid-to-long term. Google App Engine is not an accepted suggestion, mainly because of its cost.

My question is this:

Is using WSGI a good idea? Should we be looking into something like Twisted instead?
Should we use Apache as a reverse proxy for our static files?
Is there some different pattern or architecture that we should consider that I haven't mentioned? (Even if completely obvious).

Any help at all with this would be very appreciated. Thanks a lot!

Question 2

A WSGI application will be fine this is mostly a backend question and data processing question, in my opinion as that is where more architectural parts come into play. I would look into using Celery ( http://celeryproject.org/ ) for your work distribution and backend scaling. Twisted would be a good choice, but it appears you already have that portion written for use as a WSGI application so I would just extend it with Celery.

I do not know the scope of your project but I would design it with Celery in mind.

I would have my frontend endpoints be the WSGI (because you already have that written) and write the backend to be distributed via messages. Then you would have a pool of backend nodes that would pull messages off of the Celery queue and complete the required work. It would look sort of like:

Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.

The apache nodes would be behind a load balancer of some kind. This would be a fairly simple architecture to scale and is, if done correctly, fairly reliable. Code for failure in a system like this and you will be fine.

Architecture solution for Python Web application

Related Q&A

Capture image for processing

Loading Magnet LINK using Rasterbar libtorrent in Python

Python currying with any number of variables

Python - Display rows with repeated values in csv files

Defining getattr and getitem on a function has no effect

thread._local object has no attribute

Pytorch batch matrix vector outer product

Scraping Google Analytics by Scrapy

Pandas representative sampling across multiple columns

TensorFlow - Ignore infinite values when calculating the mean of a tensor