How do I efficiently do a bulk insert-or-update with SQLAlchemy?

2024/10/15 21:14:45

I'm using SQLAlchemy with a Postgres backend to do a bulk insert-or-update. To try to improve performance, I'm attempting to commit only once every thousand rows or so:

trans = engine.begin()for i, rec in enumerate(records):if i % 1000 == 0:trans.commit()trans = engine.begin()try:inserter.execute(...)except sa.exceptions.SQLError:my_table.update(...).execute()
trans.commit()

However, this isn't working. It seems that when the INSERT fails, it leaves things in a weird state that prevents the UPDATE from happening. Is it automatically rolling back the transaction? If so, can this be stopped? I don't want my entire transaction rolled back in the event of a problem, which is why I'm trying to catch the exception in the first place.

The error message I'm getting, BTW, is "sqlalchemy.exc.InternalError: (InternalError) current transaction is aborted, commands ignored until end of transaction block", and it happens on the update().execute() call.

Answer

You're hitting some weird Postgresql-specific behavior: if an error happens in a transaction, it forces the whole transaction to be rolled back. I consider this a Postgres design bug; it takes quite a bit of SQL contortionism to work around in some cases.

One workaround is to do the UPDATE first. Detect if it actually modified a row by looking at cursor.rowcount; if it didn't modify any rows, it didn't exist, so do the INSERT. (This will be faster if you update more frequently than you insert, of course.)

Another workaround is to use savepoints:

SAVEPOINT a;
INSERT INTO ....;
-- on error:
ROLLBACK TO SAVEPOINT a;
UPDATE ...;
-- on success:
RELEASE SAVEPOINT a;

This has a serious problem for production-quality code: you have to detect the error accurately. Presumably you're expecting to hit a unique constraint check, but you may hit something unexpected, and it may be next to impossible to reliably distinguish the expected error from the unexpected one. If this hits the error condition incorrectly, it'll lead to obscure problems where nothing will be updated or inserted and no error will be seen. Be very careful with this. You can narrow down the error case by looking at Postgresql's error code to make sure it's the error type you're expecting, but the potential problem is still there.

Finally, if you really want to do batch-insert-or-update, you actually want to do many of them in a few commands, not one item per command. This requires trickier SQL: SELECT nested inside an INSERT, filtering out the right items to insert and update.

https://en.xdnf.cn/q/69237.html

Related Q&A

How to pass variables from javascript to python in Jupyter?

As I understand it, I should be able to print the variable foo in the snippet below. from IPython.display import HTML HTML(<script type="text/javascript">IPython.notebook.kernel.execute…

SVR Model --Feature Scaling - Expected 2D array, got 1D array instead

I am trying to understand what is wrong with the code below. I know that the Y variable is 1D array and expected to be 2D array and need to reshape the structure but that code was working previously fi…

How to find the version of jupyter notebook from within the notebook

I wish to return the version of Jupyter Notebook from within a cell of a notebook. For example, to get the python version, I run: from platform import python_version python_version()or to get the panda…

Python logging - multiple modules

Im working on a small python project that has the following structure -project -- logs-- project__init.py__classA.pyclassB.pyutils.py-- main.pyIve set up the logging configuration in __init.py__ under …

Can you search backwards from an offset using a Python regular expression?

Given a string, and a character offset within that string, can I search backwards using a Python regular expression?The actual problem Im trying to solve is to get a matching phrase at a particular of…

Django AttributeError: Form object has no attribute _errors

Im overriding the init method in my form andthis is now returning an error TransactionForm object has no attribute _errors. I would expect this to work because Ive included super in my init, however pe…

Add new keys to a dictionary while incrementing existing values

I am processing a CSV file and counting the unique values of column 4. So far I have coded this three ways. One uses "if key in dictionary", the second traps the KeyError and the third uses &…

ImportError: cannot import name aiplatform from google.cloud (unknown location)

I was wondering where that error comes from. The package has to be installed additionally to google.cloud

What does : TypeError: cannot concatenate str and list objects mean?

What does this error mean?TypeError: cannot concatenate str and list objectsHeres part of the code:for j in (90.,52.62263.,26.5651.,10.8123.):if j == 90.:z = (0.)elif j == 52.62263.:z = (0., 72., 144.…

How do I create a fixed-length, mutable array of Python objects in Cython?

I need to have an array of python objects to be used in creating a trie datastructure. I need a structure that will be fixed-length like a tuple and mutable like a list. I dont want to use a list bec…