Efficient upsert of pandas dataframe to MS SQL Server using pyodbc

2024/9/30 3:33:33

I'm trying to upsert a pandas dataframe to a MS SQL Server using pyodbc. I've used a similar approach before to do straight inserts, but the solution I've tried this time is incredibly slow. Is there a more streamlined way to accomplish an upsert than what I have?

sql_connect = pyodbc.connect('Driver={SQL Server Native Client 11.0}; Server=blank1; Database=blank2; UID=blank3; PWD=blank4')
cursor = sql_connect.cursor()for index, row in bdf.iterrows():res = cursor.execute("UPDATE dbo.MPA_BOOK_RAW SET [SITE]=?, [SHIP_TO]=?, [PROD_LINE]=?, [GROUP_NUMBER]=?, [DESCRIPTION]=?, [ORDER_QTY]=?, [BPS_INCLUDE]=? WHERE [CUST]=? AND [ORDER_NUMBER]=? AND [ORDER_DATE]=? AND [PURCHASE_ORDER]=? AND [CHANNEL]=? AND [ITEM]=? AND [END_DT]=?", row['SITE'], row['SHIP_TO'],row['PROD_LINE'],row['GROUP_NUMBER'],row['DESCRIPTION'],row['ORDER_QTY'],row['BPS_INCLUDE'],row['CUST'],row['ORDER_NUMBER'], row['ORDER_DATE'],row['PURCHASE_ORDER'], row['CHANNEL'],row['ITEM'],row['END_DT'])if res.rowcount == 0:cursor.execute("INSERT INTO dbo.MPA_BOOK_RAW ([SITE], [CUST], [ORDER_NUMBER], [ORDER_DATE], [PURCHASE_ORDER], [CHANNEL], [SHIP_TO], [PROD_LINE], [GROUP_NUMBER], [DESCRIPTION], [ITEM], [ORDER_QTY], [END_DT], [BPS_INCLUDE]) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", row['SITE'], row['CUST'],row['ORDER_NUMBER'], row['ORDER_DATE'],row['PURCHASE_ORDER'], row['CHANNEL'],row['SHIP_TO'],row['PROD_LINE'],row['GROUP_NUMBER'],row['DESCRIPTION'],row['ITEM'],row['ORDER_QTY'],row['END_DT'],row['BPS_INCLUDE'])sql_connect.commit()cursor.close()
sql_connect.close()

I tried the above with a five row sample of my original ~50k row dataframe and it worked fine. So the logic seems okay. It's just the speed that is an issue.

Answer

Update, July 2022: You can save some typing by using this function to build the MERGE statement and perform the upsert for you.


Here is an example of an "upsert" using MERGE:

from pprint import pprintimport pandas as pd
import sqlalchemy as saconnection_string = ("Driver=ODBC Driver 17 for SQL Server;""Server=192.168.0.199;""UID=scott;PWD=tiger^5HHH;""DATABASE=test;""UseFMTONLY=Yes;"
)
sqlalchemy_url = sa.engine.URL.create("mssql+pyodbc", query={"odbc_connect": connection_string}
)
engine = sa.create_engine(sqlalchemy_url, fast_executemany=True)with engine.begin() as conn:# set up test environmentconn.exec_driver_sql("DROP TABLE IF EXISTS actual_table;")conn.exec_driver_sql("""\CREATE TABLE actual_table (institution_no VARCHAR(3), transit_no VARCHAR(5), branch_name VARCHAR(50),CONSTRAINT PK_actual_table PRIMARY KEY CLUSTERED (institution_no, transit_no));""")# actual_table initial stateconn.exec_driver_sql("""\INSERT INTO actual_table (institution_no, transit_no, branch_name) VALUES ('002', '45678', 'Scotiabank branch #45678 - *** UPDATE NEEDED ***'),('003', '67890', 'RBC branch #67890 - Sudbury, ON');""")# test data to be updated or inserteddf_update = pd.DataFrame([("004", "12345", "TD branch #12345 - London, ON"),("002", "45678", "Scotiabank branch #45678 - Timmins, ON"),("004", "34567", "TD branch #34567 - Toronto, ON"),],columns=["institution_no", "transit_no", "branch_name"],)# Here's where the real work begins ...## Step 1: upload update datadf_update.to_sql("#update_table", conn, index=False)## Step 2: perform the "upsert"sql = """\MERGE actual_table WITH (HOLDLOCK) AS aUSING (SELECT institution_no, transit_no, branch_name FROM #update_table) as uON (a.institution_no = u.institution_no AND a.transit_no = u.transit_no)WHEN MATCHED THENUPDATE SET branch_name = u.branch_nameWHEN NOT MATCHED THENINSERT (institution_no, transit_no, branch_name)VALUES (u.institution_no, u.transit_no, u.branch_name);"""result = conn.exec_driver_sql(sql)# verify results
with engine.begin() as conn:pprint(conn.exec_driver_sql("SELECT * FROM actual_table").fetchall())"""console output:[('002', '45678', 'Scotiabank branch #45678 - Timmins, ON'),('003', '67890', 'RBC branch #67890 - Sudbury, ON'),('004', '12345', 'TD branch #12345 - London, ON'),('004', '34567', 'TD branch #34567 - Toronto, ON')]"""
https://en.xdnf.cn/q/71136.html

Related Q&A

Comparison on the basis of min function

How exactly does the min function work for lists in python ?For example,num = [1,2,3,4,[1,2,3]]num2 = [1,2,3,4,5]min(num,num2) gives num2 as the result. Is the comparison value based or length based ?

Python Pandas rolling aggregate a column of lists

I have a simple dataframe df with a column of lists lists. I would like to generate an additional column based on lists.The df looks like:import pandas as pd lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:…

Easy way of overriding default methods in custom Python classes?

I have a class called Cell:class Cell:def __init__(self, value, color, size):self._value = valueself._color = colorself._size = size# and other methods...Cell._value will store a string, integer, etc. …

Return first non NaN value in python list

What would be the best way to return the first non nan value from this list?testList = [nan, nan, 5.5, 5.0, 5.0, 5.5, 6.0, 6.5]edit:nan is a float

How to subplot pie chart in plotly?

How can I subplot pie1 in fig, so it be located at the first position. this is how I am doing it but it doesnt work out import pandas as pdimport numpy as npimport seaborn as snsimport plotly.offline a…

Example of use \G in negative variable-length lookbehinds to limit how far back the lookbehind goes

In the pypi page of the awesome regex module (https://pypi.python.org/pypi/regex) it is stated that \G can be used "in negative variable-length lookbehinds to limit how far back the lookbehind goe…

Regex with lookbehind not working using re.match

The following python code:import reline="http://google.com" procLine = re.match(r(?<=http).*, line) if procLine.group() == "":print(line + ": did not match regex") els…

testing python multiprocessing pool code with nose

I am trying to write tests with nose that get set up with something calculated using multiprocessing.I have this directory structure:code/tests/tests.pytests.py looks like this:import multiprocessing a…

Python verify url goes to a page

I have a list of urls (1000+) which have been stored for over a year now. I want to run through and verify them all to see if they still exist. What is the best / quickest way to check them all and re…

Bokeh: Synchronizing hover tooltips in linked plots

I have two linked plots. When hovering, I would like to have a tooltip appear in both plots. I already use the linked selection with great success, but now I want to link the tooltips also.Below is an …