Generate SQL statements from a Pandas Dataframe

2024/11/20 9:33:06

I am loading data from various sources (CSV, XLS, JSON, etc.) into pandas DataFrames, and I would like to generate the SQL statements that create and fill a database with this data. Does anyone know of a way to do this?

I know pandas has a to_sql function, but it only works with a database connection; it cannot generate a string.

Example

What I would like is to take a dataframe like so:

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

And a function that would generate this (this example is PostgreSQL but any would be fine):

CREATE TABLE data
(
  index timestamp with time zone,
  "A" double precision,
  "B" double precision,
  "C" double precision,
  "D" double precision
)

Answer

If you only want the CREATE TABLE statement (and not the inserts for the data), you can use the get_schema function of the pandas.io.sql module:

In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
CREATE TABLE "data" (
  "index" TIMESTAMP,
  "A" REAL,
  "B" REAL,
  "C" REAL,
  "D" REAL
)

Some notes:

  • I had to use reset_index because otherwise the index was not included in the schema.
  • If you provide an SQLAlchemy engine for a particular database flavor, the result will be adjusted to that flavor (e.g. the data type names).
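
The question also asks about filling the table, which get_schema does not cover. As far as I know pandas has no public function that returns INSERT statements as strings, so below is a minimal sketch that builds them by hand. The helpers sql_literal and insert_statements are hypothetical names, not part of pandas, and the quoting rules are simplified (double-quoted identifiers, single-quoted strings with quotes doubled, TRUE/FALSE booleans), so check them against your target database before relying on the output.

```python
import numpy as np
import pandas as pd


def sql_literal(value):
    """Render one scalar as a SQL literal (hypothetical helper, simplified rules)."""
    if pd.isna(value):  # None, NaN and NaT all become NULL
        return "NULL"
    if isinstance(value, (bool, np.bool_)):  # check before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, np.integer)):
        return str(int(value))
    if isinstance(value, (float, np.floating)):
        return repr(float(value))
    # Everything else (strings, timestamps, ...) is quoted; embedded quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(df, table):
    """Yield one INSERT statement per row of df (hypothetical helper)."""
    cols = ", ".join('"{}"'.format(c) for c in df.columns)
    for row in df.itertuples(index=False, name=None):
        vals = ", ".join(sql_literal(v) for v in row)
        yield 'INSERT INTO "{}" ({}) VALUES ({});'.format(table, cols, vals)


# Small example frame; reset_index turns the date index into a regular column.
df = pd.DataFrame(
    {"A": [0.5, np.nan], "label": ["x", "O'Brien"]},
    index=pd.date_range("20130101", periods=2),
)
flat = df.reset_index()

# CREATE TABLE from pandas, followed by the hand-built INSERT statements.
script = "\n".join(
    [pd.io.sql.get_schema(flat, "data") + ";", *insert_statements(flat, "data")]
)
print(script)
```

Building SQL by string concatenation is only reasonable for generating a script from data you trust; when a live connection is available, to_sql or parameterized queries are the safer route.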