SQL statement for CSV files on IPython notebook

2024/7/7 8:03:06

I have a tabledata.csv file and I have been using pandas.read_csv to read or choose specific columns with specific conditions.

For instance I use the following code to select all "name" where session_id =1, which is working fine on IPython Notebook on datascientistworkbench.

             df = pandas.read_csv('/resources/data/findhelp/tabledata.csv')df['name'][df['session_id']==1]

I just wonder after I have read the csv file, is it possible to somehow "switch/read" it as a sql database. (i am pretty sure that i did not explain it well using the correct terms, sorry about that!). But what I want is that I do want to use SQL statements on IPython notebook to choose specific rows with specific conditions. Like I could use something like:

Select `name`, count(distinct `session_id`) from tabledata where `session_id` like "100.1%" group by `session_id` order by `session_id`

But I guess I do need to figure out a way to change the csv file into another version so that I could use sql statement. Many thx!

Answer

Here is a quick primer on pandas and sql, using the builtin sqlite3 package. Generally speaking you can do all SQL operations in pandas in one way or another. But databases are of course useful. The first thing you need to do is store the original df in a sql database so that you can query it. Steps listed below.

import pandas as pd
import sqlite3#read the CSV
df = pd.read_csv('/resources/data/findhelp/tabledata.csv')
#connect to a database
conn = sqlite3.connect("Any_Database_Name.db") #if the db does not exist, this creates a Any_Database_Name.db file in the current directory
#store your table in the database:
df.to_sql('Some_Table_Name', conn)
#read a SQL Query out of your database and into a pandas dataframe
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
https://en.xdnf.cn/q/73228.html

Related Q&A

How to draw ellipsoid with plotly

Are there any way to plot a surface like ellipsoid with plotly 3D?Currently only surfaces of the form z=f(x,y) are discussed in the docs. There is also Mesh 3D, but I found no examples for it. It seem…

PyTorch DataLoader uses same random seed for batches run in parallel

There is a bug in PyTorch/Numpy where when loading batches in parallel with a DataLoader (i.e. setting num_workers > 1), the same NumPy random seed is used for each worker, resulting in any random f…

How to fix 502 Bad Gateway Error in production(Nginx)?

When I tried to upload a big csv file of size about 600MB in my project which is hosted in the digital ocean, it tries to upload but shows 502 Bad Gateway Error (Nginx). The application is a data conve…

Shift theorem in Discrete Fourier Transform

Im trying to solve a problem with python+numpy in which Ive some functions of type that I need to convolve with another function . In order to optimize code, I performed the fft of f and g, I multipli…

Running pudb inside docker container

I prefer pudb for python debugging. I am building python applications that run inside docker container. Does any one know how to make pudb available inside docker container?Thank you

Argparse: defaults from file

I have a Python script which takes a lot of arguments. I currently use a configuration.ini file (read using configparser), but would like to allow the user to override specific arguments using command …

How can access Uploaded File in Google colab

Im new in python and I use Google Colab . I uploaded a train_data.npy into google Colab and then I want to use it . According to this link How to import and read a shelve or Numpy file in Google Colabo…

__add__ to support addition of different types?

Would be very easy to solve had python been a static programming language that supported overloading. I am making a class called Complex which is a representation of complex numbers (I know python has …

How to open .ndjson file in Python?

I have .ndjson file that has 20GB that I want to open with Python. File is to big so I found a way to split it into 50 peaces with one online tool. This is the tool: https://pinetools.com/split-files N…

loading a dataset in python (numpy) when there are variable spaces delimiting columns

I have a big dataset contains numeric data and in some of its rows there are variable spaces delimiting columns, like:4 5 6 7 8 9 2 3 4When I use this line:dataset=numpy.loadtxt("dataset.txt&q…