Question 1

I have a tabledata.csv file and I have been using pandas.read_csv to read or choose specific columns with specific conditions.

For instance, I use the following code to select every "name" where session_id == 1, which works fine in an IPython Notebook on datascientistworkbench.

             import pandas
             df = pandas.read_csv('/resources/data/findhelp/tabledata.csv')
             df['name'][df['session_id']==1]

I wonder whether, after reading the CSV file, it is possible to somehow "switch" to treating it as a SQL database. (I am pretty sure I am not explaining this with the correct terms, sorry about that!) What I want is to use SQL statements in an IPython notebook to choose specific rows with specific conditions, something like:

Select `name`, count(distinct `session_id`) from tabledata where `session_id` like "100.1%" group by `session_id` order by `session_id`

But I guess I need to figure out a way to convert the CSV file into another form so that I can run SQL statements against it. Many thanks!

Answer

Here is a quick primer on pandas and SQL, using the built-in sqlite3 package. Generally speaking, you can do all SQL operations in pandas in one way or another, but databases are of course useful. The first thing you need to do is store the original df in a SQL database so that you can query it. The steps are listed below.

import pandas as pd
import sqlite3

# read the CSV
df = pd.read_csv('/resources/data/findhelp/tabledata.csv')
# connect to a database; if the file does not exist, this creates
# Any_Database_Name.db in the current directory
conn = sqlite3.connect("Any_Database_Name.db")
# store your table in the database (raises ValueError if the table already
# exists; pass if_exists='replace' to overwrite it)
df.to_sql('Some_Table_Name', conn)
# read a SQL query out of your database and into a pandas dataframe
sql_string = 'SELECT * FROM Some_Table_Name'
df = pd.read_sql(sql_string, conn)
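
To tie this back to the query in the question, here is a minimal sketch that runs a similar statement through sqlite3 and then does the same thing in plain pandas. The sample rows are made up (I don't know what tabledata.csv really contains), session_id is assumed to be text so that LIKE makes sense, and I group by name rather than session_id, which is only my guess at the intent: counting distinct session_id per session_id would always give 1.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for tabledata.csv; the real columns and types are unknown.
df = pd.DataFrame({
    "name": ["ann", "ann", "bob", "cat", "cat"],
    "session_id": ["100.1a", "100.1b", "100.1a", "100.2a", "100.1c"],
})

# An in-memory database avoids leaving a .db file behind.
conn = sqlite3.connect(":memory:")
df.to_sql("tabledata", conn, index=False)

# The LIKE pattern is passed as a parameter instead of being pasted into the SQL.
sql = """
    SELECT name, COUNT(DISTINCT session_id) AS n_sessions
    FROM tabledata
    WHERE session_id LIKE ?
    GROUP BY name
    ORDER BY name
"""
result_sql = pd.read_sql(sql, conn, params=("100.1%",))
conn.close()

# The same selection without a database: filter, group, count distinct values.
mask = df["session_id"].str.startswith("100.1")
result_pd = (
    df[mask]
    .groupby("name")["session_id"]
    .nunique()
    .reset_index(name="n_sessions")
)

print(result_sql)
print(result_pd)
```

Both frames should list the same names and counts. If session_id is numeric in your file, the LIKE filter and str.startswith would need adjusting (for example, by converting the column to strings first).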

SQL statement for CSV files on IPython notebook
