Python - pandas - Append Series into Blank DataFrame

2024/10/13 7:26:36

Say I have two pandas Series in python:

import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])

I can create a DataFrame with just h and then append g to it:

df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)

I get:

>>> df10  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc

But now suppose that I have an empty DataFrame and I try to append h to it:

df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)

This does not work. I think the problem is in the second-to-last line of code. I need to somehow define the blank DataFrame to have the proper number of columns.

By the way, the reason I am trying to do this is that I am scraping text from the internet using requests+BeautifulSoup and I am processing it and trying to write it to a DataFrame one row at a time.

Answer

So if you don't pass an empty list to the DataFrame constructor then it works:

In [16]:df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:0  1  2  3  4
0  g  4  2  1  1[1 rows x 5 columns]

The difference between the two constructor approaches appears to be that the index dtypes are set differently, with an empty list it is an Int64 with nothing it is an object:

In [21]:df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64

Unclear to me why the above should affect the behaviour (I'm guessing here).

UPDATE

After revisiting this I can confirm that this looks to me to be a bug in pandas version 0.12.0 as your original code works fine:

In [13]:import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)Out[13]:0  1  2  3  4
0  g  4  2  1  1[1 rows x 5 columns]

I am running pandas 0.13.1 and numpy 1.8.1 64-bit using python 3.3.5.0 but I think the problem is pandas but I would upgrade both pandas and numpy to be safe, I don't think this is a 32 versus 64-bit python issue.

https://en.xdnf.cn/q/69561.html

Related Q&A

How to retrieve values from a function run in parallel processes?

The Multiprocessing module is quite confusing for python beginners specially for those who have just migrated from MATLAB and are made lazy with its parallel computing toolbox. I have the following fun…

SignalR Alternative for Python

What would be an alternative for SignalR in Python world?To be precise, I am using tornado with python 2.7.6 on Windows 8; and I found sockjs-tornado (Python noob; sorry for any inconveniences). But s…

Python Variable Scope and Classes

In Python, if I define a variable:my_var = (1,2,3)and try to access it in __init__ function of a class:class MyClass:def __init__(self):print my_varI can access it and print my_var without stating (glo…

How to check if valid excel file in python xlrd library

Is there any way with xlrd library to check if the file you use is a valid excel file? I know theres other libraries to check headers of files and I could use file extension check. But for the sake of…

ValueError: Invalid file path or buffer object type: class tkinter.StringVar

Here is a simplified version of some code that I have. In the first frame, the user selects a csv file using tk.filedialog and it is meant to be plotted on the same frame on the canvas. There is also a…

Is there a way to reopen a socket?

I create many "short-term" sockets in some code that look like that :nb=1000 for i in range(nb):sck = socket.socket(socket.AF_INET, socket.SOCK_STREAM)sck.connect((adr, prt)sck.send(question …

How django handles simultaneous requests with concurrency over global variables?

I have a django instance hosted via apache/mod_wsgi. I use pre_save and post_save signals to store the values before and after save for later comparisons. For that I use global variables to store the p…

Why cant I string.print()?

My understanding of the print() in both Python and Ruby (and other languages) is that it is a method on a string (or other types). Because it is so commonly used the syntax:print "hi"works.S…

Difference between R.scale() and sklearn.preprocessing.scale()

I am currently moving my data analysis from R to Python. When scaling a dataset in R i would use R.scale(), which in my understanding would do the following: (x-mean(x))/sd(x)To replace that function I…

Generate random timeseries data with dates

I am trying to generate random data(integers) with dates so that I can practice pandas data analytics commands on it and plot time series graphs. temp depth acceleration 2019-01-1 -0.218062 -1.21…