Pandas reading NULL as a NaN float instead of str [duplicate]

2024/11/15 0:34:20

Given the file:

$ cat test.csv 
a,b,c,NULL,d
e,f,g,h,i
j,k,l,m,n

Here the column labelled 3 (the fourth column, which contains NULL) should be treated as str.

When I applied a string function to that column, it turned out that pandas had read the NULL string as a NaN float:

>>> import pandas as pd
>>> df = pd.read_csv('test.csv', names=[0,1,2,3,4], dtype={0:str, 1:str, 2:str, 3:str, 4:str})
>>> df[3].apply(str.strip)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1569, in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)
TypeError: descriptor 'strip' requires a 'str' object but received a 'float'

To verify:

>>> for i in df[3]:
...    print (type(i), i)
... 
<class 'float'> nan
<class 'str'> h
<class 'str'> m

I specified the dtype when reading the file, but somehow it got overridden.

How do I force the type of a specific column to be fixed?

Is there a way to automatically find these stray NaN floats and change them back to the 'NULL' string?

Answer

For me, astype works (note that the missing value becomes the string 'nan', not 'NULL'):

df[3] = df[3].astype(str)
for i in df[3]:
    print (type(i), i)

<class 'str'> nan
<class 'str'> h
<class 'str'> m
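
If the goal is to get the literal 'NULL' text back rather than 'nan', one possible approach is to locate the missing cells with isna() and refill them with fillna(). This is only a sketch: the inline csv_text and the variable names are assumptions made for illustration, and it assumes every missing cell in that column really was the text NULL in the file.

```python
import io

import pandas as pd

# Hypothetical inline copy of test.csv so the sketch is self-contained.
csv_text = "a,b,c,NULL,d\ne,f,g,h,i\nj,k,l,m,n\n"

# With the default NA handling, the NULL cell is parsed as a missing value.
df = pd.read_csv(io.StringIO(csv_text), names=[0, 1, 2, 3, 4], dtype=str)

# Find the rows where column 3 was turned into a missing value.
null_mask = df[3].isna()

# Put the original marker text back in those cells.
df[3] = df[3].fillna("NULL")

print(null_mask.tolist())
print(df[3].tolist())
```

Unlike astype(str), this keeps the original marker text, but it cannot tell NULL apart from other default NA markers (such as an empty cell) that may also have been present in the file.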

Another solution is to pass keep_default_na=False to read_csv:

import pandas as pd
# pandas.compat.StringIO was removed in later pandas releases; io.StringIO replaces it
from io import StringIO

temp=u"""a,b,c,NULL,d
e,f,g,h,i
j,k,l,m,n"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=[0,1,2,3,4], keep_default_na=False)
print (df)

   0  1  2     3  4
0  a  b  c  NULL  d
1  e  f  g     h  i
2  j  k  l     m  n

for i in df[3]:
    print (type(i), i)

<class 'str'> NULL
<class 'str'> h
<class 'str'> m
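
To tie this back to the original error, here is a small sketch of two ways the strip call from the question can be made to run. The inline csv_text and the variable names are assumptions for illustration. The first variant keeps the default NA handling and uses the .str accessor, which leaves missing values untouched instead of raising; the second uses keep_default_na=False so that every cell is a real str.

```python
import io

import pandas as pd

# Hypothetical inline copy of test.csv so the sketch is self-contained.
csv_text = "a,b,c,NULL,d\ne,f,g,h,i\nj,k,l,m,n\n"
names = [0, 1, 2, 3, 4]

# Variant 1: default parsing turns NULL into a missing value, but the
# .str accessor propagates missing values rather than raising TypeError.
df_default = pd.read_csv(io.StringIO(csv_text), names=names, dtype=str)
stripped_default = df_default[3].str.strip()

# Variant 2: with keep_default_na=False the NULL cell stays a plain str,
# so the original apply(str.strip) call works unchanged.
df_raw = pd.read_csv(
    io.StringIO(csv_text), names=names, dtype=str, keep_default_na=False
)
stripped_raw = df_raw[3].apply(str.strip)

print(stripped_default.isna().tolist())
print(stripped_raw.tolist())
```

Which variant fits depends on whether NULL should count as missing data (variant 1) or as ordinary text (variant 2).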

If NaN still needs to be parsed in numeric columns, the na_values parameter can then be used, but the marker has to be something other than NULL, e.g. NA:

import pandas as pd
# pandas.compat.StringIO was removed in later pandas releases; io.StringIO replaces it
from io import StringIO

temp=u"""a,b,c,NULL,1
e,f,g,h,2
j,k,l,m,NA"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=[0,1,2,3,4], keep_default_na=False, na_values=['NA'])
print (df)

   0  1  2     3    4
0  a  b  c  NULL  1.0
1  e  f  g     h  2.0
2  j  k  l     m  NaN

for i in df[3]:
    print (type(i), i)

<class 'str'> NULL
<class 'str'> h
<class 'str'> m

for i in df[4]:
    print (type(i), i)

<class 'numpy.float64'> 1.0
<class 'numpy.float64'> 2.0
<class 'numpy.float64'> nan
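
If the same marker might also appear as legitimate text in a string column, na_values can be given as a dict so that the marker only applies to selected columns. The following is a sketch under assumptions: the sample data (with NA added to column 3) and the variable names are made up for illustration, and it relies on read_csv accepting a dict keyed by column label for na_values.

```python
import io

import pandas as pd

# Hypothetical data: NA appears as text in column 3 and as a missing
# marker in the numeric column 4.
csv_text = "a,b,c,NULL,1\ne,f,g,NA,2\nj,k,l,m,NA\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    names=[0, 1, 2, 3, 4],
    keep_default_na=False,   # switch off the built-in NA markers
    na_values={4: ["NA"]},   # treat NA as missing only in column 4
)

print(df[3].tolist())
print(df[4].isna().tolist())
```

Here column 3 keeps both NULL and NA as ordinary strings, while column 4 is still parsed as numeric with NA as its missing value.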
