Pandas reading NULL as a NaN float instead of str [duplicate]
2024/11/15 0:34:20
Given the file:
$ cat test.csv
a,b,c,NULL,d
e,f,g,h,i
j,k,l,m,n
Where column 3 (the one containing NULL) is to be treated as str.
When I applied a string function to the column, it turned out that pandas had read the NULL string as a NaN float:
>>> import pandas as pd
>>> df = pd.read_csv('test.csv', names=[0,1,2,3,4], dtype={0:str, 1:str, 2:str, 3:str, 4:str})
>>> df[3].apply(str.strip)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1569, in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66440)
TypeError: descriptor 'strip' requires a 'str' object but received a 'float'
To verify:
>>> for i in df[3]:
...     print(type(i), i)
...
<class 'float'> nan
<class 'str'> h
<class 'str'> m
I specified the dtype when reading the file, but somehow it got overridden.
How do I force the type of a specific column to be fixed?
Is there a way to automatically find these abnormal NaN floats and change them back to the 'NULL' string?
Answer
For me, astype works:
df[3] = df[3].astype(str)
for i in df[3]:
    print(type(i), i)
<class 'str'> nan
<class 'str'> h
<class 'str'> m
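Note that astype(str) turns the missing value into the string 'nan', not 'NULL'. If the goal is to get the original text back, here is a minimal sketch using fillna, assuming that every missing value in the column really was the literal text NULL in the file (pandas also treats empty fields, NA, NaN and similar markers as missing by default, and fillna cannot tell them apart). The variable names csv_text, restored and stripped are only illustrative.

```python
from io import StringIO

import pandas as pd

# Same data as test.csv in the question, inlined so the example is self-contained
csv_text = "a,b,c,NULL,d\ne,f,g,h,i\nj,k,l,m,n\n"

# With default settings, the text NULL is parsed as a missing value even with dtype=str
df = pd.read_csv(StringIO(csv_text), names=[0, 1, 2, 3, 4], dtype=str)

# Replace missing values with the string 'NULL' (assumption: they were all NULL originally)
restored = df[3].fillna("NULL")

# The .str accessor works on the restored column without a TypeError
stripped = restored.str.strip()
print(stripped.tolist())
```

The .str accessor is also more forgiving than apply(str.strip): it passes missing values through instead of raising.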
Another solution is to use keep_default_na=False in read_csv:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in newer pandas versions
temp=u"""a,b,c,NULL,d
e,f,g,h,i
j,k,l,m,n"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=[0,1,2,3,4], keep_default_na=False)
print(df)
   0  1  2     3  4
0  a  b  c  NULL  d
1  e  f  g     h  i
2  j  k  l     m  n
for i in df[3]:
    print(type(i), i)
<class 'str'> NULL
<class 'str'> h
<class 'str'> m
You can then use the na_values parameter if you need NaN parsed in numeric columns, but the marker has to be a different string, e.g. NA:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in newer pandas versions
temp=u"""a,b,c,NULL,1
e,f,g,h,2
j,k,l,m,NA"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), names=[0,1,2,3,4], keep_default_na=False, na_values=['NA'])
print(df)
   0  1  2     3    4
0  a  b  c  NULL  1.0
1  e  f  g     h  2.0
2  j  k  l     m  NaN
for i in df[3]:
    print(type(i), i)
<class 'str'> NULL
<class 'str'> h
<class 'str'> m
for i in df[4]:
    print(type(i), i)
<class 'numpy.float64'> 1.0
<class 'numpy.float64'> 2.0
<class 'numpy.float64'> nan
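If the NA marker should only apply to the numeric column, na_values also accepts a dict keyed by column. The following is a sketch under that assumption, reusing the sample data from above; the name csv_text is only illustrative. With keep_default_na=False, columns that are not listed in the dict get no NA parsing at all, so column 3 keeps its strings untouched.

```python
from io import StringIO

import pandas as pd

# Same sample data as above: text in column 3, numbers plus an NA marker in column 4
csv_text = "a,b,c,NULL,1\ne,f,g,h,2\nj,k,l,m,NA\n"

df = pd.read_csv(
    StringIO(csv_text),
    names=[0, 1, 2, 3, 4],
    keep_default_na=False,   # turn off the built-in list of NA strings (NULL, NA, NaN, ...)
    na_values={4: ["NA"]},   # treat 'NA' as missing in column 4 only
)

print(df[3].tolist())         # column 3 stays as plain strings
print(df[4].isna().tolist())  # only the last row of column 4 is missing
```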