When to apply(pd.to_numeric) and when to astype(np.float64) in Python?

2024/11/21 0:25:00

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5)
Out[]:
0    252000
1    484000
2     62000
3    168000
4    232000
Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
In[]: xiv['Volume'].dtypes
Out[]: dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###

In[]: xiv['Volume'].dtypes
Out[]: dtype('int64')

In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)
In[]: xiv['Volume'].dtypes
Out[]: dtype('int64')

I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning the result to the xiv['Volume'] object, which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type - this works but I don't know why it's different.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)
In[]: xiv['Volume'].dtypes
Out[]: dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, that is, convert the column in the xiv DataFrame to float64 in place?

Answer

If the column already has a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another numeric dtype using the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object
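
This also explains the behaviour in the question: pd.to_numeric() only parses data into some numeric dtype, so a column that is already int64 comes back unchanged, while .astype() casts to the dtype you name. A minimal sketch, assuming a small stand-in Series (the values below are illustrative, not the asker's full data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the xiv['Volume'] column from the question.
volume = pd.Series([252000, 484000, 62000], name="Volume", dtype="int64")

# to_numeric finds the data already numeric and returns it as int64.
same = pd.to_numeric(volume)

# astype performs an explicit cast to the requested dtype.
as_float = volume.astype(np.float64)

print(same.dtype, as_float.dtype)
```

Note that neither call works in place; both return a new Series, so the result has to be assigned back to the column.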

This won't work for object (string) columns that contain values which can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() function with errors='coerce', which replaces values that can't be parsed with NaN:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
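
The two behaviours can be put side by side in a small self-contained sketch. The Series below is a made-up stand-in for column b, and the variable names are illustrative only:

```python
import pandas as pd

# Hypothetical object column mixing integers with one unparseable string.
raw = pd.Series([9590567, "XXXXXX", 162770], dtype=object)

# astype(float) refuses the whole column because of the single bad value.
try:
    raw.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' converts what it can and marks the rest NaN.
coerced = pd.to_numeric(raw, errors="coerce")

# Counting NaNs afterwards shows how many values were dropped by coercion.
n_bad = int(coerced.isna().sum())

print(astype_failed, coerced.dtype, n_bad)
```

Checking the NaN count after coercing is a cheap way to notice how much data was silently discarded.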
