Question 1

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5)
Out[]:
0    252000
1    484000
2     62000
3    168000
4    232000
Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')

I've also tried making a separate pandas Series, applying the methods listed above to that Series, and reassigning the result to xiv['Volume'], which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type - this works but I don't know why it's different.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('float64')

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, that is, convert the column in the xiv DataFrame to float64 in place?

Question 2

If a column already has a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another "numeric" dtype using the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object
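As a side note to the demo above, here is a minimal sketch of casting several columns in one call. It assumes a small hypothetical DataFrame (not the one from the demo) and relies on DataFrame.astype() accepting a mapping of column name to dtype; columns not listed in the mapping keep their dtype.

```python
import pandas as pd

# Hypothetical data for illustration only; every column starts as int64.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}, dtype="int64")

# astype() returns a new DataFrame; only the listed columns are cast.
converted = df.astype({"a": "float64", "b": "float32"})

print(converted.dtypes)
```

Because astype() returns a new object rather than modifying the original, assign the result back (for example df = df.astype(...)) if the cast should persist.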

It won't work for object (string) columns whose values can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() function with errors='coerce', which turns values that can't be parsed into NaN:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
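To tie this back to the original Volume column, the following is a rough sketch contrasting the two calls. The Series values are taken from the question's head(5) output, while the raw string Series is a made-up example. It illustrates that pd.to_numeric() leaves an already numeric int64 column as it is, whereas .astype() performs the explicit cast.

```python
import pandas as pd

# First three values from the question's Volume column, already int64.
volume = pd.Series([252000, 484000, 62000], dtype="int64", name="Volume")

# Nothing needs parsing here, so the dtype stays int64.
unchanged = pd.to_numeric(volume)

# astype() is the explicit cast between numeric dtypes.
as_float = volume.astype("float64")

# Hypothetical string data with one unparseable value.
raw = pd.Series(["252000", "XXXXXX", "62000"], name="Volume")

# errors="coerce" replaces the unparseable value with NaN, which forces float64.
parsed = pd.to_numeric(raw, errors="coerce")

print(unchanged.dtype, as_float.dtype, parsed.dtype)
```

As a rule of thumb under these assumptions: reach for pd.to_numeric() when the data still has to be parsed from strings or mixed objects, and for .astype() when the column is already numeric and only the target dtype needs to change.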

When to apply(pd.to_numeric) and when to astype(np.float64) in python?
