Pandas assert_frame_equal error

2024/11/15 11:28:42

I'm building test cases and I want to compare two DataFrames. Even though the DataFrames have the same columns and values, assert_frame_equal reports that they are not equal. The column order is different; I tried reordering the columns, without success.

In my test case I'm using the following function:

    testing.assert_frame_equal(expected, tested, check_dtype=False)

The first dataframe is declared like this:

    df2 = pandas.DataFrame({
        'artista': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'],
        'mid_sugerido': ['/g/11bz0dg4b_', '/g/11bt_6j9dk', '/g/11c2nz8jc2', '/g/11bt_6jXXX'],
        'texto': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'],
        'busqueda': [u'Beyoncé', 'Radiolab', 'Xmas', 'Beyonce'],
        'texto_sugerido': ['Lemonade', 'Radiolab', 'Merry Christmas Lil Mama', 'Beyonce'],
        'artista_sugerido': [u'Beyoncé', 'Radiolab', None, 'Beyonce'],
        'media_sugerido': ['album', 'album', 'track', 'album'],
    })

The resulting DataFrame df2:

        artista artista_sugerido  busqueda media_sugerido   mid_sugerido  \
    0   Beyoncé          Beyoncé   Beyoncé          album  /g/11bz0dg4b_
    1  Radiolab         Radiolab  Radiolab          album  /g/11bt_6j9dk
    2      Xmas             None      Xmas          track  /g/11c2nz8jc2
    3   Beyonce          Beyonce   Beyonce          album  /g/11bt_6jXXX

                          texto            texto_sugerido
    0                  Lemonade                  Lemonade
    1                  Radiolab                  Radiolab
    2  Merry Christmas Lil Mama  Merry Christmas Lil Mama
    3                   Beyonce                   Beyonce

The second DataFrame is the one returned by the function (result):

        artista  busqueda   mid_sugerido                     texto  \
    0   Beyoncé   Beyoncé  /g/11bz0dg4b_                  Lemonade
    1  Radiolab  Radiolab  /g/11bt_6j9dk                  Radiolab
    2      Xmas      Xmas  /g/11c2nz8jc2  Merry Christmas Lil Mama
    3   Beyonce   Beyonce  /g/11bt_6jXXX                   Beyonce

                 texto_sugerido artista_sugerido media_sugerido
    0                  Lemonade          Beyoncé          album
    1                  Radiolab         Radiolab          album
    2  Merry Christmas Lil Mama             None          track
    3                   Beyonce          Beyonce          album

I get the following error when I run assert_frame_equal(df2, result):

    Traceback (most recent call last):
      File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 158, in <module>
        assert_frame_equal(df6, _Normalize(df5, test_dict))
      File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal
        testing.assert_frame_equal(expected, tested, check_dtype=False)
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1142, in assert_frame_equal
        obj='{0}.columns'.format(obj))
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 761, in assert_index_equal
        obj=obj, lobj=left, robj=right)
      File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887)
      File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769)
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 915, in raise_assert_detail
        raise AssertionError(msg)
    AssertionError: DataFrame.columns are different

    DataFrame.columns values are different (85.71429 %)
    [left]:  Index([u'artista', u'artista_sugerido', u'busqueda', u'media_sugerido', u'mid_sugerido', u'texto', u'texto_sugerido'], dtype='object')
    [right]: Index([u'artista', u'busqueda', u'mid_sugerido', u'texto', u'texto_sugerido', u'artista_sugerido', u'media_sugerido'], dtype='object')

The columns are the same but in a different order. If I use df.sort_index(axis=1) to reorder the columns, I get:

    Traceback (most recent call last):
      File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 154, in <module>
        assert_frame_equal(df6.sort_index(axis=1), _Normalize(df5, test_dict).sort_index(axis=1))
      File "/Users/spicyramen/Documents/Development/parzee/python/coverage/experimental/pandas_creation.py", line 16, in assert_frame_equal
        testing.assert_frame_equal(expected, tested, check_dtype=False, check_like=False)
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1166, in assert_frame_equal
        obj='DataFrame.iloc[:, {0}]'.format(i))
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 1049, in assert_series_equal
        check_less_precise, obj='{0}'.format(obj))
      File "pandas/src/testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas/src/testing.c:3887)
      File "pandas/src/testing.pyx", line 147, in pandas._testing.assert_almost_equal (pandas/src/testing.c:2769)
      File "/Library/Python/2.7/site-packages/pandas/util/testing.py", line 914, in raise_assert_detail
        [right]: {3}""".format(obj, message, left, right)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

Answer

I solved it by assigning the function's return value to a variable before sorting its columns. I replaced:

    assert_frame_equal(df2.sort_index(axis=1), myfunction(df1).sort_index(axis=1))

with:

    result = myfunction(df1)
    assert_frame_equal(df2.sort_index(axis=1), result.sort_index(axis=1))
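
For anyone hitting the same column-order mismatch on a current pandas release, here is a minimal sketch of two ways to make the comparison ignore column order. The frames are shortened, hypothetical stand-ins for the ones in the question, and the `pandas.testing` import path assumes a newer pandas than the Python 2.7 install shown in the tracebacks. `check_like` is the same parameter that appears (set to False) in the second traceback.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical, shortened stand-ins for the frames in the question:
# identical data, but the columns come back in a different order.
expected = pd.DataFrame({
    'artista': ['Beyoncé', 'Radiolab'],
    'artista_sugerido': ['Beyoncé', None],
    'texto': ['Lemonade', 'Radiolab'],
})
result = expected[['texto', 'artista', 'artista_sugerido']]

# A plain comparison is sensitive to column order and raises.
try:
    assert_frame_equal(expected, result, check_dtype=False)
    order_sensitive = False
except AssertionError:
    order_sensitive = True

# Option 1: sort the column labels on both sides before comparing.
assert_frame_equal(
    expected.sort_index(axis=1),
    result.sort_index(axis=1),
    check_dtype=False,
)

# Option 2: ask pandas to ignore the order of the index and columns.
assert_frame_equal(expected, result, check_dtype=False, check_like=True)
```

Option 2 avoids touching either frame, but it also ignores row order, so Option 1 may be the safer choice when row order matters in the test.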