pandas.algos._return_false causes PicklingError with dill.dump_session on CentOS

2024/9/16 23:22:07

I have a code framework which involves dumping sessions with dill. This used to work just fine, until I started to use pandas. The following code raises a PicklingError on CentOS release 6.5:

import pandas
import dill
dill.dump_session('x.dat')

The problem seems to stem from pandas.algos. In fact, it's enough to run this to reproduce the error:

import pandas.algos
import dill
dill.dump_session('x.dat') / dill.dumps(pandas.algos)

The error is pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1.

The thing is, this error is not raised on my pc. Both of them have same versions of pandas (0.14.1), dill (0.2.1), and python (2.7.6).

Looking on the badobjects, I get:

>>> dill.detect.badobjects(pandas.algos, depth = 1)
{'__builtins__': <module '__builtin__' (built-in)>, 
'_return_true': <cyfunction lambda2 at 0x1484d70>, 
'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>, 
'_return_false': <cyfunction lambda1 at 0x1484cc8>, 
'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}

This seems to be due to different handling of pandas.algos by the two OS-s (perhaps different compilers?). On my PC, where dump_session is without errors, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it's <cyfunction lambda1 at 0x1df3050>. Why is it handled differently?

Answer

I'm not seeing what you are seeing on a mac. Here's what I see, using the same version of pandas. I do see that you are using a different version of dill. I'm using the version from github. I'll check if there was a tweak to saving modules or globals in dill that might have had that impact on some distros.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import dill
>>> dill.detect.trace(True)
>>> dill.dump_session('x.pkl')
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x1069ff140>
D2: <dict object at 0x106a0b280>
M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/__init__.pyc'>
M2: <module 'pandas' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/__init__.pyc'>

Here is what I get for pandas.algos,

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.dumps(pandas.algos)
'\x80\x02cdill.dill\n_import_module\nq\x00U\x0cpandas.algosq\x01\x85q\x02Rq\x03.'

Here's what I get for pandas.algos._return_false:

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import pandas.algos
>>> dill.dumps(pandas.algos._return_false)
Traceback (most recent call last):File "<stdin>", line 1, in <module>File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 180, in dumpsdump(obj, file, protocol, byref, file_mode, safeio)File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 173, in dumppik.dump(obj)File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dumpself.save(obj)File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in saveself.save_global(obj, rv)File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global(obj, module, name))
pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x10d403cc8>: it's not found as pandas.algos.lambda1

So, I can now reproduce your error.

This looks like an unpicklable object, based on how it's built. However, it should be able to be pickled inside the module… as it is for me. You seem to have pinpointed the difference between what you are seeing in the object pandas builds on CentOS.

Looking at the pandas codebase, pandas.algos is a pyx file… so that's cython. And here's the code.

_return_false = lambda self, other: False

Were that in a .py file, I know it would serialize. I have no idea how dill works for cython generated lambdas… (e.g. a lambda cyfunction).

It looks like there was a commit (https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff) in which _return_false was moved outside a class into the module scope. Do you see that on both CentOS and your PC? It may be that the v0.14.1 for different distros was cut off slightly different git versions… depending on how you installed pandas.

So apparently, I can pick up a lambda1 by trying to get the source of the object… which for lambda, if it can't get the source, dill will grab by name… and apparently it's named lambda1… even though that doesn't show up in the .pyx file. Maybe it's due to how cython builds the lambdas.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.source.importable(pandas.algos._return_false)
'from pandas import lambda1\n'

The difference might be coming from cython… since the code is generated from a .pyx in pandas. What's your versions of cython? Mine is 0.20.2.

https://en.xdnf.cn/q/72809.html

Related Q&A

How to send an image directly from flask server to html?

I am new to flask and am trying to make an app such an image is taken by the html and js from the webcam and then it is sent to the server with ajax request. I got this part. Then some processing is do…

Alternatives to nested numpy.where for multiconditional pandas operations?

I have a Pandas DataFrame with conditional column A and numeric column B. A B 1 foo 1.2 2 bar 1.3 3 foo 2.2I also have a Python dictionary that defines ranges of B which denote "success" g…

OpenCV findContours() just returning one external contour

Im trying to isolate letters in a captcha, I managed to filter a captcha and that result in this black and white image:But when I tried to separate the letters with findContours method of OpenCV it jus…

Selenium/ChromeDriver Unknown policy Errors

I am currently using Python (v3.5.1), Selenium (v3.7), and Chromedriver (v2.33).When I run the following command:from selenium import webdriver driver = webdriver.Chrome(C:\Program Files\ChromeWebdrive…

Mako escaping issue within Pyramid

I need to put javascript function to mako template. The first argument of this function is string, so I write in my *.mako file (dict(field_name=geom)):init_map(${field_name} );But when I see my html p…

All arguments should have the same length plotly

I try to do a bar graph using plotly.express but I find this problemAll arguments should have the same length. The length of argument y is 51, whereas the length of previously-processed arguments [x] …

Python curves intersection with fsolve() and function arguments using numpy

I am trying to use fsolve as quoted here : http://glowingpython.blogspot.gr/2011/05/hot-to-find-intersection-of-two.html,On order to find the intersection between two curves. Both curves basically are …

What is the Python freeze process?

The Python Doc states:Frozen modules are modules written in Python whose compiled byte-codeobject is incorporated into a custom-built Python interpreter byPython’s freeze utility. See Tools/freeze/ fo…

Is there a way to get the top k values per row of a numpy array (Python)?

Given a numpy array of the form below:x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]is there a way to retain the top-3 values in each row and set others to zero in python (without an…

Python with tcpdump in a subprocess: how to close subprocess properly?

I have a Python script to capture network traffic with tcpdump in a subprocess:p = subprocess.Popen([tcpdump, -I, -i, en1,-w, cap.pcap], stdout=subprocess.PIPE) time.sleep(10) p.kill()When this script …