I have a few functions in my code that randomly cause a SegmentationFault error. I've identified them by enabling the faulthandler. I'm a bit stuck and have no idea how to reliably eliminate this problem.
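To illustrate what enabling faulthandler gives you, here is a minimal sketch. It does not use the actual failing functions: it assumes a POSIX system and forces a crash by reading from address 0 with `ctypes.string_at(0)`. The helper name `run_crashing_child` is made up for this example. The child interpreter is started with `-X faulthandler`, which has the same effect as calling `faulthandler.enable()` at startup.

```python
import subprocess
import sys

# Child program that deliberately segfaults by reading from address 0.
CRASHING_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the crashing snippet in a child interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", CRASHING_SNIPPET],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_crashing_child()
    # On POSIX a child killed by a signal reports a negative return code.
    print("exit code:", result.returncode)
    # faulthandler writes the Python traceback of the crash to stderr.
    print(result.stderr)
```

The traceback on stderr points at the Python line that was executing when the crash happened, which is how the offending functions can be narrowed down.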
I'm thinking about a workaround. Since the functions crash randomly, I could potentially retry them after a failure. The problem is that there's no way to recover from a SegmentationFault crash.
The best idea I have for now is to rewrite these functions a bit and run them via subprocess. That way, a crashed function won't take down the whole application and can be retried.
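Roughly, the subprocess idea could look like the sketch below. The helper name `run_with_retries` and the `attempts` parameter are made up for illustration, and it assumes the function's work can be expressed as a snippet of code whose result is printed to stdout.

```python
import subprocess
import sys


def run_with_retries(code, attempts=3):
    """Run *code* in a fresh interpreter, retrying if the child exits abnormally."""
    last_code = None
    for _ in range(attempts):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout
        # A segfault only kills the child; the parent can simply try again.
        last_code = result.returncode
    raise RuntimeError(
        "child failed {} times, last exit code {}".format(attempts, last_code)
    )


if __name__ == "__main__":
    print(run_with_retries("print(6 * 7)"))
```

Every call pays for starting a new interpreter, which is the overhead I'm worried about.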
Some of the functions are quite small and called often, so spawning a subprocess for each call would significantly slow down my app. Is there any way to execute a function in a separate context, faster than a subprocess, that won't crash the whole program if it segfaults?
I had some unreliable C extensions throw segfaults every once in a while and, since there was no way I was going to be able to fix that, what I did was create a decorator that would run the wrapped function in a separate process. That way you can stop segfaults from killing the main process.
Something like this:
https://gist.github.com/joezuntz/e7e7764e5b591ed519cfd488e20311f1
Mine was a bit simpler, and it did the job for me. Additionally it lets you choose a timeout and a default return value in case there was a problem:
#! /usr/bin/env python3

# std imports
import logging
import multiprocessing as mp


def parametrized(dec):
    """This decorator can be used to create other decorators that accept arguments"""

    def layer(*args, **kwargs):
        def repl(f):
            return dec(f, *args, **kwargs)

        return repl

    return layer


@parametrized
def sigsev_guard(fcn, default_value=None, timeout=None):
    """Used as a decorator with arguments.

    The decorated function will be called with its input arguments in another process.
    If the execution lasts longer than *timeout* seconds, it will be considered failed.
    If the execution fails, *default_value* will be returned.
    """

    def _fcn_wrapper(*args, **kwargs):
        q = mp.Queue()
        # Note: a lambda target cannot be pickled, so this requires the
        # "fork" start method.
        p = mp.Process(target=lambda q: q.put(fcn(*args, **kwargs)), args=(q,))
        p.start()
        p.join(timeout=timeout)

        if p.is_alive():
            # Timed out: stop the child so it does not keep running.
            p.terminate()
            p.join()

        exit_code = p.exitcode

        if exit_code == 0:
            return q.get()

        logging.warning('Process did not exit correctly. Exit code: {}'.format(exit_code))
        return default_value

    return _fcn_wrapper
So you would use it like:
@sigsev_guard(default_value=-1, timeout=60)
def your_risky_function(a, b, c, d):
    ...
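If starting a new process for every call turns out to be too slow for small, frequently called functions, one possible variation is to keep a single worker process alive and only replace it after a crash or timeout. The following is a minimal sketch of that idea, not the code I actually used. `PersistentGuard`, `_serve` and `risky_double` are made-up names, and it assumes a POSIX system where the "fork" start method is available. `risky_double` forces a segfault with `ctypes.string_at(0)` purely for demonstration.

```python
import ctypes
import multiprocessing as mp

# Assumption: POSIX with the "fork" start method, so the worker inherits
# the function instead of needing to pickle it.
_ctx = mp.get_context("fork")


def _serve(conn, fcn):
    """Worker loop: receive (args, kwargs), send back the result."""
    while True:
        args, kwargs = conn.recv()
        conn.send(fcn(*args, **kwargs))


class PersistentGuard:
    """Hypothetical helper: call *fcn* in a reusable worker process."""

    def __init__(self, fcn, default_value=None, timeout=None):
        self.fcn = fcn
        self.default_value = default_value
        self.timeout = timeout
        self._proc = None
        self._conn = None

    def _start(self):
        self._conn, child_conn = _ctx.Pipe()
        self._proc = _ctx.Process(
            target=_serve, args=(child_conn, self.fcn), daemon=True
        )
        self._proc.start()
        # The parent keeps only its own end, so a dead worker shows up as EOF.
        child_conn.close()

    def _stop(self):
        if self._proc is not None:
            self._proc.kill()
            self._proc.join()
            self._conn.close()
            self._proc = None

    def __call__(self, *args, **kwargs):
        if self._proc is None or not self._proc.is_alive():
            self._stop()
            self._start()
        try:
            self._conn.send((args, kwargs))
            # poll() returns False on timeout; a crashed worker makes recv() fail.
            if self._conn.poll(self.timeout):
                return self._conn.recv()
        except (EOFError, OSError):
            pass  # worker died (for example from a segfault) before answering
        self._stop()
        return self.default_value


def risky_double(x):
    """Hypothetical stand-in for a flaky C call: segfaults on negative input."""
    if x < 0:
        ctypes.string_at(0)  # read from address 0 to force a segfault
    return 2 * x


if __name__ == "__main__":
    guarded = PersistentGuard(risky_double, default_value=-1, timeout=60)
    print(guarded(2))   # normal call, served by the worker
    print(guarded(-1))  # worker crashes, the default value is returned
    print(guarded(3))   # a fresh worker is started for this call
```

The trade-off is that arguments and results still travel through a pipe and must be picklable, and any state the function keeps in the worker is lost whenever the worker is replaced.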