RestrictedPython: Call other functions within user-specified code?

2024/9/28 1:23:30

Yuri Nudelman's code, with its custom _import definition restricting which modules may be imported, serves as a good base. However, when a function in user_code calls another function defined in the same user_code, the call fails, presumably because everything has to be whitelisted. Is there any way to permit user-defined functions to call each other? I'm also open to other sandboxing solutions, although Jupyter didn't seem straightforward to embed within a web interface.

from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import  # Must be a part of builtins

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Executes user code in a restricted environment
    Args:
        user_code(str) - String containing the unsafe code
        user_func(str) - Function inside user_code to execute and return value
        *args, **kwargs - arguments passed to the user function
    Return:
        Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # These are the variables we allow user code to see. @result will contain the return value.
        restricted_locals = {
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }

        # If you want the user to be able to use some of your functions inside his code,
        # you should add this function to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing, that's all.
        restricted_globals = {
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }

        # Add another line to user code that executes @user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)

        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whatever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise

i_example = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x)+foo()
"""
print(execute_user_code(i_example, "myceil", 1.5))

Running this fails with NameError: name 'foo' is not defined.

Answer

First of all, the replacement for the __import__ built-in is implemented incorrectly. That built-in is supposed to return the imported module, not mutate the globals to include it:

Python 3.9.12 (main, Mar 24 2022, 13:02:21)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> __import__('math')
<module 'math' (built-in)>
>>> math
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined

A better way to reimplement __import__ would be this:

_SAFE_MODULES = frozenset(("math",))

def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)
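To see why returning the module is enough, here is a minimal stdlib-only sketch (no RestrictedPython involved) that plugs _safe_import into a hand-built builtins mapping and runs import statements through exec. The import statement itself binds the name to whatever __import__ returns, so nothing needs to write into globals. The tiny builtins dict and the sandbox_globals name are illustrative assumptions, and this setup is not a sandbox on its own.

```python
import builtins
import math

_SAFE_MODULES = frozenset(("math",))


def _safe_import(name, *args, **kwargs):
    # Reject anything outside the allow-list; otherwise defer to the real import.
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return builtins.__import__(name, *args, **kwargs)


# Hypothetical, deliberately tiny builtins mapping (illustration only, NOT a sandbox).
# Import statements look up __import__ here and bind whatever it returns.
sandbox_globals = {"__builtins__": {"__import__": _safe_import}}

exec("import math\nfrom math import ceil\nresult = ceil(1.5)", sandbox_globals)
print(sandbox_globals["math"] is math)  # True: the statement bound the returned module
print(sandbox_globals["result"])        # 2

try:
    exec("import os", sandbox_globals)
except Exception as exc:
    print("blocked:", exc)
```

Both `import math` and `from math import ceil` work without _safe_import touching any namespace, while `import os` is rejected before the real import machinery runs.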

The fact that your original implementation mutated globals was partially masking the primary bug. Namely: name bindings within the restricted code (function definitions, variable assignments and imports) are stored in the locals dict, but name look-ups inside function bodies are compiled as global look-ups (LOAD_GLOBAL), which consult only the globals dict and the builtins and bypass the locals entirely. You can see this by disassembling the restricted bytecode using __import__('dis').dis(byte_code):

  2           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (math)
              6 STORE_NAME               0 (math)

  4           8 LOAD_CONST               2 (<code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>)
             10 LOAD_CONST               3 ('foo')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (foo)

  7          16 LOAD_CONST               4 (<code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>)
             18 LOAD_CONST               5 ('myceil')
             20 MAKE_FUNCTION            0
             22 STORE_NAME               2 (myceil)
             24 LOAD_CONST               1 (None)
             26 RETURN_VALUE

Disassembly of <code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>:
  5           0 LOAD_CONST               1 (7)
              2 RETURN_VALUE

Disassembly of <code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>:
  8           0 LOAD_GLOBAL              0 (_getattr_)
              2 LOAD_GLOBAL              1 (math)
              4 LOAD_CONST               1 ('ceil')
              6 CALL_FUNCTION            2
              8 LOAD_FAST                0 (x)
             10 CALL_FUNCTION            1
             12 LOAD_GLOBAL              2 (foo)
             14 CALL_FUNCTION            0
             16 BINARY_ADD
             18 RETURN_VALUE
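If you would rather check the look-up opcodes programmatically than eyeball the listing, the standard dis module can do it. This sketch uses the built-in compile() instead of compile_restricted(), so RestrictedPython's _getattr_ rewriting is absent, but name resolution is the same. USER_CODE and name_ops are illustrative names, and because exact opcodes vary between Python versions, it only filters for the dict-based name opcodes.

```python
import dis
import types

# Stand-in for the question's i_example snippet.
USER_CODE = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x) + foo()
"""

# Plain compile(): no RestrictedPython rewriting, same name-resolution opcodes.
module_code = compile(USER_CODE, "<user_code>", "exec")

NAME_OPS = {"LOAD_NAME", "STORE_NAME", "LOAD_GLOBAL", "STORE_GLOBAL"}


def name_ops(code):
    """Return (opname, name) pairs for instructions that resolve names via dicts."""
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname in NAME_OPS
    ]


# Function bodies are stored as nested code objects in co_consts.
myceil_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType) and const.co_name == "myceil"
)

print(name_ops(module_code))  # module level: STORE_NAME for math, foo and myceil
print(name_ops(myceil_code))  # inside myceil: LOAD_GLOBAL for math and foo
```

The asymmetry is the whole bug: the module level stores with STORE_NAME (into locals when a separate locals dict is passed), while myceil loads with LOAD_GLOBAL.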

The documentation for exec explains:

If only globals is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object. Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.
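The class-definition behaviour described in that quote is easy to reproduce with plain exec and no RestrictedPython at all. The sketch below (helper names are hypothetical) runs the question's snippet once with separate globals and locals dicts and once with a single dict, and shows the equivalent failure inside an actual class body.

```python
# Stand-in for the question's i_example snippet.
USER_CODE = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x) + foo()
"""


def run_with_separate_dicts(code, func_name, *args):
    """What the question does: definitions land in local_ns, but the function
    bodies resolve free names through global_ns (and builtins) only."""
    global_ns, local_ns = {}, {}
    exec(code, global_ns, local_ns)
    return local_ns[func_name](*args)


def run_with_single_dict(code, func_name, *args):
    """The fix: one dict acts as both globals and locals, like a real module."""
    namespace = {}
    exec(code, namespace)
    return namespace[func_name](*args)


class ClassBody:
    # Same scoping rule: names bound in a class body are invisible inside its functions.
    def foo():
        return 7

    def bar():
        return foo()  # looked up as a global, not as ClassBody.foo


for label, call in [
    ("separate dicts", lambda: run_with_separate_dicts(USER_CODE, "myceil", 1.5)),
    ("class body", ClassBody.bar),
    ("single dict", lambda: run_with_single_dict(USER_CODE, "myceil", 1.5)),
]:
    try:
        print(label, "->", call())
    except NameError as exc:
        print(label, "-> NameError:", exc)
```

With separate dicts the first name to fail is math rather than foo, because nothing writes math into the globals here; the question's _import did, which is exactly the masking effect. The single-dict run returns 9 (math.ceil(1.5) + 7).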

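In other words, top-level definitions in the snippet behave like class attributes, so the functions it defines cannot see each other, and separate mappings for locals and globals serve no purpose here. We can therefore simply get rid of the locals dict and put everything in globals. The entire code should look something like this: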

from RestrictedPython import safe_builtins, compile_restricted

_SAFE_MODULES = frozenset(("math",))

def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)

def execute_user_code(user_code, user_func, *args, **kwargs):
    my_globals = {
        "__builtins__": {
            **safe_builtins,
            "__import__": _safe_import,
        },
    }

    try:
        byte_code = compile_restricted(
            user_code, filename="<user_code>", mode="exec")
    except SyntaxError:
        # syntax error in the sandboxed code
        raise

    try:
        exec(byte_code, my_globals)
        return my_globals[user_func](*args, **kwargs)
    except BaseException:
        # runtime error (probably) in the sandboxed code
        raise

Above I also managed to fix a couple of tangential issues:

  • Instead of injecting the function call into the compiled snippet, I look up the function in the globals dict directly. This avoids a potential code injection vector if user_func happens to come from an untrusted source, and avoids having to inject args, kwargs and result into the sandbox, where sandboxed code could clobber them.
  • I avoid mutating the safe_builtins object provided by the RestrictedPython module. Otherwise, any other code in your program that uses RestrictedPython would silently pick up your modified __import__ as well.
  • I split the exception handling between the two steps: compilation and execution. This minimises the probability that bugs in your own sandboxing code will be misattributed to the sandboxed code.
  • I changed the caught runtime exception type to BaseException, to also catch cases where sandboxed code raises KeyboardInterrupt or SystemExit (which derive from BaseException, not from Exception).
  • I also removed references to _getitem_ and _apply_, which the example code doesn't need. If they turn out to be necessary after all (RestrictedPython rewrites subscripts such as x[0] into _getitem_ calls, for instance), you may restore them.
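Two of those points are easy to check in isolation. The sketch below uses plain exec for brevity (in the answer's code the snippet would go through compile_restricted), and call_user_func and which_handler are hypothetical helper names. It shows that looking a function up by name turns a hostile func_name into a mere missing key, and that only an except BaseException clause sees KeyboardInterrupt and SystemExit.

```python
def call_user_func(namespace, func_name, *args, **kwargs):
    """Call a function that user code defined, looked up by name.

    func_name is only ever used as a dict key, so it is never parsed as code.
    """
    func = namespace[func_name]  # KeyError for anything that isn't a defined name
    if not callable(func):
        raise TypeError(f"{func_name!r} is not callable")
    return func(*args, **kwargs)


def which_handler(exc_type):
    """Report which except clause catches an instance of exc_type."""
    try:
        raise exc_type()
    except Exception:
        return "Exception"
    except BaseException:
        return "BaseException"


# Plain exec for brevity; the answer's code would run compile_restricted output here.
namespace = {}
exec("def double(x):\n    return 2 * x\n", namespace)
print(call_user_func(namespace, "double", 21))  # 42

try:
    # With string injection this would have become extra code; here it is just a bad key.
    call_user_func(namespace, "double(1); __import__('os')")
except KeyError as exc:
    print("rejected:", exc)

for exc_type in (ValueError, KeyboardInterrupt, SystemExit):
    print(exc_type.__name__, "is caught by", which_handler(exc_type))
```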

(Note, however, that this still does not protect against DoS via infinite loops within the sandbox.)

https://en.xdnf.cn/q/71400.html
