Why is C++ much faster than python with boost?

2024/10/14 0:29:38

My goal is to write a small library for spectral finite elements in Python, and to that end I tried extending Python with a C++ library using Boost, in the hope that it would make my code faster. The relevant parts of the C++ code look as follows:

class Quad {
public:
    Quad(int, int);
    double integrate(boost::function<double(std::vector<double> const&)> const&);
    double integrate_wrapper(boost::python::object const&);
    std::vector< std::vector<double> > nodes;
    std::vector<double> weights;
};

...

namespace std {
    typedef std::vector< std::vector< std::vector<double> > > cube;
    typedef std::vector< std::vector<double> > mat;
    typedef std::vector<double> vec;
}

...

double Quad::integrate(boost::function<double(vec const&)> const& func) {
    double result = 0.;
    for (unsigned int i = 0; i < nodes.size(); ++i) {
        result += func(nodes[i]) * weights[i];
    }
    return result;
}

// ---- PYTHON WRAPPER ----
double Quad::integrate_wrapper(boost::python::object const& func) {
    std::function<double(vec const&)> lambda;
    switch (this->nodes[0].size()) {
        case 1: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0])); }; break;
        case 2: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0], v[1])); }; break;
        case 3: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0], v[1], v[2])); }; break;
        default: cout << "Dimension must be 1, 2, or 3" << endl; exit(0);
    }
    return integrate(lambda);
}

// ---- EXPOSE TO PYTHON ----
BOOST_PYTHON_MODULE(hermite)
{
    using namespace boost::python;

    class_<std::vec>("double_vector")
        .def(vector_indexing_suite<std::vec>());

    class_<std::mat>("double_mat")
        .def(vector_indexing_suite<std::mat>());

    class_<Quad>("Quad", init<int,int>())
        .def("integrate", &Quad::integrate_wrapper)
        .def_readonly("nodes", &Quad::nodes)
        .def_readonly("weights", &Quad::weights);
}

I compared the performance of three different methods to calculate the integral of two functions. The two functions are:

  • The function f1(x,y,z) = x*x
  • A function that is more difficult to evaluate: f2(x,y,z) = np.cos(2*x+2*y+2*z) + x*y + np.exp(-z*z) +np.cos(2*x+2*y+2*z) + x*y + np.exp(-z*z) +np.cos(2*x+2*y+2*z) + x*y + np.exp(-z*z) +np.cos(2*x+2*y+2*z) + x*y + np.exp(-z*z) +np.cos(2*x+2*y+2*z) + x*y + np.exp(-z*z)
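
For concreteness, this is roughly how these two functions look as the (x, y, z) callables used in methods 2 and 3 below; f1 and f2 are my own names for the F1_OR_F2 placeholder in the snippets.

import numpy as np

def f1(x, y, z):
    return x * x

def f2(x, y, z):
    # the expression above, summed five times
    term = np.cos(2*x + 2*y + 2*z) + x*y + np.exp(-z*z)
    return 5 * term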

The methods used are:

  1. Call the library from a C++ program:

    double func(vector<double> v) {
        return F1_OR_F2;
    }

    int main() {
        hermite::Quad quadrature(100, 3);
        double result = quadrature.integrate(func);
        cout << "Result = " << result << endl;
    }
    
  2. Call the library from a Python script:

    import hermite
    def function(x, y, z): return F1_OR_F2
    my_quad = hermite.Quad(100, 3)
    result = my_quad.integrate(function)
    
  3. Use a for loop in Python:

    import hermite
    def function(x, y, z): return F1_OR_F2
    my_quad = hermite.Quad(100, 3)
    weights = my_quad.weights
    nodes = my_quad.nodes
    result = 0.
    for i in range(len(weights)):
        result += weights[i] * function(nodes[i][0], nodes[i][1], nodes[i][2])
    

Here are the execution times for each method. The time was measured with the time command for method 1 and with the Python time module for methods 2 and 3; the C++ code was compiled using CMake with set(CMAKE_BUILD_TYPE Release).

  • For f1:

    • Method 1: 0.07s user 0.01s system 99% cpu 0.083 total
    • Method 2: 0.19s
    • Method 3: 3.06s
  • For f2:

    • Method 1: 0.28s user 0.01s system 99% cpu 0.289 total
    • Method 2: 12.47s
    • Method 3: 16.31s
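
For reference, the instrumentation for methods 2 and 3 was along the following lines; this is a sketch rather than the exact script, with f1 used as the example integrand.

import time
import hermite

def function(x, y, z):
    return x * x          # f1; f2 is timed the same way

my_quad = hermite.Quad(100, 3)

start = time.time()
result = my_quad.integrate(function)   # method 2; method 3 replaces this call with the explicit loop
end = time.time()
print("Result = " + str(result) + ", Time = " + str(end - start))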

Based on these results, my questions are the following:

  • Why is the first method so much faster than the second?

  • Could the python wrapper be improved to reach comparable performance between methods 1 and 2?

  • Why is method 2 more sensitive than method 3 to the difficulty of the function to integrate?


EDIT: I also tried defining a function that accepts a string as an argument, writes it to a file, compiles that file, and dynamically loads the resulting .so file:

double Quad::integrate_from_string(string const& function_body) {

    // Write function to file
    ofstream helper_file;
    helper_file.open("/tmp/helper_function.cpp");
    helper_file << "#include <vector>\n#include <cmath>\n";
    helper_file << "extern \"C\" double toIntegrate(std::vector<double> v) {\n";
    helper_file << "    return " << function_body << ";\n}";
    helper_file.close();

    // Compile file
    system("c++ /tmp/helper_function.cpp -o /tmp/helper_function.so -shared -fPIC");

    // Load function dynamically
    typedef double (*vec_func)(vec);
    void *function_so = dlopen("/tmp/helper_function.so", RTLD_NOW);
    vec_func func = (vec_func) dlsym(function_so, "toIntegrate");
    double result = integrate(func);
    dlclose(function_so);
    return result;
}

It's quite dirty and probably not very portable, so I'd be happy to find a better solution, but it works well and plays nicely with the ccode function of sympy.
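
For context, this is roughly how it can be driven from Python with sympy's ccode; the snippet below is a sketch and assumes integrate_from_string is exposed to Python with an extra .def(...) in BOOST_PYTHON_MODULE, analogous to integrate.

import sympy as sym
import hermite  # the Boost.Python module built above

# Name the symbols after the components of the std::vector<double> argument,
# so that sym.ccode prints an expression that compiles directly inside the
# generated toIntegrate helper.
x = sym.Symbol('v[0]')
y = sym.Symbol('v[1]')
z = sym.Symbol('v[2]')
expr = sym.cos(2*x + 2*y + 2*z) + x*y + sym.exp(-z*z)

my_quad = hermite.Quad(100, 3)
result = my_quad.integrate_from_string(sym.ccode(expr))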


SECOND EDIT: I have rewritten the function in pure Python using NumPy.

import numpy as np
import numpy.polynomial.hermite_e as herm
import time
def integrate(function, degrees):
    dim = len(degrees)
    nodes_multidim = []
    weights_multidim = []
    for i in range(dim):
        nodes_1d, weights_1d = herm.hermegauss(degrees[i])
        nodes_multidim.append(nodes_1d)
        weights_multidim.append(weights_1d)
    grid_nodes = np.meshgrid(*nodes_multidim)
    grid_weights = np.meshgrid(*weights_multidim)
    nodes_flattened = []
    weights_flattened = []
    for i in range(dim):
        nodes_flattened.append(grid_nodes[i].flatten())
        weights_flattened.append(grid_weights[i].flatten())
    nodes = np.vstack(nodes_flattened)
    weights = np.prod(np.vstack(weights_flattened), axis=0)
    return np.dot(function(nodes), weights)

def function(v): return F1_OR_F2

start = time.time()
result = integrate(function, [100, 100, 100])
end = time.time()
print("-> Result = " + str(result) + ", Time = " + str(end-start))

Somewhat surprisingly (at least to me), there is no significant difference in performance between this method and the pure C++ implementation. In particular, it takes 0.059s for f1 and 0.36s for f2.
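
Note that in this NumPy version the integrand receives the whole (dim, n_points) node matrix at once and must return one value per quadrature point; the vectorized forms of the two test functions would look roughly like this (again, f1 and f2 are my own names for F1_OR_F2):

import numpy as np

def f1(v):
    return v[0] * v[0]

def f2(v):
    # the expression from the question, summed five times
    term = np.cos(2*v[0] + 2*v[1] + 2*v[2]) + v[0]*v[1] + np.exp(-v[2]*v[2])
    return 5 * term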

Answer

Your functions take vectors by value, which involves copying the vector on every call, and integrate_wrapper makes extra copies on top of that.

It also makes sense to accept boost::function by reference and capture func by reference in those lambdas.

Change these to (note the & and const& bits):

double integrate(boost::function<double(std::vector<double> const&)> const&);

double Quad::integrate_wrapper(boost::python::object func) {
    std::function<double(vec const&)> lambda;
    switch (this->nodes[0].size()) {
        case 1: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0])); }; break;
        case 2: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0], v[1])); }; break;
        case 3: lambda = [&func](vec const& v) -> double { return boost::python::extract<double>(func(v[0], v[1], v[2])); }; break;
        default: cout << "Dimension must be 1, 2, or 3" << endl; exit(0);
    }
    return integrate(lambda);
}

Still though, calling a Python function from C++ is more expensive than calling a C++ function.


People normally use numpy for fast linear algebra in Python; it uses SIMD instructions for many common operations. You should probably consider using numpy before rolling out a C++ implementation. In C++ you would have to use something like Intel MKL on top of Eigen to get comparable vectorization.
