admin管理员组

文章数量:1279044

I have a curious case. Unfortunately, I did not manage to reproduce the computation times in the MWE below but my project is basically a more crowded version. I wanted to speedup my code so I tried out differential_evolution with worker=-1. This did not work because of pickling issues (AttributeError: Can't pickle local object); I don't really want to define my cost-function at the top level of my module. So I used worker=Pool(ncpu).map instead. In my local MWE this works, but when migrating this change to my project I observe the following.

I observe that using two different approaches I find that differential_evolution takes more then 10 times as much time to finish in the second (class-based) approach then the first approach. When I look into my CPU load for approach 1 I see that all cores are used. However in approach 2 basically only one core is being used. When I look at the output via disp=True I also see that the func evaluations are printed way slower as well in the 2nd approach compared to the 1st.

Any idea what that might be? Tx.

from os import cpu_count
from time import time
import numpy as np
from pathos.multiprocessing import ProcessingPool as Pool
from scipy.optimize import differential_evolution
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

atol=1e-13
rtol=1e-10
T = 10
bounds = [(-T,T)]

# --- Approach 1 --- #

x = np.linspace(0,10*2*np.pi,100)
y = np.cos(x)
cs = CubicSpline(x, y)
def f(t):
    return cs(t)**2

ctime = time()
def cost(t):
    t = t[0]
    q = 0
    for _ in range(50):
        q += quad(lambda s: f(s*t), -T, T)[0]
    return q
result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
print("Approach 1:", time()-ctime)

# --- Approach 2 --- #

class A():
    def define_f(self):
        x = np.linspace(0,10*2*np.pi,100)
        y = np.cos(x)
        cs = CubicSpline(x, y)
        def f(t):
            return cs(t)**2
        return f

f = A().define_f()

class B():
    def solve(self, f):
        def cost(t):
            t = t[0]
            q = 0
            for _ in range(50):
                q += quad(lambda s: f(s*t), -T, T)[0]
            return q
        
        ctime = time()
        result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
        print("Approach 2:", time()-ctime)

B().solve(f=f)

Produces something like:

Approach 1: 4.710453510284424
Approach 2: 5.638171672821045

However in my project the second timestamp is an order of magnitude larger.

Below is a vectorized version of the code I'm actually using without the provided pool.map: y0,y1 are solutions obtained from solving the variational equation of a 2-dimension nonlinear system for periodic solutions via scipy.integrat.solve_ivp. The first step in cost of reshaping the t array is necessary since scipy's solution object requires the shape to be (n,).

# Define fundamental solution.
def W(t: np.ndarray) -> np.ndarray:
    if type(t) is np.ndarray: return np.stack([y0.sol(t).T,  y1.sol(t).T], axis=-1).swapaxes(1,2)
    else: return np.array([y0.sol(t),  y1.sol(t)])

# Define monodromy matrix and related quantities.
C = W(period)
C1 = np.eye(C.shape[0]) - C
C2 = np.linalg.solve(C, C1)

# Define cost function.
def cost(t):
    if len(t.shape) > 1:
        t = np.reshape(t, (t.shape[1],))
    B = W(t)
    def H(s: float, t: np.ndarray=t, B: np.ndarray=B):
        mask_s_le_t = s <= t
        W_s = W(s)
        X0 = np.linalg.solve(C1 @ W_s, B)
        X1 = np.linalg.solve(C2 @ W_s, B)
        return np.where(mask_s_le_t[:, None, None], X0, X1)
    return -quad_vec(lambda s: np.sum(H(s)**2, axis=(1,2)), 0, period)[0]

# Maximize.
maxi = -minimize(cost, bounds=[(0,period)], atol=atol, tol=rtol, updating='deferred', vectorized=1).fun

I have a curious case. Unfortunately, I did not manage to reproduce the computation times in the MWE below but my project is basically a more crowded version. I wanted to speedup my code so I tried out differential_evolution with worker=-1. This did not work because of pickling issues (AttributeError: Can't pickle local object); I don't really want to define my cost-function at the top level of my module. So I used worker=Pool(ncpu).map instead. In my local MWE this works, but when migrating this change to my project I observe the following.

I observe that using two different approaches I find that differential_evolution takes more then 10 times as much time to finish in the second (class-based) approach then the first approach. When I look into my CPU load for approach 1 I see that all cores are used. However in approach 2 basically only one core is being used. When I look at the output via disp=True I also see that the func evaluations are printed way slower as well in the 2nd approach compared to the 1st.

Any idea what that might be? Tx.

from os import cpu_count
from time import time
import numpy as np
from pathos.multiprocessing import ProcessingPool as Pool
from scipy.optimize import differential_evolution
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

atol=1e-13
rtol=1e-10
T = 10
bounds = [(-T,T)]

# --- Approach 1 --- #

x = np.linspace(0,10*2*np.pi,100)
y = np.cos(x)
cs = CubicSpline(x, y)
def f(t):
    return cs(t)**2

ctime = time()
def cost(t):
    t = t[0]
    q = 0
    for _ in range(50):
        q += quad(lambda s: f(s*t), -T, T)[0]
    return q
result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
print("Approach 1:", time()-ctime)

# --- Approach 2 --- #

class A():
    def define_f(self):
        x = np.linspace(0,10*2*np.pi,100)
        y = np.cos(x)
        cs = CubicSpline(x, y)
        def f(t):
            return cs(t)**2
        return f

f = A().define_f()

class B():
    def solve(self, f):
        def cost(t):
            t = t[0]
            q = 0
            for _ in range(50):
                q += quad(lambda s: f(s*t), -T, T)[0]
            return q
        
        ctime = time()
        result = differential_evolution(cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=Pool(cpu_count()).map)
        print("Approach 2:", time()-ctime)

B().solve(f=f)

Produces something like:

Approach 1: 4.710453510284424
Approach 2: 5.638171672821045

However in my project the second timestamp is an order of magnitude larger.

Below is a vectorized version of the code I'm actually using without the provided pool.map: y0,y1 are solutions obtained from solving the variational equation of a 2-dimension nonlinear system for periodic solutions via scipy.integrat.solve_ivp. The first step in cost of reshaping the t array is necessary since scipy's solution object requires the shape to be (n,).

# Define fundamental solution.
def W(t: np.ndarray) -> np.ndarray:
    if type(t) is np.ndarray: return np.stack([y0.sol(t).T,  y1.sol(t).T], axis=-1).swapaxes(1,2)
    else: return np.array([y0.sol(t),  y1.sol(t)])

# Define monodromy matrix and related quantities.
C = W(period)
C1 = np.eye(C.shape[0]) - C
C2 = np.linalg.solve(C, C1)

# Define cost function.
def cost(t):
    if len(t.shape) > 1:
        t = np.reshape(t, (t.shape[1],))
    B = W(t)
    def H(s: float, t: np.ndarray=t, B: np.ndarray=B):
        mask_s_le_t = s <= t
        W_s = W(s)
        X0 = np.linalg.solve(C1 @ W_s, B)
        X1 = np.linalg.solve(C2 @ W_s, B)
        return np.where(mask_s_le_t[:, None, None], X0, X1)
    return -quad_vec(lambda s: np.sum(H(s)**2, axis=(1,2)), 0, period)[0]

# Maximize.
maxi = -minimize(cost, bounds=[(0,period)], atol=atol, tol=rtol, updating='deferred', vectorized=1).fun
Share Improve this question edited Feb 25 at 13:15 Hannes asked Feb 24 at 18:22 HannesHannes 751 silver badge6 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 2

When considering using workers to parallelise differential_evolution you should take into account the overhead incurred with new Processes. It takes time to create those processes, and it takes time to distribute and gather the calculations. Only if the objective function is expensive and the population is large will parallelisation be worth it. With this example it's not worth it. The computation is cheap and there is only one parameter, meaning the population is small. Frequently vectorisation of the objective function is a better option, see the vectorized keyword of the documentation.

The default option in differential_evolution (multiprocessing.Pool) is only available if the objective function is pickleable. In these two examples it is not, which is why you have to use pathos. I don't know why example 1 or 2 is better in that case, that's a pathos question.

I have a curious case. Unfortunately, I did not manage to reproduce the computation times in the MWE below but my project is basically a more crowded version. I wanted to speedup my code so I tried out differential_evolution with worker=-1. This did not work because of pickling issues (AttributeError: Can't pickle local object); I don't really want to define my cost-function at the top level of my module. So I used worker=Pool(ncpu).map instead.

I have personally found that the multiprocessing support built into the standard library works better than pathos. I would suggest solving this problem by making cost() picklable.

You mention that you don't want to define your cost function at the module top level, but this is not the only way to approach the problem. If you have state that you want to pass along within a function that you want to pickle, you have three options for how to solve this: functools.partial(), SciPy args, and defining methods on a class. (Technically, you can also do this by defining the variable as a global, as you do in Approach #1, but as this is not portable, I don't include it.)

Of these three approaches, defining methods on a class can make this picklable without defining your cost function at top level.

Example:

from os import cpu_count
from time import time
import numpy as np
from multiprocessing import Pool
from scipy.optimize import differential_evolution
from scipy.integrate import quad
from scipy.interpolate import CubicSpline

atol=1e-13
rtol=1e-10
T = 10
bounds = [(-T,T)]


# --- Approach 3 --- #

class A():
    def f(self, t):
        return self.cs(t)**2
    def define_f(self):
        x = np.linspace(0,10*2*np.pi,100)
        y = np.cos(x)
        cs = CubicSpline(x, y)
        self.cs = cs
        return self.f


class B():
    def cost(self, t):
        t = t[0]
        q = 0
        for _ in range(50):
            q += quad(lambda s: self.f(s*t), -T, T)[0]
        return q
    def solve(self, f):
        ctime = time()
        self.f = f
        with Pool(cpu_count()) as pool:
            result = differential_evolution(self.cost, bounds, atol=atol, tol=rtol, updating='deferred', workers=pool.map)
        print("Approach 3:", time()-ctime)

if __name__ == '__main__':
    f = A().define_f()
    B().solve(f=f)

On my system, this is actually faster than approach 1:

Approach 1: 11.039157629013062
Approach 2: 11.278709650039673
Approach 3: 10.805178165435791

Note that workers=multiprocessing.Pool(cpu_count()) is roughly equivalent to workers=-1, since differential_evolution() uses the built in multiprocessing support by default.

本文标签: pythonscipy39s differentialevolution appears slower when used inside classStack Overflow