numpy.vectorize conveniently converts a scalar function into a vectorized function that can be applied directly to arrays. However, when a single value is passed into the vectorized function, the output is a 0-dimensional array instead of the corresponding value type, which can cause errors when the result is used elsewhere due to typing issues. My question is: is there a mechanism in numpy that resolves this problem by automatically converting the 0-dimensional array return value to the corresponding data type?

For explanation, here is an example:

@np.vectorize(excluded=(1, 2))
def rescale(
    value: float,
    srcRange: tuple[float, float],
    dstRange: tuple[float, float] = (0, 1),
) -> float:
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

When calling the function above as rescale(5, (0, 10)), the return value is numpy.array(0.5) instead of just the value 0.5.

Currently I work around this problem with a self-defined decorator:

import numpy as np

def vectorize0dFix(func):
    def _func(*args, **kwargs):
        result = func(*args, **kwargs)
        # Unwrap 0-d array results into plain scalars; pass everything else through.
        if isinstance(result, np.ndarray) and result.shape == ():
            return result.item()
        return result
    return _func
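
Applied on top of @np.vectorize, it gives me plain Python scalars for scalar input while leaving array results untouched (same rescale as above, annotations omitted for brevity):

@vectorize0dFix
@np.vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

print(rescale(5, (0, 10)))           # 0.5 as a plain float instead of a 0-d array
print(rescale([0, 5, 10], (0, 10)))  # array results are passed through unchanged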

But if this problem does cause trouble, there should be a mechanism in numpy that properly deals with it. I wonder whether there is one, or why there isn't.

Asked 2 days ago by F. X. P.
  • 'why' questions are generally unanswerable. None of us are original developers, and few are current developers. So the best we can do is deduce reasons from patterns. – hpaulj Commented 2 days ago
  • If you try to include vectorize in production code (not just experimental things), you should try to find and understand its code. Currently the [source] link of its __call__ method docs is the most direct link. github/numpy/numpy/blob/v2.2.0/numpy/lib/… – hpaulj Commented 2 days ago

1 Answer


Short answer:

  • You can unwrap 0-d results into scalars while keeping n-d results (n>0) by indexing with an empty tuple ().
  • Better yet, I would try to avoid using @np.vectorize altogether – in general, but in particular with your given example where vectorization is not necessary.

Long answer:

Following these answers to related questions, by indexing with an empty tuple (), you can systematically unwrap 0-d arrays into scalars while keeping other arrays.

So, using the @np.vectorized function rescale() from your question, you can post-process your results accordingly, for example:

with_scalar_input = rescale(5, (0, 10))[()]
with_vector_input = rescale([5], (0, 10))[()]
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

I am not aware of any built-in NumPy mechanism that solves this edge case of @np.vectorize for you, so providing your own decorator is probably a viable way to go.

Custom scalar-unwrapping @vectorize decorator

Writing your own custom decorator that (a) accepts all arguments of and behaves exactly like @np.vectorize, but (b) appends the scalar unwrapping step, could look as follows:

from functools import wraps
import numpy as np

def vectorize(*wa, **wkw):
    def decorator(f):
        @wraps(f)
        def wrap(*fa, **fkw):
            # Unwrap 0-d results into scalars via empty-tuple indexing
            return np.vectorize(f, *wa, **wkw)(*fa, **fkw)[()]
        return wrap
    return decorator

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

If you don't care about docstring propagation (of which @functools.wraps takes care), the @vectorize decorator can be shortened to:

import numpy as np

vectorize = lambda *wa, **wkw: lambda f: lambda *fa, **fkw: \
            np.vectorize(f, *wa, **wkw)(*fa, **fkw)[()]

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

Caution: All approaches using (), as proposed above, produce a new edge case: if the input is provided as a 0-d NumPy array, such as np.array(5), the result will also be unwrapped into a scalar. Likewise, you might have noticed that the scalar results are NumPy scalars, <class 'numpy.float64'>, rather than native Python scalars, <class 'float'>. If either of these is not acceptable for you, then more elaborate type checking or post-processing will be necessary.
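
For instance, if you want native Python scalars but want to leave results alone when the caller already passed in a NumPy array, a possible post-processing step (just a sketch; to_native() is a hypothetical helper, not a NumPy function, and it reuses rescale() from the example above) could look like this:

import numpy as np

def to_native(result, original_value):
    # Hypothetical helper: convert NumPy scalar results to native Python
    # scalars, but only if the caller did not pass a NumPy array themselves.
    if isinstance(result, np.generic) and not isinstance(original_value, np.ndarray):
        return result.item()
    return result

print(type(to_native(rescale(5, (0, 10)), 5)))  # should now be <class 'float'>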

Try to avoid @np.vectorize altogether

As a final note: Maybe try to avoid using @np.vectorize altogether in the first place, and try to write your code such that it works both with NumPy arrays and scalars.

As to avoiding @np.vectorize: Its documentation states:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
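
In other words, at its core it behaves roughly like the following Python-level loop (a simplified sketch for illustration only, not NumPy's actual implementation, which also handles excluded arguments, signatures, otypes, caching, and so on); this also shows where the 0-d array for scalar input comes from:

import numpy as np

def naive_vectorize(f):
    # Simplified conceptual sketch of np.vectorize: a Python-level loop
    # over the input elements, repacked into an array of the input's shape.
    def wrapped(values):
        arr = np.asarray(values)
        flat = [f(v) for v in arr.ravel()]
        return np.array(flat).reshape(arr.shape)  # 0-d array for scalar input
    return wrapped

print(naive_vectorize(lambda x: x * 2)(5))       # 0-d array, just like np.vectorize
print(naive_vectorize(lambda x: x * 2)([1, 2]))  # regular 1-d array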

As to adjusting your code accordingly: Your given function rescale() is a good example of code that works correctly with both NumPy arrays and scalars; in fact, it already does so, without any adjustments! You just have to ensure that vector-valued input is given as a NumPy array (rather than, say, a plain Python list or tuple):

import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale(np.asarray([5]), (0, 10))
print(type(with_scalar_input))  # <class 'float'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

Moreover, while producing exactly the same output for vector-type input¹, the @np.vectorized version is orders of magnitude slower:

import numpy as np
from timeit import Timer

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

vectorized = np.vectorize(rescale, excluded=(1, 2))

a = np.random.normal(size=10000)
assert (rescale(a, (0, 10)) == vectorized(a, (0, 10))).all()  # Same result?
print("Unvectorized:", Timer(lambda: rescale(a, (0, 10))).timeit(100))
print("Vectorized:", Timer(lambda: vectorized(a, (0, 10))).timeit(100))

On my machine, this produces about 0.003 seconds for the unvectorized version and about 0.8 seconds for the vectorized version.

In other words: we have more than a 250× speedup with the given, unvectorized function for a given 10,000-element array, while (if used carefully, i.e. by providing NumPy arrays rather than plain Python sequences for vector-type inputs) the function already produces scalar outputs for scalar inputs and vector outputs for vector inputs!

I guess the code above might not be the code that you are actually trying to vectorize; but anyway: in a lot of cases, a similar approach is possible.

¹) Again, the case of a 0-d vector input is special here, but you might want to check that for yourself.
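
For instance, reusing rescale() and vectorized from the timing snippet above, you can compare how both versions treat a 0-d array input yourself:

import numpy as np

zero_d = np.array(5)
print(type(rescale(zero_d, (0, 10))))     # plain function
print(type(vectorized(zero_d, (0, 10))))  # np.vectorize'd version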
