numpy.vectorize conveniently converts a scalar function into a vectorized function that can be applied directly to arrays. However, when a single value is passed to the vectorized function, the output is a 0-dimensional array instead of the corresponding scalar type, which can cause errors when the result is used elsewhere due to typing issues. My question is: is there a mechanism in numpy that resolves this problem by automatically converting the 0-dimensional array return value to the corresponding data type?
To illustrate, here is an example:
import numpy as np

@np.vectorize ( excluded = ( 1, 2 ) )
def rescale (
    value: float,
    srcRange: tuple [ float, float ],
    dstRange: tuple [ float, float ] = ( 0, 1 ),
) -> float:
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = ( value - srcMin ) / ( srcMax - srcMin )
    return dstMin + t * ( dstMax - dstMin )
When calling the function above with rescale ( 5, ( 0, 10 ) ), the return value is numpy.array(0.5) instead of just the value 0.5.
Currently I resolve this problem with a custom decorator:
def vectorize0dFix ( func ):
    def _func ( *args, **kwargs ):
        result = func ( *args, **kwargs )
        if isinstance ( result, np.ndarray ) and result.shape == ( ):
            return result.item ( )
        else:
            return result
    return _func
But if this problem does cause trouble, there should be a mechanism in numpy that deals with it properly. I wonder whether there is one, or why there isn't.
1 Answer
Short answer:
- You can unwrap 0-d results into scalars while keeping n-d results (n>0) by indexing with an empty tuple ().
- Better yet, I would try to avoid using @np.vectorize altogether: in general, but in particular with your given example, where vectorization is not necessary.
Long answer:
Following these answers to related questions, by indexing with an empty tuple (), you can systematically unwrap 0-d arrays into scalars while keeping other arrays.
So, using the @np.vectorize-decorated function rescale() from your question, you can post-process your results accordingly, for example:
with_scalar_input = rescale(5, (0, 10))[()]
with_vector_input = rescale([5], (0, 10))[()]
print(type(with_scalar_input)) # <class 'numpy.float64'>
print(type(with_vector_input)) # <class 'numpy.ndarray'>
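To see what the empty-tuple index does in isolation, here is a small sketch on plain arrays, independent of rescale(). The variable names are only illustrative. It also shows .item(), which the decorator in the question uses and which goes one step further by returning a native Python scalar:

```python
import numpy as np

zero_d = np.array(0.5)     # 0-d array, shape ()
one_d = np.array([0.5])    # 1-d array, shape (1,)

unwrapped = zero_d[()]     # 0-d array becomes a NumPy scalar
unchanged = one_d[()]      # n-d array (n>0) stays an array
as_python = zero_d.item()  # .item() yields a native Python scalar

print(type(unwrapped))  # <class 'numpy.float64'>
print(type(unchanged))  # <class 'numpy.ndarray'>
print(type(as_python))  # <class 'float'>
```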
I am not aware of any built-in NumPy mechanism that solves this edge case of @np.vectorize for you, so providing your own decorator is probably a viable way to go.
Custom scalar-unwrapping @vectorize decorator
Writing your own custom decorator that (a) accepts all arguments of @np.vectorize and behaves exactly like it, but (b) appends the scalar-unwrapping step, could look as follows:
from functools import wraps
import numpy as np

def vectorize(*wa, **wkw):
    def decorator(f):
        # Build the vectorized function once, not on every call
        vectorized = np.vectorize(f, *wa, **wkw)

        @wraps(f)
        def wrap(*fa, **fkw):
            # [()] unwraps 0-d results and leaves n-d results (n>0) untouched
            return vectorized(*fa, **fkw)[()]
        return wrap
    return decorator

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
If you don't care about docstring propagation (which @functools.wraps takes care of), the @vectorize decorator can be shortened to:
import numpy as np

vectorize = lambda *wa, **wkw: lambda f: lambda *fa, **fkw: \
    np.vectorize(f, *wa, **wkw)(*fa, **fkw)[()]

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
Caution: All approaches using (), as proposed above, produce a new edge case: if the input is provided as a 0-d NumPy array, such as np.array(5), the result will also be unwrapped into a scalar. Likewise, you might have noticed that the scalar results are NumPy scalars, <class 'numpy.float64'>, rather than native Python scalars, <class 'float'>. If either of these is not acceptable for you, then more elaborate type checking or post-processing will be necessary.
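As one possible direction for such post-processing, here is a minimal sketch. The name vectorize_strict and its unwrapping rule are my own assumptions, not a NumPy feature: it unwraps a 0-d result into a native Python scalar only when none of the call's arguments was a NumPy array, so a 0-d array input still produces a 0-d array output.

```python
from functools import wraps
import numpy as np

def vectorize_strict(*vargs, **vkwargs):
    """Hypothetical variant of @vectorize with stricter unwrapping."""
    def decorator(f):
        vectorized = np.vectorize(f, *vargs, **vkwargs)

        @wraps(f)
        def wrap(*args, **kwargs):
            result = vectorized(*args, **kwargs)
            inputs = list(args) + list(kwargs.values())
            got_array = any(isinstance(a, np.ndarray) for a in inputs)
            # Unwrap only 0-d results that did not come from array input
            if isinstance(result, np.ndarray) and result.ndim == 0 and not got_array:
                return result.item()  # native Python scalar
            return result
        return wrap
    return decorator

@vectorize_strict(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

print(type(rescale(5, (0, 10))))            # <class 'float'>
print(type(rescale(np.array(5), (0, 10))))  # <class 'numpy.ndarray'>
print(type(rescale([5], (0, 10))))          # <class 'numpy.ndarray'>
```

Whether an excluded argument that happens to be an array should also block the unwrapping is a design choice; this sketch treats every array argument the same way for simplicity.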
Try to avoid @np.vectorize altogether
As a final note: Maybe try to avoid using @np.vectorize altogether in the first place, and try to write your code such that it works both with NumPy arrays and scalars.
As to avoiding @np.vectorize: Its documentation states:
"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
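To make the "essentially a for loop" remark concrete, here is a rough stand-in that I wrote for illustration. It is not NumPy's actual implementation, loop_vectorize is a hypothetical name, and it assumes a single input and float results. It also hints at why scalar input comes back as a 0-d array: the input is converted to an array first, and the output array takes its shape.

```python
import numpy as np

def loop_vectorize(func, values):
    """Illustrative only: call func once per element in a Python loop."""
    arr = np.asarray(values)                # a scalar becomes a 0-d array
    out = np.empty(arr.shape, dtype=float)  # assumes float results
    for idx in np.ndindex(arr.shape):       # one Python-level call per element
        out[idx] = func(arr[idx])
    return out

scaled = loop_vectorize(lambda v: v / 10, [0, 5, 10])
single = loop_vectorize(lambda v: v / 10, 5)
print(scaled.shape)  # (3,)
print(single.shape)  # ()
```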
As to adjusting your code accordingly: Your given function rescale() is a good example of code that works correctly with both NumPy arrays and scalars; in fact, it does so already, without any adjustments! You just have to ensure that vector-valued input is given as a NumPy array (rather than, say, a plain Python list or tuple):
import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale(np.asarray([5]), (0, 10))
print(type(with_scalar_input))  # <class 'float'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
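If callers may pass plain lists or tuples, one option is to convert inside the function. This is a sketch under that assumption, and rescale_any is a hypothetical name. The trade-off is that scalar input then yields a NumPy scalar rather than a native Python float, because arithmetic on a 0-d array returns NumPy scalars:

```python
import numpy as np

def rescale_any(value, srcRange, dstRange=(0, 1)):
    # Accept scalars, lists, tuples, and arrays alike
    value = np.asarray(value, dtype=float)
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

from_scalar = rescale_any(5, (0, 10))
from_list = rescale_any([0, 5, 10], (0, 10))
print(type(from_scalar))  # <class 'numpy.float64'>
print(type(from_list))    # <class 'numpy.ndarray'>
```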
Moreover, while producing exactly the same output for vector-type input¹, the @np.vectorize-decorated version is orders of magnitude slower:
import numpy as np
from timeit import Timer

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

vectorized = np.vectorize(rescale, excluded=(1, 2))
a = np.random.normal(size=10000)

assert (rescale(a, (0, 10)) == vectorized(a, (0, 10))).all()  # Same result?
print("Unvectorized:", Timer(lambda: rescale(a, (0, 10))).timeit(100))
print("Vectorized:", Timer(lambda: vectorized(a, (0, 10))).timeit(100))
On my machine, this produces about 0.003 seconds for the unvectorized version and about 0.8 seconds for the vectorized version.
In other words: we have more than a 250× speedup with the given, unvectorized function for a given 10,000-element array, while (if used carefully, i.e. by providing NumPy arrays rather than plain Python sequences for vector-type inputs) the function already produces scalar outputs for scalar inputs and vector outputs for vector inputs!
I guess the code above might not be the code that you are actually trying to vectorize; but anyway: in a lot of cases, a similar approach is possible.
¹) Again, the case of a 0-d vector input is special here, but you might want to check that for yourself.
[...] vectorize in production code (not just experimental things), you should try to find and understand its code. Currently the [source] link of its __call__ method docs is the most direct link. github/numpy/numpy/blob/v2.2.0/numpy/lib/… – hpaulj Commented 2 days ago