How to write a generic Python function that works with Python, Numpy or Pandas arguments and returns the same type - Stack Overflow

What's the best way to write a Python function that can be used with float, Numpy, or Pandas data types and always returns the same data type as the arguments it was given? The catch is that the calculation includes one or more float values.

E.g. toy example:

def mycalc(x, a=1.0, b=1.0):
    return a * x + b

(I've simplified the problem a lot here as I would ideally want to have more than one input argument like x, but you can assume that the function is vectorized in the sense that it works with Numpy array arguments and Pandas series).

For Numpy arrays and Pandas Series this works fine because the dtype is dictated by the input arguments.

import numpy as np
x = np.array([1, 2, 3], dtype="float32")
print(mycalc(x).dtype)  # float32

import pandas as pd
x = pd.Series([1.0, 2.0, 3.0], dtype="float32")
print(mycalc(x).dtype)  # float32

But when using numpy floats of lower precision, the dtype is 'lifted' to float64, presumably due to the float arguments in the formula:

x = np.float32(1.0)
print(mycalc(x).dtype)  # float64

Ideally, I would like the function to work with Python floats, numpy scalars, numpy arrays, Pandas series, Jax arrays, and even Sympy symbolic variables if possible.

But I don't want to clutter up the function with too many additional statements to handle each case.

I tried this, which works with Numpy scalars but breaks when you provide arrays or series:

def mycalc(x, a=1.0, b=1.0):
    a = type(x)(a)
    b = type(x)(b)
    return a * x + b

assert isinstance(mycalc(1.0), float)
assert isinstance(mycalc(np.float32(1.0)), np.float32)
mycalc(np.array([1, 2, 3], dtype="float32"))  # raises TypeError: expected a sequence of integers or a single integer, got '1.0'
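
The attempt above fails for arrays because type(x) is np.ndarray, whose constructor expects a shape rather than a value. One possible alternative, sketched here on the assumption that x exposes a NumPy dtype, is to cast the constants with the dtype's scalar type instead. The name mycalc_cast is hypothetical, and the floating-point check is an added guard so that integer inputs are left alone.

```python
import numpy as np


def mycalc_cast(x, a=1.0, b=1.0):
    # Hypothetical variant of mycalc: cast a and b to x's floating dtype, if it has one.
    dtype = getattr(x, "dtype", None)
    if isinstance(dtype, np.dtype) and np.issubdtype(dtype, np.floating):
        scalar_type = dtype.type  # e.g. np.float32 for a float32 array or scalar
        a, b = scalar_type(a), scalar_type(b)
    return a * x + b


assert isinstance(mycalc_cast(1.0), float)  # plain floats pass through untouched
assert type(mycalc_cast(np.float32(1.0))) is np.float32
assert mycalc_cast(np.array([1, 2, 3], dtype="float32")).dtype == np.float32
```

A Pandas series backed by a NumPy dtype should presumably take the same path, but that is not checked here.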

Also, there is an answer to a similar question that uses a decorator function to make copies of the input argument. It's a nice idea, but it was only meant to extend a function from Numpy arrays to Pandas series, and it doesn't work with Python floats or Numpy scalars.

import functools

def apply_to_pandas(func):
    @functools.wraps(func)
    def wrapper_func(x, *args, **kwargs):
        if isinstance(x, (np.ndarray, list)):
            out = func(x, *args, **kwargs)
        else:
            out = x.copy(deep=False)
            out[:] = np.apply_along_axis(func, 0, x, *args, **kwargs)
        return out
    return wrapper_func

@apply_to_pandas
def mycalc(x, a=1.0, b=1.0):
    return a * x + b

mycalc(1.0)  # AttributeError: 'float' object has no attribute 'copy'
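
If the wrapper only needs to restore the dtype, a variant might skip the copy and cast the result back instead. This is a minimal sketch, assuming the input carries a NumPy floating dtype; preserve_float_dtype is a hypothetical name. Note that the calculation itself may still run in float64 before being cast down, so the values could differ slightly from a pure float32 computation.

```python
import functools

import numpy as np


def preserve_float_dtype(func):
    # Hypothetical decorator: cast the result back to the floating dtype of x.
    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        out = func(x, *args, **kwargs)
        in_dtype = getattr(x, "dtype", None)
        if not (isinstance(in_dtype, np.dtype) and np.issubdtype(in_dtype, np.floating)):
            return out  # Python floats, ints, integer arrays: leave the result alone
        out_dtype = getattr(out, "dtype", None)
        if out_dtype is None or out_dtype == in_dtype:
            return out  # nothing to restore
        if isinstance(out, np.generic):
            return in_dtype.type(out)  # NumPy scalar: rebuild with the input's scalar type
        return out.astype(in_dtype)  # ndarray (and, presumably, a Pandas series)
    return wrapper


@preserve_float_dtype
def mycalc(x, a=1.0, b=1.0):
    return a * x + b


assert isinstance(mycalc(1.0), float)
assert type(mycalc(np.float32(1.0))) is np.float32
assert mycalc(np.array([1, 2, 3], dtype="float32")).dtype == np.float32
```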

Update

As pointed out by @Dunes in the comments below, this is no longer a problem in Numpy versions 2.x, as explained in the Numpy 2.0 Migration Guide.

In the new version, (np.float32(1.0) + 1).dtype == "float32". Therefore the original function above returns a result of the same dtype as the input x.
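
To see which behaviour an installed Numpy gives, a quick check along these lines may help. It is only a sketch: it assumes the default promotion settings and relies on the version split reported in the comments (1.26.1 lifts to float64, 2.x preserves float32). The variable names are made up for illustration.

```python
import numpy as np


def mycalc(x, a=1.0, b=1.0):
    return a * x + b


numpy_major = int(np.__version__.split(".")[0])
result = mycalc(np.float32(1.0))

# True when the float32 scalar came back as float32 (expected on Numpy 2.x)
scalar_dtype_preserved = result.dtype == np.float32
print(f"Numpy {np.__version__}: result dtype is {result.dtype}")
```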

Asked Nov 21, 2024 at 16:06 by Bill; edited Nov 23, 2024 at 1:10.

What version of numpy are you using? I have been unable to reproduce this with v2.1.0 on python 3.10. The dtype of scalars is preserved. – Dunes Commented Nov 21, 2024 at 17:08
@Dunes I am using numpy version 1.26.1 with Python 3.10.12. Can you confirm which part you can't reproduce? This result I am guessing: mycalc(np.float32(1.0)).dtype == np.float64 – Bill Commented Nov 21, 2024 at 18:05
It might help if you clearly distinguished between type and dtype. Also look at the [source] for some numpy functions, especially ones that delegate to methods. They often check inputs and convert them as needed to valid arrays (preserving array subclassing as needed). There's a lot of conversion and method delegation going on behind the scenes when using operators. – hpaulj Commented Nov 21, 2024 at 18:44
Thanks @hpaulj, I changed the question text to make it clearer that it is the data type (dtype) that I want to match, not the object type (except in the case of Python floats). When you say I should read the docs for Numpy functions, are you suggesting there's a way to change the default behaviour of operations so that it doesn't 'lift' the precision? – Bill Commented Nov 21, 2024 at 18:52
I was able to reproduce with numpy 1.26.1, but not with version 2.0.2. So it would appear that the short answer to all this is to upgrade to at least version 2 of numpy. And if you cannot, then explain why in the question. I.e. (np.float32(1) + 1).dtype == np.dtype('float32') is true in version 2.x, but false in version 1.x. – Dunes Commented Nov 22, 2024 at 19:22

3 Answers

I don't mean for this to be an authoritative answer, but rather something to think about that might help get you further along. What if you relied on the more advanced types' implementations of the "r" dunder methods, which seem to be more nuanced, and did something like this:

import numpy as np
import pandas as pd

def mycalc(x, a=1, b=1):
    foo = x * a + b
    return foo if type(foo) is type(x) else type(x)(foo)

print(type(mycalc(1)))
print(type(mycalc(1.0)))
print(type(mycalc(np.float32(1.0))))
print(type(mycalc(np.array([1, 2, 3], dtype="float32"))))
print(type(mycalc(pd.Series([1, 2, 3], dtype="float64"))))

That seems to give back:

<class 'int'>
<class 'float'>
<class 'numpy.float32'>
<class 'numpy.ndarray'>
<class 'pandas.core.series.Series'>

Post a comment here so I will know you saw this, and then I will remove it, as again it is not really an authoritative answer in my opinion, just an idea.

This doesn't exactly solve the problem I posed, since the desired data type must be specified here, but I think it's a simple, robust solution to the problem, rather than trying to automatically do the conversions based on the input types.

def mycalc(x, a=1.0, b=1.0, float_type=float):
    a = float_type(a)
    b = float_type(b)
    return a * x + b

assert isinstance(mycalc(1.0), float)
assert type(mycalc(np.float32(1.0), float_type=np.float32)) == np.float32
x = np.array([1.0, 2.0, 3.0], dtype="float32")
assert mycalc(x, float_type=np.float32).dtype == np.float32
x = pd.Series([1.0, 2.0, 3.0], dtype="float32")
assert mycalc(x, float_type=np.float32).dtype == np.float32
x = pd.Series([1.0, 2.0, 3.0], dtype="float64")
assert mycalc(x, float_type=np.float64).dtype == np.float64

It would be nice if the conversions could be done by a decorator, but since the floats are default values of keyword arguments, a simple wrapper never sees them, so it cannot change them without inspecting the function's signature.
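
That said, a decorator can get at the defaults through inspect.signature, which binds the call and fills in any values the caller omitted. The following is a rough sketch of that idea rather than a tested solution; match_float_args is a hypothetical name, and it assumes the first argument carries a NumPy floating dtype.

```python
import functools
import inspect

import numpy as np


def match_float_args(func):
    # Hypothetical decorator: cast plain Python floats to the dtype of the first argument.
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # makes defaults such as a=1.0 visible to the wrapper
        first = next(iter(bound.arguments.values()))
        dtype = getattr(first, "dtype", None)
        if isinstance(dtype, np.dtype) and np.issubdtype(dtype, np.floating):
            for name, value in list(bound.arguments.items()):
                if type(value) is float:
                    bound.arguments[name] = dtype.type(value)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@match_float_args
def mycalc(x, a=1.0, b=1.0):
    return a * x + b


assert isinstance(mycalc(1.0), float)
assert type(mycalc(np.float32(1.0))) is np.float32
assert mycalc(np.array([1.0, 2.0, 3.0], dtype="float32")).dtype == np.float32
```

Compared with the float_type argument, this keeps the call sites unchanged, at the cost of binding the signature on every call.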

I'm still looking for better solutions but I post this here because it might be a good workaround in some cases.

This works for numpy and pandas types. I'm not sure this is the best way though.

import numpy as np
import pandas as pd

def get_item_type(var):
    try:
        var = var.to_numpy()
    except AttributeError:
        pass
    try:
        t = type(var.flat[0])
    except AttributeError:
        t = type(var)
    return t


def mycalc(x, a=1.0, b=1.0):
    float_type = get_item_type(x)
    a = float_type(a)
    b = float_type(b)
    return a * x + b


assert isinstance(mycalc(1.0), float)
assert type(mycalc(np.float32(1.0))) == np.float32
x = np.array([1.0, 2.0, 3.0], dtype="float32")
assert mycalc(x).dtype == np.float32
x = pd.Series([1.0, 2.0, 3.0], dtype="float32")
assert mycalc(x).dtype == np.float32
x = pd.Series([1.0, 2.0, 3.0], dtype="float64")
assert mycalc(x).dtype == np.float64
