I have a function that creates a ChunkedArray<ListType> from two chunked arrays, but I'm having a hard time converting the column function into a "function expression".
The goal is something similar to the Pearson or Spearman correlation implementation in Rust Polars, which takes two parameters instead of self, see this.
The output should be a new column of type List.
Here is my code, where I'm trying to create synthetic_data_expr; however, I hit a dead end on evaluating the expressions.
use polars::prelude::*;
use rand::thread_rng;
use rand_distr::{Normal, Distribution};

fn synthetic_data(
    mean_series: &ChunkedArray<Float64Type>,
    variance_series: &ChunkedArray<Float64Type>,
) -> PolarsResult<ChunkedArray<ListType>> {
    let mut rng = thread_rng();
    let random_values: Vec<Vec<f64>> = mean_series.iter()
        .zip(variance_series.iter())
        .map(|(mean, variance)| {
            let std_dev = variance.unwrap().sqrt();
            let normal_dist = Normal::new(mean.unwrap(), std_dev).unwrap();
            (0..39).map(|_| normal_dist.sample(&mut rng)).collect()
        })
        .collect();
    let mut list_chunk = ListPrimitiveChunkedBuilder::<Float64Type>::new(
        "intraday".into(),
        5, //rows of data
        39,
        DataType::Float64
    );
    for row in random_values {
        list_chunk.append_slice(&row);
    }
    Ok(list_chunk.finish())
}

fn synthetic_data_column(s: &[Column]) -> PolarsResult<Column> {
    let _mean = &s[0];
    let _variance = &s[1];
    let calc = synthetic_data(_mean.f64().unwrap(), _variance.f64().unwrap());
    Ok(calc?.into_column())
}

fn synthetic_data_expr(mean_column: Expr, variance_column: Expr) -> Expr {
    mean_column.apply_many(
        synthetic_intraday_data_column(),
        &[variance_column],
        GetOutput::same_type(),
    )
}
Here is an example of what I'm trying to accomplish with synthetic_data_expr:
/// Compute the pearson correlation between two columns.
pub fn pearson_corr(a: Expr, b: Expr) -> Expr {
    let input = vec![a, b];
    let function = FunctionExpr::Correlation {
        method: CorrelationMethod::Pearson,
    };
    Expr::Function {
        input,
        function,
        options: FunctionOptions {
            collect_groups: ApplyOptions::GroupWise,
            cast_options: Some(CastingRules::cast_to_supertypes()),
            flags: FunctionFlags::default() | FunctionFlags::RETURNS_SCALAR,
            ..Default::default()
        },
    }
}
asked Jan 30 at 2:45 by Trevor Seibert
edited Jan 31 at 2:13 by Dean MacGregor
1 Answer
You can't create your own expressions by mimicking the way most of the exprs are written in the Polars source, because they all go into enums that are hard-coded. Instead you'd use map, as seen here.
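To illustrate the distinction, here is a minimal toy model in plain Rust. It is not the Polars implementation: `Column`, `BuiltinFunction`, `UserFn`, `Expr` and `evaluate` below are hypothetical stand-ins invented for this sketch. It only shows the general idea that built-in operations are variants of a closed enum that user code cannot extend, while a user-supplied operation can travel through the expression tree as a closure.

```rust
use std::sync::Arc;

// Toy stand-in (NOT the polars type): a "column" is just a Vec<f64>.
type Column = Vec<f64>;

/// Closed set of built-in operations. Code outside the defining crate
/// cannot add a variant here.
#[derive(Clone, Copy)]
enum BuiltinFunction {
    Add,
    Mul,
}

/// A user-supplied operation is carried as a shared closure instead.
type UserFn = Arc<dyn Fn(&[Column]) -> Column + Send + Sync>;

enum Expr {
    Literal(Column),
    Builtin { inputs: Vec<Expr>, function: BuiltinFunction },
    Anonymous { inputs: Vec<Expr>, function: UserFn },
}

/// Recursively evaluate an expression tree to a column.
fn evaluate(expr: &Expr) -> Column {
    match expr {
        Expr::Literal(c) => c.clone(),
        Expr::Builtin { inputs, function } => {
            let cols: Vec<Column> = inputs.iter().map(evaluate).collect();
            // Built-ins in this toy model are all binary and element-wise.
            cols[0]
                .iter()
                .zip(&cols[1])
                .map(|(x, y)| match function {
                    BuiltinFunction::Add => x + y,
                    BuiltinFunction::Mul => x * y,
                })
                .collect()
        }
        Expr::Anonymous { inputs, function } => {
            let cols: Vec<Column> = inputs.iter().map(evaluate).collect();
            // The closure decides what to do with the evaluated inputs.
            function(&cols)
        }
    }
}
```

In this sketch, adding a new operation means either editing `BuiltinFunction` (only possible inside the defining crate) or building an `Expr::Anonymous` with your own closure, which is the role `map` plays in the answer.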
My rust-analyzer doesn't like your synthetic_data function but, notwithstanding, here's how you'd use it:
fn use_syn(df: DataFrame) {
    let res = df
        .clone()
        .lazy()
        .select([as_struct(vec![col("mean_col"), col("var_col")]).map(
            |s| {
                // Unpack the struct column back into its two fields.
                let ca = s.struct_()?;
                let seriess = ca.fields_as_series();
                let mean_series = &seriess[0];
                let mean_ca = mean_series.f64()?;
                let variance_series = &seriess[1];
                let variance_ca = variance_series.f64()?;
                // Propagate errors with `?` rather than panicking inside the closure.
                let out = synthetic_data(mean_ca, variance_ca)?;
                Ok(Some(out.into_series().into()))
            },
            // Declare the output dtype so the lazy schema is List(Float64).
            GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
        )])
        .collect()
        .unwrap();
}
One thing you can do to make that more convenient is to define your own trait and implement it for Expr. That way your custom function is wrapped in an Expr method that you can call the same way as the native methods. That'd look like this:
trait CustomExprs {
    fn syn_data(self, other: Expr) -> Expr;
}

impl CustomExprs for Expr {
    fn syn_data(self, other: Expr) -> Expr {
        as_struct(vec![self, other]).map(
            |s| {
                // Unpack the struct column back into its two fields.
                let ca = s.struct_()?;
                let seriess = ca.fields_as_series();
                let mean_series = &seriess[0];
                let mean_ca = mean_series.f64()?;
                let variance_series = &seriess[1];
                let variance_ca = variance_series.f64()?;
                let out = synthetic_data(mean_ca, variance_ca)?;
                Ok(Some(out.into_series().into()))
            },
            GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
        )
    }
}
and then you can do
fn use_syn2(df: DataFrame) {
    // Same column names as in use_syn above; `res` is still a LazyFrame here.
    let res = df
        .clone()
        .lazy()
        .select([col("mean_col").syn_data(col("var_col")).alias("syn_data")]);
}
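The trait trick above is ordinary Rust and doesn't depend on anything Polars-specific: you may implement a trait you define for a type you don't own. As a minimal sketch, assuming `String` as a hypothetical stand-in for `Expr` (so that it compiles without Polars), the same shape looks like this:

```rust
/// Extension trait defined in our own crate.
trait CustomExprs {
    fn syn_data(self, other: Self) -> Self;
}

// `String` is a foreign type here, playing the role `Expr` plays in the answer.
impl CustomExprs for String {
    fn syn_data(self, other: Self) -> Self {
        // Stand-in body: just record which inputs were combined.
        format!("syn_data({self}, {other})")
    }
}

/// Method-call syntax works because `CustomExprs` is in scope here.
fn build() -> String {
    String::from("mean_col").syn_data(String::from("var_col"))
}
```

If the trait lives in another module, it has to be imported with `use` wherever you call `.syn_data(...)`; otherwise the method won't resolve.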