admin管理员组文章数量:1122832
I'm trying to replicate the repeat_by
and explode
functions from Polars using dbt with a Redshift database but am having difficulty finding an equivalent solution.
Here's the sample Polars DataFrame code for context:
NOTE: I don't need replicate
column, if that's an issue, then don't worry about it
import polars as pl
# Sample DataFrame
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["New York", "Los Angeles", "Chicago"],
"replicate": [2, 3, 1]
})
Polars DataFrame output:
shape: (3, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
Using the repeat_by
and explode
functions:
df.select(pl.all().repeat_by('replicate').explode())
Result:
shape: (6, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
I need help finding a way to achieve the same result in dbt with Redshift.
I'm trying to replicate the repeat_by
and explode
functions from Polars using dbt with a Redshift database but am having difficulty finding an equivalent solution.
Here's the sample Polars DataFrame code for context:
NOTE: I don't need replicate
column, if that's an issue, then don't worry about it
import polars as pl
# Sample DataFrame
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["New York", "Los Angeles", "Chicago"],
"replicate": [2, 3, 1]
})
Polars DataFrame output:
shape: (3, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
Using the repeat_by
and explode
functions:
df.select(pl.all().repeat_by('replicate').explode())
Result:
shape: (6, 4)
┌─────────┬─────┬─────────────┬───────────┐
│ name ┆ age ┆ city ┆ replicate │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ i64 │
╞═════════╪═════╪═════════════╪═══════════╡
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Alice ┆ 25 ┆ New York ┆ 2 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Bob ┆ 30 ┆ Los Angeles ┆ 3 │
│ Charlie ┆ 35 ┆ Chicago ┆ 1 │
└─────────┴─────┴─────────────┴───────────┘
I need help finding a way to achieve the same result in dbt with Redshift.
Share Improve this question edited Nov 24, 2024 at 18:23 pppery 3,78425 gold badges37 silver badges50 bronze badges asked Nov 21, 2024 at 19:43 Akmal SolievAkmal Soliev 7428 silver badges24 bronze badges1 Answer
Reset to default 2I used JOIN and a CROSS JOIN since you mentioned generate_series is not supported.
I tested this in Snowflake , since I dont have redshift. I guess the syntax might not be the exact same, but still should work.
let me know if it works for you.
SET tbl_count = (SELECT max(replicate) FROM my_table);
WITH numbers AS (
SELECT
row_number() OVER ( order by 1) AS num
FROM
(SELECT 1 FROM my_table LIMIT $tbl_count ) )
,
repeated_data AS (
SELECT
t.name,
t.age,
t.city,
t.replicate
FROM
my_table t
CROSS JOIN numbers
WHERE numbers.num <= t.replicate
)
SELECT
name,
age,
city,
replicate
FROM
repeated_data
ORDER BY name, age;
Trying with Recursion since using Limit variable is tricky in Redshift
WITH RECURSIVE numbers AS (
-- Base case: start from 1
SELECT 1 AS num
UNION ALL
-- Recursive case: increment the number by 1 each time
SELECT num + 1
FROM numbers
WHERE num < (SELECT max(replicate) FROM my_table) -- Stop when num reaches max(replicate)
),
repeated_data AS (
SELECT
t.name,
t.age,
t.city,
t.replicate
FROM
my_table t
CROSS JOIN numbers
WHERE numbers.num <= t.replicate
)
SELECT
name,
age,
city,
replicate
FROM
repeated_data
ORDER BY
name, age;
Output :
本文标签: sqlExplode values in dbtStack Overflow
版权声明:本文标题:sql - Explode values in dbt - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736307610a1933373.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论