admin管理员组

文章数量:1289834

I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL. I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to try and recreate this behaviour just using CTEs if possible. I can do exactly what I want in Python, but have been unable to do it in SQL. Yes I know SQL is not really designed for this, but I do have a reason for it so would like to try and get it to work. I want to be able to provide any length array and return a list of permutations. For example:

INPUT [1, 1, 1] OUTPUT [0, 0, 0]
INPUT [2, 1, 1] OUTPUT [[0, 0, 0], [1, 0, 0]]
INPUT [2, 1, 2] OUTPUT [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
INPUT [1, 5] OUTPUT [[0,0], [0,1], [0,2], [0,3], [0,4]]

I don't mind how the inputs and outputs are formatted as long as I can parse them one way or another. E.g. returning rows of strings are fine, each element in a separate row is fine as I can use array_agg, etc.

In Python doing this recursively or using for-loops is pretty trivial. I start with a list of zeroes and work my way along each index, incrementing the value at that index until it reaches input value-1 and store each permutation before moving on.

In SQL this is what I have so far, which is only really the first step:

WITH RECURSIVE
  OrigSeq AS (SELECT [2, 1, 2] as some_numbers), -- input sequence
  BaseSeq AS (SELECT ARRAY(SELECT 0 as a FROM UNNEST(OrigSeq.some_numbers)) AS base FROM OrigSeq), -- get array of 0s of same size as input sequence
  Sequences AS (
    (SELECT 0 AS perm_id, 0 as idx, BaseSeq.base[0] as iter, OrigSeq.some_numbers[0]-1 AS orig, BaseSeq.base as base, OrigSeq.some_numbers as num FROM OrigSeq, BaseSeq)
    UNION ALL
    -- increment index
    SELECT perm_id, idx+1 as idx, base[idx+1] as iter, num[idx+1]-1 as orig, base, num FROM Sequences WHERE idx < ARRAY_LENGTH(num)-1
  ),
  Seq2 AS (
    SELECT * FROM Sequences
    UNION ALL
    -- parse iters - not doing this quite right and my brain stops functioning around here
    SELECT perm_id+1, idx, base[idx]+1 as iter, orig, base, num FROM Sequences where iter <= orig
  ),
  Seq3 AS (
    Select * From Seq2
    UNION ALL
    SELECT perm_id, idx, iter, orig, base, num from Sequences
  )
SELECT perm_id, idx, orig, iter
FROM Seq3
ORDER BY perm_id, idx

Any guidance appreciated.

I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL. I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to try and recreate this behaviour just using CTEs if possible. I can do exactly what I want in Python, but have been unable to do it in SQL. Yes I know SQL is not really designed for this, but I do have a reason for it so would like to try and get it to work. I want to be able to provide any length array and return a list of permutations. For example:

INPUT [1, 1, 1] OUTPUT [0, 0, 0]
INPUT [2, 1, 1] OUTPUT [[0, 0, 0], [1, 0, 0]]
INPUT [2, 1, 2] OUTPUT [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
INPUT [1, 5] OUTPUT [[0,0], [0,1], [0,2], [0,3], [0,4]]

I don't mind how the inputs and outputs are formatted as long as I can parse them one way or another. E.g. returning rows of strings are fine, each element in a separate row is fine as I can use array_agg, etc.

In Python doing this recursively or using for-loops is pretty trivial. I start with a list of zeroes and work my way along each index, incrementing the value at that index until it reaches input value-1 and store each permutation before moving on.

In SQL this is what I have so far, which is only really the first step:

WITH RECURSIVE
  OrigSeq AS (SELECT [2, 1, 2] as some_numbers), -- input sequence
  BaseSeq AS (SELECT ARRAY(SELECT 0 as a FROM UNNEST(OrigSeq.some_numbers)) AS base FROM OrigSeq), -- get array of 0s of same size as input sequence
  Sequences AS (
    (SELECT 0 AS perm_id, 0 as idx, BaseSeq.base[0] as iter, OrigSeq.some_numbers[0]-1 AS orig, BaseSeq.base as base, OrigSeq.some_numbers as num FROM OrigSeq, BaseSeq)
    UNION ALL
    -- increment index
    SELECT perm_id, idx+1 as idx, base[idx+1] as iter, num[idx+1]-1 as orig, base, num FROM Sequences WHERE idx < ARRAY_LENGTH(num)-1
  ),
  Seq2 AS (
    SELECT * FROM Sequences
    UNION ALL
    -- parse iters - not doing this quite right and my brain stops functioning around here
    SELECT perm_id+1, idx, base[idx]+1 as iter, orig, base, num FROM Sequences where iter <= orig
  ),
  Seq3 AS (
    Select * From Seq2
    UNION ALL
    SELECT perm_id, idx, iter, orig, base, num from Sequences
  )
SELECT perm_id, idx, orig, iter
FROM Seq3
ORDER BY perm_id, idx

Any guidance appreciated.

Share Improve this question edited Feb 20 at 2:41 Dale K 27.5k15 gold badges58 silver badges83 bronze badges asked Feb 20 at 2:12 dgBPdgBP 1,6936 gold badges27 silver badges42 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 2

Recursive solution example for BigQuery SQL
Source:

WITH RECURSIVE
 OrigSeq as(
   select [1, 1, 1] as some_numbers --  [0, 0, 0]
  union all
  select [2, 1, 1] --  [[0, 0, 0], [1, 0, 0]]
  union all
  select [2, 1, 2] --  [[0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
  union all
  select [1, 5] as some_numbers --  [[0,0], [0,1], [0,2], [0,3], [0,4]]
)

Query:

  1. It is necessary to distinguish the rows somehow. Just give some kind of Id.
  2. Expand every array element from 0 to some_numbers[N]-1
  3. Recursively: Cartesian product sets {0,...,N1}X{0,...,N2}X{0,...,N3}...
  4. Concat back
,OrigSeqOrdered as( 
  select * ,row_number()over(order by (select null)) id
  from OrigSeq
)
,r as(
  select id, 0 lvl, nn,some_numbers,ARRAY_LENGTH(some_numbers) l
    ,[cast(nn as string)] arr
  from OrigSeqOrdered
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[0]-1)) as nn 
  union all
  select id,lvl+1, nn,some_numbers,l
    ,array_concat(arr,[cast(nn as string)]) arr
  from r
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[lvl+1]-1)) as nn
  where (lvl+1)<l
)
select id ,concat("[ [",array_to_string(array_agg(array_to_string(arr,",")),"], [:"),"] ]") sa
from r
where lvl=(l-1)
group by id
order by id 
id sa
1 "[ [0,0,0] ]"
2 "[ [1,0,0], [:0,0,0] ]"
3 "[ [1,0,1], [:1,0,0], [:0,0,1], [:0,0,0] ]"
4 "[ [0,1], [:0,2], [:0,3], [:0,4], [:0,0] ]"

Or as array<Int64>

r as(
  select id,0 lvl, nn,some_numbers
    ,ARRAY_LENGTH(some_numbers) l  -- array length, used for recursion stop
    ,[nn] arr
  from OrigSeqOrdered 
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[0]-1)) as nn 
  union all
  select id
     ,lvl+1, nn,some_numbers,l
    ,array_concat(arr,[nn]) arr
  from r
  cross join UNNEST(GENERATE_ARRAY(0, some_numbers[lvl+1]-1)) as nn 
  where (lvl+1)<l
)
select id ,some_numbers, arr,lvl,nn
from r
where lvl=(l-1)  -- only completed array's
order by id,lvl

Details for cartesian product of sets: Set A{0,1}, set B{0,1}, Set C{0,1,2}.( For OP array[2,2,3])

Create array for some_numbers[0]

GENERATE_ARRAY(0, some_numbers[0]-1)
3->(3-1)
Output: [0,1,2]

Expand array to set of rows

UNNEST([0,1,2])
Output:{0}
       {1}
       {2}

Recursive Cartesian product

Anchor part of recursive query
|  First recursion
|  |            Next recursion  
↓  ↓            ↓ 
A  x B     =    x C   =
          a,b       a,b,c
{0} {0}  {0,0} ---→{0,0,0}
{1} {1}  {1,0}  |-→{0,0,1}
         {0,1}  └-→{0,0,2}
         {1,1}     {1,0,0}
                   {1,0,1}
                   {1,0,2}
                   {0,1,0}
                   {0,1,1}
                   {0,1,2}
                   {1,1,0}
                   {1,1,1}
                   {1,1,2}

Concatenating arrays

first array_concat([0],0)->[0,0]  
then  array_concat([0,0],1)->[0,0,1]

Fiddle

Recursivity to append

A first recursive loop can compute all possible values at each position,
then a second recursive one will append to an array starting from the empty array.

WITH RECURSIVE
  OrigSeq AS (SELECT ARRAY[2, 1, 2] AS a),
  -- List the indices of our array (which will give us the number of cells too):
  Pos AS (SELECT n, pos FROM OrigSeq, UNNEST(a) AS n WITH OFFSET AS pos),
  -- For each position list its possible values:
  Poss AS
  (
    SELECT n - 1 AS n, Pos.pos FROM Pos -- Per specification, "2" should give a range from 0 to 1 (2 is the count, not the max), so start one step beyond.
    UNION ALL
    SELECT n - 1, pos FROM Poss WHERE n > 0
  ),
  -- Start from an empty array, generate as many arrays of 1 element as there are possibilities for cell 1, and so on:
  Combi AS
  (
    SELECT [] a, 0 next
    UNION ALL
    SELECT ARRAY_CONCAT(a, [ n ]), next + 1
    FROM Combi, Poss
    WHERE Poss.pos = next
  )
-- Keep only the completed arrays:
SELECT * from Combi WHERE next = (SELECT COUNT(1) FROM Pos);
a lvl
{1,0,1} 3
{0,0,1} 3
{1,0,0} 3
{0,0,0} 3

See the PostgreSQL equivalent in a fiddle:

Recursivity to decrement

There is a way to first generate all positions as posToDecrement,
then recursively starting from the initial numbers array as a, joining unnest(a) to posToDecrement to generate the different combinations, and array_agg(a.value) over (order by pos rows between unbounded preceding and unbounded following), discarding rows where any value falls under 0 (min() < 0).

You'll find a fiddle proof-of-concept for PostgreSQL.

However my intuition tells there is room for improvement:

  • given a start array of 3 elements, on each iteration I'd like to generate 3 triplets per previous triplet (thus 1 triplet on first iteration, 3 on the second one, 9 on the third, etc.) where each new triplet is the previous, minus 1 on 1 of its fields (thus if iteration 0 was [3,2,4], iteration 1 should produce [2,2,4],[3,1,4],[3,2,3]); relying on a UNION non-ALL to discard duplicates (at iteration 2 two paths lead to [2,1,4]: [2,2-1,4] and [3-1,1,4]).
  • as UNNEST generates 3 rows per triplet, and a windowed ARRAY_AGG can generate a new array of 3 elements for each of those rows, there should be no need to join anything else than the UNNEST peudo-table.
  • however I'm stuck trying to ARRAY_AGG(prev.n - CASE WHEN v.pos = CURRENT_ROW THEN 1 ELSE 0 END) FROM UNNEST(…) v
    (which already made questions on SO)
  • thus for now I have only the suboptimal DISTINCT after having joined Pos (listing all possible indices)
  • another way could be to ARRAY_CONCAT(ARRAY_AGG(v.n) OVER (… TO 1 PRECEDING), v.n - 1, ARRAY_AGG(v.n) OVER (… FROM 1 FOLLOWING))

本文标签: google bigquerySQL recursive CTEs to perm arrayStack Overflow