admin管理员组

文章数量:1332881

I have a table with parent-child relationship. For a dashboard I am creating, I need to create a table so the dashboard (SAS Viya) can work with the table. For that I need to get for every company id (cid) all parents and children, including the level within the structure (e.g. if it is the 3-rd level child) and a sort_chain which is a chain of numbers from the parent to that child, so be able to group/sort all sort_cid's in the dashboard.

(The name sort_ is because I will later use these columns in a dashboard to display/sort the data properly.)

In short, I'm trying to go from this table:

cid parent_cid
a
b a
c b
d a
e
f
g f

I have a table with parent-child relationship. For a dashboard I am creating, I need to create a table so the dashboard (SAS Viya) can work with the table. For that I need to get for every company id (cid) all parents and children, including the level within the structure (e.g. if it is the 3-rd level child) and a sort_chain which is a chain of numbers from the parent to that child, so be able to group/sort all sort_cid's in the dashboard.

(The name sort_ is because I will later use these columns in a dashboard to display/sort the data properly.)

In short, I'm trying to go from this table:

cid parent_cid
a
b a
c b
d a
e
f
g f

To this table using PROC SQL in SAS:

EDIT: I have updated sort_chain below to reflect what I believe most people feel as more 'logical'.

cid sort_cid sort_chain sort_level
a a 1.0.0 1
a b 1.1.0 2
a c 1.1.1 3
a d 1.2.0 2
b a 1.0.0 1
b b 1.1.0 2
b c 1.1.1 3
c a 1.0.0 1
c b 1.1.0 2
c c 1.1.1 3
d a 1.0.0 1
d d 1.2.0 2
e e 2.0.0 1
f f 3.0.0 1
f g 3.1.0 2
g f 3.0.0 1
g g 3.1.0 2

How can I achieve this?

What I have tried:

First step was setting all missing parent_cid to itself (lets call this table1):

cid parent_cid
a a
b a
c b
d a
e e
f f
g f

Then I tried self-joining this table to get all known daughters, but then you get an extra column, instead of extra rows:

SELECT
    a.cid, a.parent_cid, b.cid AS sort_chain
FROM
    table1 a
INNER JOIN
    table1 b
    ON
    b.parent_company_id = apany_id;

Same for self-join to get all known parents.

I also tried a cross-join, but then the table gets everything x everything, which doesn't make sense.

In short: I feel like I'm thinking in the right direction, but missing something or messing something up. Hope someone can provide me an example code or point me in the right direction. Thanks!

Share Improve this question edited Nov 21, 2024 at 13:31 Maanloper asked Nov 20, 2024 at 15:39 MaanloperMaanloper 11 silver badge2 bronze badges 7
  • there is a CONNECT BY syntax that you may research – Randy Commented Nov 20, 2024 at 17:12
  • It seems you want to generate a three part string for SORT_CHAIN because the maximum depth of any subtree is just 3. What do you want when you have a tree that is deeper? What if you add a node H that is a child of D ? – Tom Commented Nov 20, 2024 at 20:05
  • CONNECT BY is not available in SAS Proc SQL. Certainly is in ORACLE. – Richard Commented Nov 20, 2024 at 22:05
  • Suppose you have a chain a.b.c.d do you really want on output row for each order dependent pair along the chain? (i.e. aa ab ac ad bb bc bd cc cd dd (and backward) ba ca da cb db dc ? – Richard Commented Nov 20, 2024 at 22:12
  • @Tom, the depth must be in principal unlimited. – Maanloper Commented Nov 21, 2024 at 13:25
 |  Show 2 more comments

1 Answer 1

Reset to default 1

Use a HASH() to find the depth/level and top node for each observation.

data want;
* Load the cid->parent_cid mapping into a hash ;
  if _n_=1 then do;
    declare hash h(dataset:'have');
    h.definekey('cid');
    h.definedata('parent_cid');
    h.definedone();
  end;
* Load next CID value ;
  set have ;
* Find their top ancestor and how deep they are ;
  do depth=1 by 1 until(h.find());
    top=cid;
    cid=parent_cid;
  end;
* Re-load the same CID value to reset CID and PARENT_CID ;
  set have;
run;

To generate your counters use BY group processing. Make sure to make enough counters for the maximum depth your data has.

proc sort;
  by top depth cid ;
run;

* find how many levels will be needed ;
proc sql noprint;
  select max(depth) into :md trimmed from want;
quit;

* Use FIRST. processing to get the counters ;
* Concatenate to make index string ;
data numbered;
  set want;
  by top depth cid;
  array c[&md] ;
  c[depth]+1;
  if first.depth then do j=depth+1 to &md;
    c[j]=1;
  end;
  drop j ;
  sort_chain=catx('.',of c[*]);
run;

Results

And if we add a new observation CID='h' and PARENT_CID='c' to your example data we will automatically get 4 level sort chain values:

本文标签: