admin管理员组

文章数量:1134246

I do not know enough about M in PowerQuery to work out how to solve the below.

The example is, I have an excel range of over 300 rows, each with 12 columns that may contain strings of meta-data (these are single or two word strings) about an original information source. Many of the 12 columns have null values as not all rows have 12 strings of meta-data. I have included a fake sample of the data to be clear what I'm talking about.

ID Input column 1 Input column 2 Input column 3 Input column 4
1 A B D C
2 G E F
3 A F

I do not know enough about M in PowerQuery to work out how to solve the below.

The example is, I have an excel range of over 300 rows, each with 12 columns that may contain strings of meta-data (these are single or two word strings) about an original information source. Many of the 12 columns have null values as not all rows have 12 strings of meta-data. I have included a fake sample of the data to be clear what I'm talking about.

ID Input column 1 Input column 2 Input column 3 Input column 4
1 A B D C
2 G E F
3 A F

Note: I have deliberately mixed the order of alphabet characters to show that the order does not matter (not a permutation algorithm).

I want like to be able to count the combinations of all column strings in all rows without counting a single column string as it is not a combination, or repeating any column string in a row's possible combinations. So, although order does not matter, repeating any combination must be excluded for the combination to contribute to the counting algorithm. I have tried looking at options on stackoverflow and they don't quite seem to work as they often make repeat combinations of strings.

So if we take the calculation for row ID=1 only, then the desired output would be,

ID Output column 1 Output column 2 Output column 3 Output column 4
1 A B
2 A C
3 A D
4 B C
5 B D
6 C D
7 A B C
8 A B D
9 A C D
10 B C D
11 A B C D

I think that's all of them. Let me know if I missed a unique combination! Solutions sorted alphabetically within a row (see the above table) is preferable to tidy up the results.

The following table is an example of what I don't want. The value of A is being combined with itself as output in both rows ID1 and ID2. Row ID3 is also not permitted, as it is not counting a combination, it is counting a unique value that has not been combined.

ID Output column 1 Output column 2 Output column 3
1 A A
2 A C A
3 A

Solutions in powerquery would be handy as the excel solutions are pretty complicated it seems - I could be wrong though, as I don't know how to solve it.

Share Improve this question asked Jan 8 at 5:29 user29032702user29032702 31 silver badge1 bronze badge New contributor user29032702 is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 6
  • 1 If you have a combination repeated solution, it is maybe easier to remove it, than to create a new one from scratch. – Black cat Commented Jan 8 at 6:01
  • 2 Is the end goal just to count the combinations? – Sam Nseir Commented Jan 8 at 6:06
  • 2 I believe for n unique strings, the number of combinations you want is just 2^n - n -1. – rachel Commented Jan 8 at 9:24
  • "Let me know if I missed a unique combination! " --> does the individual Input column entry Like just 'A' , or just 'B' .. counts as a unique combination? – p._phidot_ Commented Jan 8 at 14:10
  • @p._phidot_ No it does not. He specifically excluded those in his question. – Ron Rosenfeld Commented Jan 8 at 14:24
 |  Show 1 more comment

2 Answers 2

Reset to default 1

One way to do this in powerquery is as follows, which generates a table in a new column for each row

let  Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each 
    let L=List.RemoveNulls(List.RemoveFirstN(Record.FieldValues(_),1)),
    N = List.Count(L),
    Subsets =List.Transform({0..Number.Power(2, N)-1},(i) => List.Transform({0..N-1},(j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1 then L{j} else null)),
    Concatenate = List.RemoveNulls(List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))),
    Remove=List.RemoveNulls(List.Transform( Concatenate, each if Text.Contains(_,",") then _ else null)), //remove single combinations
    Convert = Table.FromList(Remove, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    GetNewColNames=List.Transform({1 .. List.Max(List.Transform(Convert[Column1], each List.Count(Text.Split(_,","))))}, each "Column_"& Text.From(_)),
    Split = Table.AddIndexColumn(Table.SplitColumn(Convert, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), GetNewColNames)  , "Index", 1, 1, Int64.Type)
 in  Split)
in #"Added Custom"

To implement @rachel algorithm in Power Query, given your data:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, 
        {"Input column 1", type text}, {"Input column 2", type text}, 
        {"Input column 3", type text}, {"Input column 4", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count", 
        each 
            let 
                n = List.Count(List.RemoveNulls(List.Skip(Record.FieldValues(_))))
            in 
                Number.Power(2,n) - n -1,Int64.Type)
in
    #"Added Custom"

Algorithm for selecting the relevant columns In the Added Custom step:

  • Record.FieldValues(_) returns each value in that row in a List
  • List.Skip then skips the first entry (the ID column)

If the above algorithm doesn't return the relevant columns in your actual data, there are other ways of doing this, either hard-coded or a different algorithm, depending on the nature of your actual data.

As pointed out by @SamNseir, should any of the cells in the row have identical entries, adding List.Distinct to the algorithm in #"Added Custom" would be necessary:

#"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count", 
        each 
            let 
                n = List.Count(
                        List.RemoveNulls(
                            List.Distinct(
                                List.Skip(Record.FieldValues(_)))))
            in 
                Number.Power(2,n) - n -1,Int64.Type)

Edit
If you want to have both the output columns and the Count as an output, it can be a little tricky as Power Query can only output a single table.

One way to do this in a single table is as shown below, where the input rows are repeated for each row in the output table (along with the Count and ID columns.

I am using a custom function to generate the combinations. So first add the code below as a blank query and name it fnCombos

//https://community.powerbi.com/t5/Community-Blog/Combinatorics-in-Power-Query-part1/bc-p/2221143/highlight/true#M3151
//  See answer of Alexis Olson

//fnCombos
//  combinations of all entries in a single list/column

(L as list) as list =>

let
    N = List.Count(L),
    Subsets =
        List.Transform(
            {0..Number.Power(2, N)-1},
            (i) => List.Transform(
                {0..N-1},
                (j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1
                    then L{j}
                    else null
                    )
        ),
  Concatenate = List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))
in   
    Concatenate

Then use the following for your main code:

  • Be sure to read the code comments for important information
  • Should work with a variable number of input columns, provided they all start with the word Input. If they don't, then we will need some other method of selection.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    inputCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_,"Input")),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"ID", Int64.Type}} &
         List.Transform(inputCols, each {_, type text})),
    #"Added Counts" = Table.AddColumn(#"Changed Type", "Combination Count", 
        each 
            let 
                n = List.Count(
                        List.RemoveNulls(
                            List.Distinct(
                                Record.FieldValues(Record.SelectFields(_,inputCols)))))
            in 
                Number.Power(2,n) - n -1,Int64.Type),
    #"Added Combos" = Table.AddColumn(#"Added Counts", "Combinations", (r)=>
        let 
            //Select the Input columns only
            //Add List.Distinct if you might have duplicate strings in any of the input columns
            vals = List.RemoveNulls(
                        List.Distinct(Record.FieldValues(
                            Record.SelectFields(r, inputCols)))),                               

            //combos that meet your specifications will all contain a comma.
            combos = List.Select(fnCombos(List.Sort(vals)), each Text.Contains(_,","))

        in 
            Table.FromList(combos,Splitter.SplitTextByDelimiter(","), List.Count(inputCols))),

    outCols = Table.ColumnNames(#"Added Combos"{0}[Combinations]),
    outCols2 = List.Transform(outCols, each "Output " & _),
    #"Expanded Combinations" = Table.ExpandTableColumn(#"Added Combos", "Combinations",
        outCols,outCols2),
   #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Combinations", List.Transform(outCols2, each {_, type text}))
in
    #"Changed Type1"

Results

本文标签: powerqueryCalculate combinations for a range of text values in excelStack Overflow