admin管理员组文章数量:1134246
I do not know enough about M in PowerQuery to work out how to solve the below.
The example is, I have an excel range of over 300 rows, each with 12 columns that may contain strings of meta-data (these are single or two word strings) about an original information source. Many of the 12 columns have null values as not all rows have 12 strings of meta-data. I have included a fake sample of the data to be clear what I'm talking about.
ID | Input column 1 | Input column 2 | Input column 3 | Input column 4 |
---|---|---|---|---|
1 | A | B | D | C |
2 | G | E | F | |
3 | A | F |
I do not know enough about M in PowerQuery to work out how to solve the below.
The example is, I have an excel range of over 300 rows, each with 12 columns that may contain strings of meta-data (these are single or two word strings) about an original information source. Many of the 12 columns have null values as not all rows have 12 strings of meta-data. I have included a fake sample of the data to be clear what I'm talking about.
ID | Input column 1 | Input column 2 | Input column 3 | Input column 4 |
---|---|---|---|---|
1 | A | B | D | C |
2 | G | E | F | |
3 | A | F |
Note: I have deliberately mixed the order of alphabet characters to show that the order does not matter (not a permutation algorithm).
I want like to be able to count the combinations of all column strings in all rows without counting a single column string as it is not a combination, or repeating any column string in a row's possible combinations. So, although order does not matter, repeating any combination must be excluded for the combination to contribute to the counting algorithm. I have tried looking at options on stackoverflow and they don't quite seem to work as they often make repeat combinations of strings.
So if we take the calculation for row ID=1 only, then the desired output would be,
ID | Output column 1 | Output column 2 | Output column 3 | Output column 4 |
---|---|---|---|---|
1 | A | B | ||
2 | A | C | ||
3 | A | D | ||
4 | B | C | ||
5 | B | D | ||
6 | C | D | ||
7 | A | B | C | |
8 | A | B | D | |
9 | A | C | D | |
10 | B | C | D | |
11 | A | B | C | D |
I think that's all of them. Let me know if I missed a unique combination! Solutions sorted alphabetically within a row (see the above table) is preferable to tidy up the results.
The following table is an example of what I don't want. The value of A is being combined with itself as output in both rows ID1 and ID2. Row ID3 is also not permitted, as it is not counting a combination, it is counting a unique value that has not been combined.
ID | Output column 1 | Output column 2 | Output column 3 |
---|---|---|---|
1 | A | A | |
2 | A | C | A |
3 | A |
Solutions in powerquery would be handy as the excel solutions are pretty complicated it seems - I could be wrong though, as I don't know how to solve it.
Share Improve this question asked Jan 8 at 5:29 user29032702user29032702 31 silver badge1 bronze badge New contributor user29032702 is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct. 6 | Show 1 more comment2 Answers
Reset to default 1One way to do this in powerquery is as follows, which generates a table in a new column for each row
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each
let L=List.RemoveNulls(List.RemoveFirstN(Record.FieldValues(_),1)),
N = List.Count(L),
Subsets =List.Transform({0..Number.Power(2, N)-1},(i) => List.Transform({0..N-1},(j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1 then L{j} else null)),
Concatenate = List.RemoveNulls(List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))),
Remove=List.RemoveNulls(List.Transform( Concatenate, each if Text.Contains(_,",") then _ else null)), //remove single combinations
Convert = Table.FromList(Remove, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
GetNewColNames=List.Transform({1 .. List.Max(List.Transform(Convert[Column1], each List.Count(Text.Split(_,","))))}, each "Column_"& Text.From(_)),
Split = Table.AddIndexColumn(Table.SplitColumn(Convert, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), GetNewColNames) , "Index", 1, 1, Int64.Type)
in Split)
in #"Added Custom"
To implement @rachel algorithm in Power Query, given your data:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type},
{"Input column 1", type text}, {"Input column 2", type text},
{"Input column 3", type text}, {"Input column 4", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count",
each
let
n = List.Count(List.RemoveNulls(List.Skip(Record.FieldValues(_))))
in
Number.Power(2,n) - n -1,Int64.Type)
in
#"Added Custom"
Algorithm for selecting the relevant columns
In the Added Custom
step:
Record.FieldValues(_)
returns each value in that row in a ListList.Skip
then skips the first entry (the ID column)
If the above algorithm doesn't return the relevant columns in your actual data, there are other ways of doing this, either hard-coded or a different algorithm, depending on the nature of your actual data.
As pointed out by @SamNseir, should any of the cells in the row have identical entries, adding List.Distinct
to the algorithm in #"Added Custom"
would be necessary:
#"Added Custom" = Table.AddColumn(#"Changed Type", "Combination Count",
each
let
n = List.Count(
List.RemoveNulls(
List.Distinct(
List.Skip(Record.FieldValues(_)))))
in
Number.Power(2,n) - n -1,Int64.Type)
Edit
If you want to have both the output columns and the Count as an output, it can be a little tricky as Power Query can only output a single table.
One way to do this in a single table is as shown below, where the input rows are repeated for each row in the output table (along with the Count and ID columns.
I am using a custom function to generate the combinations. So first add the code below as a blank query and name it fnCombos
//https://community.powerbi.com/t5/Community-Blog/Combinatorics-in-Power-Query-part1/bc-p/2221143/highlight/true#M3151
// See answer of Alexis Olson
//fnCombos
// combinations of all entries in a single list/column
(L as list) as list =>
let
N = List.Count(L),
Subsets =
List.Transform(
{0..Number.Power(2, N)-1},
(i) => List.Transform(
{0..N-1},
(j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1
then L{j}
else null
)
),
Concatenate = List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))
in
Concatenate
Then use the following for your main code:
- Be sure to read the code comments for important information
- Should work with a variable number of input columns, provided they all start with the word
Input
. If they don't, then we will need some other method of selection.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
inputCols = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_,"Input")),
#"Changed Type" = Table.TransformColumnTypes(Source, {{"ID", Int64.Type}} &
List.Transform(inputCols, each {_, type text})),
#"Added Counts" = Table.AddColumn(#"Changed Type", "Combination Count",
each
let
n = List.Count(
List.RemoveNulls(
List.Distinct(
Record.FieldValues(Record.SelectFields(_,inputCols)))))
in
Number.Power(2,n) - n -1,Int64.Type),
#"Added Combos" = Table.AddColumn(#"Added Counts", "Combinations", (r)=>
let
//Select the Input columns only
//Add List.Distinct if you might have duplicate strings in any of the input columns
vals = List.RemoveNulls(
List.Distinct(Record.FieldValues(
Record.SelectFields(r, inputCols)))),
//combos that meet your specifications will all contain a comma.
combos = List.Select(fnCombos(List.Sort(vals)), each Text.Contains(_,","))
in
Table.FromList(combos,Splitter.SplitTextByDelimiter(","), List.Count(inputCols))),
outCols = Table.ColumnNames(#"Added Combos"{0}[Combinations]),
outCols2 = List.Transform(outCols, each "Output " & _),
#"Expanded Combinations" = Table.ExpandTableColumn(#"Added Combos", "Combinations",
outCols,outCols2),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded Combinations", List.Transform(outCols2, each {_, type text}))
in
#"Changed Type1"
Results
本文标签: powerqueryCalculate combinations for a range of text values in excelStack Overflow
版权声明:本文标题:powerquery - Calculate combinations for a range of text values in excel - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1736757297a1951357.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
n
unique strings, the number of combinations you want is just2^n - n -1
. – rachel Commented Jan 8 at 9:24