I have a DataFrame df_items and want to create all combinations of its rows of size i using itertools.combinations. Each combination should keep all columns from the original DataFrame.
Current approach (works, but loses the column names):
import numpy as np
from itertools import combinations
# Row positions for every combination of size i (named combo_idx so the imported function is not shadowed)
combo_idx = np.array(list(combinations(range(len(df_items)), i)))
selected_items = df_items.values[combo_idx]
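To make the problem concrete, here is a minimal sketch of what this approach returns. It assumes a small three-row sample frame with columns A and B, which is not given in the question; the names combo_idx and first are illustrative only.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2

combo_idx = np.array(list(combinations(range(len(df_items)), i)))
selected_items = df_items.values[combo_idx]

# Shape is (number of combinations, i, number of columns); no labels survive.
print(selected_items.shape)

# Labels can be reattached to a single combination by hand.
first = pd.DataFrame(selected_items[0],
                     columns=df_items.columns,
                     index=combo_idx[0])
print(first)
```

The result is a plain 3-D NumPy array, so the column names have to be restored manually for each slice.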
asked Feb 7 at 18:24 by A A
1 Answer
If you want an independent DataFrame for each combination of rows, the best option is to use iloc in a loop:
from itertools import combinations
# c is a tuple of row positions; iloc needs a list
for c in combinations(range(len(df_items)), 2):
    print(df_items.iloc[list(c)])
Example output:
   A  B
0  0  0
1  1  1
   A  B
0  0  0
2  2  2
   A  B
1  1  1
2  2  2
Used input:
import pandas as pd
df_items = pd.DataFrame({'A': range(3),
                         'B': range(3)})
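If the per-combination frames need to be kept rather than printed, one option is to store them and concatenate them under a combination key. This is a minimal sketch under the same three-row input; the names combo_frames and combined and the index level names combo and row are illustrative and not part of the original answer.

```python
from itertools import combinations

import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2

# One small DataFrame per combination, keyed by the tuple of row positions.
combo_frames = {
    c: df_items.iloc[list(c)]
    for c in combinations(range(len(df_items)), i)
}

# Stack them into one frame; the outer index level numbers the combinations,
# the inner level keeps the original row labels.
combined = pd.concat(
    list(combo_frames.values()),
    keys=list(range(len(combo_frames))),
    names=['combo', 'row'],
)
print(combined)
```

Each entry of combo_frames is an ordinary DataFrame with columns A and B, and combined.loc[k] returns the rows of the k-th combination.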
You could also use groupby, but this will be less efficient:
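import numpy as np
from itertools import combinations, chain
i = 2
# Select the rows of all combinations at once, flattened into one long frame
tmp = df_items.iloc[list(chain.from_iterable(combinations(range(len(df_items)), i)))]
# Every consecutive block of i rows forms one group (one combination)
tmp.groupby(np.arange(len(tmp))//i)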
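The groupby call above only builds a GroupBy object. A minimal sketch of how the individual combinations could be pulled out of it, assuming the same three-row input (the names positions and groups are illustrative):

```python
from itertools import chain, combinations

import numpy as np
import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2

# Flatten all combinations into one long list of row positions.
positions = list(chain.from_iterable(combinations(range(len(df_items)), i)))
tmp = df_items.iloc[positions]

# Every consecutive block of i rows is one combination.
groups = {key: group for key, group in tmp.groupby(np.arange(len(tmp)) // i)}

for key, group in groups.items():
    print(key)
    print(group)
```

Each group is a DataFrame that still carries the original column names and row labels.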
groupby, as this is the shortest way I often use to find the list of all possible combinations. For example, my df has cols employee and customer_id; then, if I want to find all the combinations of those two factors, I just df.groupby(['employee', 'customer_id'])['var'].size(). Hope this helps – PTQuoc, commented Feb 7 at 18:27