The df contains 100 million rows, and there are roughly 25-30 group_by columns. Is there a way to speed this operation up from here, or is this the best I can get?

import polars as pl
import numpy as np

rows = 100000000
n_cols = 30
df = pl.DataFrame(np.random.randint(0, 100, size=(rows, n_cols)), schema=["col_" + str(col_i) for col_i in range(n_cols)])
First_n_rows_list = [1,2,3]

df = df.sort('col_0').group_by(["col_" + str(i) for i in range(1, n_cols)])
result = pl.concat([df.head(First_n_rows).with_columns(pl.lit(First_n_rows).alias('First_n_rows').cast(pl.Int8)) for First_n_rows in First_n_rows_list])

  • I am thinking of another way to do this, though I don't know if it will improve the running time: in each group, get the rank of each row, then duplicate each row max(x_list) - rank times. For example, rank 0 is duplicated 3 times, rank 1 is duplicated 2 times. – user28199045 Commented yesterday
  • It's not fully clear what you are trying to achieve here; a more meaningful example with a proper (small) input and output would help. But for a start, you can use df.head(max(x_list)) to reduce the size of the dataframe. – roman Commented yesterday
  • The goal is to get the first n rows within each group and add a column showing the first_n value, for each first_n value in the list, then concat all the rows from the different first_n cases (a small illustrative sketch follows). I edited my description to make it clear. Thanks! – user28199045 Commented yesterday
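
For reference, a minimal sketch of the intended input and output; the tiny frame, its column names, and the values here are illustrative only, not from the post:

import polars as pl

# Illustrative frame: col_0 is the sort key, col_1 the (single) group key.
df = pl.DataFrame({
    "col_0": [1, 2, 3, 4],
    "col_1": [0, 0, 0, 1],
})
First_n_rows_list = [1, 2]

result = pl.concat([
    df.sort("col_0")
      .group_by("col_1", maintain_order=True)
      .head(n)
      .with_columns(pl.lit(n, dtype=pl.Int8).alias("First_n_rows"))
    for n in First_n_rows_list
])
print(result)
# shape: (5, 3) -- group col_1=0 contributes 1 row for n=1 and 2 rows for n=2;
# group col_1=1 has a single row, so it contributes 1 row for each n.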

2 Answers


As you said, you can take head(max(x_list)) once and then repeat each row the appropriate number of times:

x = max(x_list)

(
    df.head(x)  # df is the GroupBy from the question, so this keeps at most x rows per group
    .with_columns(
        # descending rank within each group: first row gets len, ..., last row gets 1
        pl.int_range(pl.len(), 0, step=-1)
        .over(["col_" + str(i) for i in range(1, n_cols)])
        .alias("x")
    )
    .with_columns(pl.exclude("x").repeat_by("x"))
    .explode(pl.exclude("x"))
)
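
For completeness, a self-contained sketch of this single-pass idea on a small frame; the frame size, the seed, and the contiguous x_list = [1, 2, 3] are assumptions. The repeat count is written as x - rank (rather than group length - rank) so that groups with fewer than x rows still repeat each row the right number of times:

import polars as pl
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
n_cols = 4
small = pl.DataFrame(
    rng.integers(0, 3, size=(40, n_cols)),
    schema=[f"col_{i}" for i in range(n_cols)],
)
x_list = [1, 2, 3]  # assumed contiguous, as in the question
group_cols = [f"col_{i}" for i in range(1, n_cols)]

x = max(x_list)
single_pass = (
    small.sort("col_0")
    .group_by(group_cols, maintain_order=True)
    .head(x)  # keep at most max(x_list) rows per group
    .with_columns(
        # rank 0 repeats x times, rank 1 repeats x - 1 times, ...
        (x - pl.int_range(pl.len()).over(group_cols)).alias("x")
    )
    .with_columns(pl.exclude("x").repeat_by("x"))
    .explode(pl.exclude("x"))
)

Note that the resulting x column holds the repeat count, not the per-copy First_n_rows label from the question's output, so the rows match the baseline but the tag column does not.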
import polars as pl
import numpy as np

n = 50
df = pl.DataFrame(np.random.randint(0, 100, size=(n, 4)), schema=['A', 'B', 'C', 'D'])
x_list = [1, 2, 3]

grouped = df.group_by(['A', 'B', 'C'])
result = pl.concat([grouped.head(x).with_columns(pl.lit(x).alias('x').cast(pl.Int8)) for x in x_list])
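
As a quick sanity check on result (a sketch; the x column name comes from the snippet above):

# Each cut-off x should contribute at most x rows per (A, B, C) group.
check = (
    result.group_by(["A", "B", "C", "x"])
    .len()
    .select((pl.col("len") <= pl.col("x")).all())
)
print(check)  # expect a single row containing true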
