I am trying to write a script that appends new rows of a table to an existing file of tabular data, so that not all progress is lost if an error is encountered. Here are seven different ways to do this in Python:

  • Write the data to a text file (a module-free method).
  • Write the data to a CSV file using writerows of the csv module.
  • Write the data to a database using sqlite3 (or another database library).
  • Write the data to a CSV file using polars.
  • Structure the data as a pandas DataFrame and pickle it.
  • Structure the data as a polars DataFrame and pickle it.
  • Structure the data as a pandas DataFrame and save it as a Parquet file using fastparquet.

The following code measures the runtime of each of these options, excluding the time required to build the data, header, and any single-row DataFrames. (I have tried to make this code as concise as possible while keeping it readable.)

import numpy as np
from time import perf_counter as pc
import csv
import sqlite3
import pandas as pd
import polars as pl
import pickle as pkl
import fastparquet as fp

header = list('ABCDE')
n_rows = 10**3
data = np.random.rand(n_rows, 5)
df_pd = pd.DataFrame(data, columns=header)
df_pl_header = pl.DataFrame(schema=header)
df_pl = pl.DataFrame(data, schema=header)

print('----------------------------------------------------------------------')
print('[no module]')
fname = 'file0.txt'

pc0 = pc()
with open(fname, mode='w') as f:
    f.write('\t'.join(header) + '\n')
    f.flush()
    for k in range(n_rows):
        f.write('\t'.join(map(str, data[k])) + '\n')
        f.flush()
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('csv')
fname = 'file1.csv'

pc0 = pc()
with open(fname, mode='w', newline='') as f:
    csvwriter = csv.writer(f)
    csvwriter.writerow(header)
    f.flush()
    for row in data:
        csvwriter.writerow(row)
        f.flush()
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('sqlite3')
fname = 'file2.db'

pc0 = pc()
conn = sqlite3.connect(fname)
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS data_table')
cur.execute('CREATE TABLE IF NOT EXISTS data_table (idx integer PRIMARY KEY, A real, B real, C real, D real, E real)')
for k in range(n_rows):
    cur.execute('INSERT INTO data_table VALUES (?, ?, ?, ?, ?, ?)', [k, *data[k]])
conn.commit()
conn.close()
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('polars')
fname = 'file3.csv'

pc0 = pc()
with open(fname, mode='w', encoding='utf8') as f:
    df_pl_header.write_csv(f)
    for k in range(n_rows):
        df_pl[k].write_csv(f, include_header=False)
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('pandas, pickle')
fname = 'file4.pkl'

pc0 = pc()
df_new = df_pd.iloc[[0]]
df_new.to_pickle(fname)
for k in range(1, n_rows):
    df_new = pd.concat([df_new, df_pd.iloc[[k]]], ignore_index=True)
    df_new.to_pickle(fname)
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('polars, pickle')
fname = 'file5.pkl'

pc0 = pc()
df_new = df_pl[0]
df_ser = pkl.dumps(df_new)
with open(fname, mode='wb') as f:
    f.write(df_ser)
for k in range(1, n_rows):
    df_new = df_new.vstack(df_pl[k])
    df_ser = pkl.dumps(df_new)
    with open(fname, mode='wb') as f:
        f.write(df_ser)
print(f'To write data to file:  {pc()-pc0} sec')

print('----------------------------------------------------------------------')
print('pandas, fastparquet')
fname = 'file6.parquet'

pc0 = pc()
fp.write(fname, df_pd.iloc[[0]])
for k in range(1, n_rows):
    fp.write(fname, df_pd.iloc[[k]], append=True)
print(f'To write data to file:  {pc()-pc0} sec')
print('----------------------------------------------------------------------')

On my machine, running this script yields the following times.

----------------------------------------------------------------------
[no module]
To write data to file:  0.01229649999004323 sec
----------------------------------------------------------------------
csv
To write data to file:  0.011919600001419894 sec
----------------------------------------------------------------------
sqlite3
To write data to file:  0.02571699999680277 sec
----------------------------------------------------------------------
polars
To write data to file:  0.12898200000927318 sec
----------------------------------------------------------------------
pandas, pickle
To write data to file:  0.6281701999978395 sec
----------------------------------------------------------------------
polars, pickle
To write data to file:  1.4392062999977497 sec
----------------------------------------------------------------------
pandas, fastparquet
To write data to file:  33.678610299990396 sec
----------------------------------------------------------------------

So it appears that the first three approaches are much faster than the other four alternatives for this specific task and test. When I increase the size of the table (both in n_rows and number of columns), sqlite3 generally pulls ahead as the fastest option.
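(One variant I have considered but not benchmarked as carefully: batching the sqlite3 inserts with executemany and committing once per chunk, so progress is still durable every few rows rather than after every single row. A minimal sketch, using the same schema as above but with an in-memory database and a chunk size of 100, both of which are arbitrary choices for illustration:)

```python
import sqlite3
import numpy as np

n_rows, chunk = 10**3, 100
data = np.random.rand(n_rows, 5)

conn = sqlite3.connect(':memory:')  # use a filename instead to persist to disk
cur = conn.cursor()
cur.execute('CREATE TABLE data_table (idx INTEGER PRIMARY KEY, '
            'A REAL, B REAL, C REAL, D REAL, E REAL)')
for start in range(0, n_rows, chunk):
    # one executemany call per chunk instead of one execute per row
    rows = ([k, *map(float, data[k])]
            for k in range(start, min(start + chunk, n_rows)))
    cur.executemany('INSERT INTO data_table VALUES (?, ?, ?, ?, ?, ?)', rows)
    conn.commit()  # progress is durable after every chunk

count = cur.execute('SELECT COUNT(*) FROM data_table').fetchone()[0]
conn.close()
```

This trades a little durability granularity (up to one chunk of rows can be lost on a crash) for fewer commits, which is usually where the per-row sqlite3 time goes.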

But am I missing faster implementations of the four slower alternatives, or other, faster ways of appending numerical data to a tabular file besides those listed?

(I am posting this question in part because I and others I know have run into this problem several times, and we have not found many published speed comparisons between the numerous available methods, which are mostly discussed separately online.)
