I need to write a Python program that saves a large number of vectors to disk. I receive the data as fixed-length list[float].

I have several ideas for how to save them to disk.

using json:

import json

def save_to_json(data, file_name):
    with open(file_name, 'w') as file:
        json.dump(data, file)

using built-in array:

import array

def save_with_array(data, file_name):
    arr = array.array('d', data)
    with open(file_name, 'wb') as file:
        arr.tofile(file)

or using numpy:

import numpy as np

def save_with_numpy(data, file_name):
    np_array = np.array(data, dtype=np.float64)
    np_array.tofile(file_name)

The ideal method is the fastest one; low memory usage is a plus.

From my tests, the binary files produced by numpy and array are identical, so I guess they use the same process under the hood, but maybe one is faster than the other. Regarding json, that was my first idea, but I feel it is not suited for this use case.
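
For reference, a comparison could be done with a small timing harness along these lines (a minimal sketch assuming the three functions above; the vector length, file names, and repeat count are arbitrary placeholders):

import timeit

# sample data: one vector of 1,000,000 floats (size chosen arbitrarily for illustration)
data = [float(i) for i in range(1_000_000)]

# time each save function on the same data; file names and repeat count are placeholders
for fn, path in [(save_to_json, 'vectors.json'),
                 (save_with_array, 'vectors_array.bin'),
                 (save_with_numpy, 'vectors_numpy.bin')]:
    elapsed = timeit.timeit(lambda: fn(data, path), number=5)
    print(f"{fn.__name__}: {elapsed / 5:.3f} s per call")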

Any idea how to optimize the process?
