
I want to measure the memory consumption of a Python list, where each list member is a tuple, and each tuple has three Pytorch tensors, two being dense and the other being COO sparse.

I know sys.getsizeof(). But it doesn't work.

I want a convenient way to get the memory size of the whole list.


  • Do you want to do this only occasionally? If so then pickle it and inspect the result. Otherwise try pympler. – JonSG Commented 2 days ago

3 Answers

sys.getsizeof() alone does not capture the memory of nested objects, so you need to combine sys.getsizeof() with each tensor's element_size() and nelement(), handling PyTorch-specific cases such as sparse layouts. (torch.cuda.memory_allocated() is another option, but it only reports GPU memory.)

import sys

def get_tensor_memory_size(tensor):
    """Calculate the memory size of a PyTorch tensor in bytes."""
    if tensor.is_sparse:
        # Ensure the sparse tensor is coalesced to have proper indices/values
        if not tensor.is_coalesced():
            tensor = tensor.coalesce()
        # Memory for a COO sparse tensor: indices plus values
        indices = tensor.indices()
        values = tensor.values()
        return (indices.element_size() * indices.nelement()
                + values.element_size() * values.nelement())
    else:
        # Memory for a dense tensor
        return tensor.element_size() * tensor.nelement()

def get_list_memory_size(lst):
    """Calculate total memory size of a list of tuples containing tensors."""
    total_size = sys.getsizeof(lst)
    for tup in lst:
        total_size += sys.getsizeof(tup)
        for tensor in tup:
            total_size += get_tensor_memory_size(tensor)
    return total_size

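For example, a minimal usage sketch; the tensor shapes, dtypes, and list length are made up purely for illustration:

import torch

# Hypothetical data matching the question: tuples of two dense tensors and one COO sparse tensor
data = [
    (torch.randn(100, 50),
     torch.randn(100, 50),
     torch.randn(100, 50).to_sparse())
    for _ in range(10)
]

print(f"Approximate list memory: {get_list_memory_size(data)} bytes")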

Why sys.getsizeof() won't work

sys.getsizeof() only returns the size of the object itself, not of the objects it refers to. If you had a box with stuff in it, sys.getsizeof() would tell you how big the box is, but not what's inside: it could be a box of packing peanuts or a box of tungsten cubes, so you need to look inside the box. Note that this is not a tensor-specific issue; it applies to any container object (lists, tuples, NumPy arrays, tensors, etc.).

What will work to get the true memory size of tensor/container objects

For a dense tensor, it's pretty easy: just take the number of elements times the size of one element, where tensor.nelement() gives the number of items in the tensor and tensor.element_size() gives the size of one of those elements in bytes.

dense_mem_size = tensor.nelement() * tensor.element_size()
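For instance, with a made-up 1000x1000 float32 tensor:

import torch

t = torch.randn(1000, 1000)               # float32 dense tensor
print(t.nelement() * t.element_size())     # 1,000,000 elements * 4 bytes = 4000000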

For sparse coordinate (COO) tensors, the same idea applies, but now consider both the values and the indices of the COO tensor:

vals_size = coo_tensor._values().nelement() * coo_tensor._values().element_size()
idxs_size = coo_tensor._indices().nelement() * coo_tensor._indices().element_size()

sparse_mem_size = vals_size + idxs_size 
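As a small sketch (the shape and sparsity pattern are arbitrary):

import torch

dense = torch.zeros(1000, 1000)
dense[::100, ::100] = 1.0                  # 100 nonzero entries
coo_tensor = dense.to_sparse()             # COO sparse tensor

vals_size = coo_tensor._values().nelement() * coo_tensor._values().element_size()    # 100 * 4 bytes
idxs_size = coo_tensor._indices().nelement() * coo_tensor._indices().element_size()  # 2 * 100 * 8 bytes
print(vals_size + idxs_size)               # 2000 bytes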

Putting this together to get a function for your data's structure

We will just have a function that takes an object and recursively calls itself to go deeper

import sys
import torch

def deep_getsizeof(obj, seen=None):
    # seen checks for objects referenced in multiple places to avoid double counting
    if seen is None:
        seen = set()
        
    obj_id = id(obj)
    if obj_id in seen:
        return 0 
    seen.add(obj_id)
    
    size = 0
    if isinstance(obj, torch.Tensor):
        # start with the shallow size of the tensor object.
        size += sys.getsizeof(obj)
        
        if obj.is_sparse: # sparse tensor
            idxs = obj._indices()
            vals = obj._values()
        size += deep_getsizeof(idxs, seen)
        size += deep_getsizeof(vals, seen)

        else: # dense tensor
            size += obj.nelement() * obj.element_size()
        return size
    
    # For lists and tuples, include the size of each element.
    if isinstance(obj, (list, tuple)):
        size += sys.getsizeof(obj)
        for item in obj:
            size += deep_getsizeof(item, seen)
        return size
    
    # base case return size of item if not one of the container objects you mentioned
    return sys.getsizeof(obj)
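A quick usage sketch matching the structure in the question; the shapes are arbitrary, and the shared tensor shows how the seen set prevents double counting:

import torch

shared = torch.randn(256, 256)             # the same dense tensor referenced from every tuple
data = [
    (shared,
     torch.randn(256, 256),
     torch.randn(256, 256).to_sparse())
    for _ in range(4)
]

# `shared` contributes its bytes only once thanks to the `seen` set.
print(f"Approximate total size: {deep_getsizeof(data)} bytes")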

You said that the sys.getsizeof() function doesn't work for you; could you share more details on that? Below is a convenient way to get the memory usage using pympler.

First, have a look at how sys.getsizeof() is typically used:

import sys

a_random_list = [1, 2, 3, 4, 5]
# Note: this reports only the list object's own (shallow) size,
# not the memory of the elements it references.
memory_usage = sys.getsizeof(a_random_list)
print(f"Memory usage of the list: {memory_usage} bytes")

If you don't wish to use the sys.getsizeof() function, you can use this instead:

from pympler import asizeof
# You can use Pympler to get a more accurate, recursive size estimate
# that also handles a wider range of object types.

my_list = [1, 2, [3, 4, 5], {'a': 6, 'b': 7}]

memory_usage = asizeof.asizeof(my_list)

print(f"Total memory usage of the list: {memory_usage} bytes")

To use this, you would have to install pympler. Use command: pip install pympler
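A sketch applying asizeof to the structure from the question (the shapes are arbitrary); it may be worth cross-checking the result against the manual tensor-size calculations in the other answers, since the bulk of a tensor's memory lives in storage managed outside the Python object:

import torch
from pympler import asizeof

data = [
    (torch.randn(32, 32),
     torch.randn(32, 32),
     torch.randn(32, 32).to_sparse())
    for _ in range(3)
]

print(f"pympler estimate: {asizeof.asizeof(data)} bytes")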

:)
