How to create a PyTorch tensor immediately in shared memory?

I need to create a PyTorch tensor (CPU) and allocate space for it. The tensor is multi-gigabyte; only one copy of it fits in RAM.

I need it to be shared, because it is later used by data-loading workers that run in additional spawned processes (a DataLoader with num_workers > 0).

I tried several approaches:

  1. Just plain creation and then using DataLoader:

    v = torch.empty(25 * 2**30)
    loader = DataLoader(dataset, num_workers=2, persistent_workers=True)  
    # dataset uses v (the previous line is actually inside its __init__())
    
  2. Using explicit share_memory_():

    v = torch.empty(25 * 2**30)
    v.share_memory_()

Both of these options lead to "Couldn't open shared file mapping: ...", because the tensor is created first and then copied to shared memory, and two copies do not fit in RAM.

  3. There is another method: create a file on disk and map it with torch.from_file. This probably works, but it requires backing this huge tensor with a file on disk, which is slow and not desirable (a rough sketch of this route is included after the error output below).

  4. I have found mentions of a TorchStore module which could help, but it does not seem to be part of PyTorch yet.

  5. (Based on gfdb's answer) Using multiprocessing.shared_memory.SharedMemory():

    import torch
    import numpy as np
    from multiprocessing import shared_memory
    import psutil
    
    print(f"Available memory: {psutil.virtual_memory().available / (1024 ** 3):.2f} GB")
    print(f"Total memory: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
    
    tensor_shape = (2**30, 25)
    dtype = np.float32
    num_elements = np.prod(tensor_shape)
    size = int(num_elements * np.dtype(dtype).itemsize)
    print(f"Requesting allocation of {size / 2 ** 30:.2f} GB")
    
    sh_mem = shared_memory.SharedMemory(create=True, size=size)
    np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)
    tensor = torch.from_numpy(np_array)
    tensor.fill_(0)
    print(f"Allocated {size / 2**30:.2f} GB")

With this SharedMemory approach I still get an error if I request more than half of the RAM (less than half works fine):

Available memory: 176.63 GB
Total memory: 191.87 GB
Requesting allocation of 100.00 GB
Traceback (most recent call last):
  File "D:\Sci\NetOnset\CheckShared.py", line 22, in 
    sh_mem = shared_memory.SharedMemory(create=True, size=size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Miniconda3\envs\cuda124\Lib\multiprocessing\shared_memory.py", line 151, in __init__
    self._mmap = mmap.mmap(-1, size, tagname=temp_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete
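
For reference, a rough, untested sketch of the from_file route from option 3 (the file name is just a placeholder):

    import torch

    num_elements = 25 * 2**30            # 25 Gi float32 elements = 100 GB
    path = "big_tensor.bin"              # placeholder; ideally on a fast drive

    # pre-size the backing file without writing the data up front
    with open(path, "wb") as f:
        f.truncate(num_elements * 4)     # 4 bytes per float32 element

    # memory-map the file as a shared tensor; pages are paged in/out lazily by the OS
    v = torch.from_file(path, shared=True, size=num_elements, dtype=torch.float32)

Workers mapping the same file would see the same data, but the tensor ends up file-backed, which is exactly what I want to avoid.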

Is there still a way to create a big PyTorch Tensor in shared memory without copying?

Everything behaves the same way on both Windows 10 and Ubuntu 24.04.

  • What error do you get on Ubuntu? You are trying to allocate a huge contiguous block of memory, which is likely why this is failing. I suspect that even though you have the memory available, fragmentation is why it fails. Try chunks of memory instead. I'll edit my answer to show what I mean. – gfdb, Mar 14 at 22:12
  • It was something like "Bus error"; I will check again. The pattern is: allocating more than half of available RAM in "normal" memory works; allocating less than half of available RAM in "shared" memory works; allocating more than half of available RAM in "shared" memory fails. Yes, I switched to several chunks instead of a single tensor, but it still seems strange. – Stepan Andreenko, Mar 15 at 16:04

1 Answer


Your question is missing some details, but assuming I have understood it correctly:

I get an error "Couldn't open shared file mapping: ..." when running this code, most likely because the tensor is implicitly being copied to shared memory and the second copy does not fit. There is exactly the same error if I call share_memory_() on this tensor explicitly, for the same reason.

This is correct. You will end up with two tensors:

  1. Original CPU tensor (private memory)
  2. Shared-memory tensor (a copy)

And as you say, it won't fit.

One approach, besides the file-backed torch.from_file route, could be to use multiprocessing's shared_memory, e.g.:

import torch
import numpy as np
from multiprocessing import shared_memory

tensor_shape = (1024, 1024, 512)
dtype = np.float32
num_elements = int(np.prod(tensor_shape))
nbytes = num_elements * np.dtype(dtype).itemsize

# allocate the block directly in shared memory (no staging copy in private memory)
sh_mem = shared_memory.SharedMemory(create=True, size=nbytes)

# view the shared buffer as a NumPy array (no copy)
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)

# create the tensor without actually copying data
tensor = torch.from_numpy(np_array)
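
One caveat: the shared block should be released explicitly once every process is done with it, otherwise it can outlive the program (on Linux the underlying /dev/shm segment persists until it is unlinked):

# when finished, and after all workers have detached:
sh_mem.close()    # close this process's handle to the block
sh_mem.unlink()   # free the block itself (call once, from the creating process)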

You can also do this in chunks, which is probably better, e.g.:

total_bytes = 100 * 2**30                                    # total size to allocate (100 GB here)
chunk_size = 10 * 2**30                                      # 10 GB per chunk
num_chunks = (total_bytes + chunk_size - 1) // chunk_size    # round up to cover the remainder

shm_list = [shared_memory.SharedMemory(create=True, size=chunk_size) for _ in range(num_chunks)]
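
To actually use the chunks, each block can be wrapped as a tensor in the same zero-copy way; it is easier if each block is sized in whole rows. A rough sketch, assuming the logical tensor is (2**30, 25) float32 split along dim 0 (the row math and the helper are my own, not a fixed recipe):

import torch
import numpy as np
from multiprocessing import shared_memory

rows_total, cols = 2**30, 25
row_bytes = cols * np.dtype(np.float32).itemsize                  # 100 bytes per row
rows_per_chunk = (10 * 2**30) // row_bytes                        # roughly 10 GB worth of rows
num_chunks = (rows_total + rows_per_chunk - 1) // rows_per_chunk  # ceiling division

shm_list, chunk_tensors = [], []
for i in range(num_chunks):
    rows = min(rows_per_chunk, rows_total - i * rows_per_chunk)   # last chunk may be smaller
    shm = shared_memory.SharedMemory(create=True, size=rows * row_bytes)
    arr = np.ndarray((rows, cols), dtype=np.float32, buffer=shm.buf)
    shm_list.append(shm)                           # keep a reference so the block stays mapped
    chunk_tensors.append(torch.from_numpy(arr))    # zero-copy view of the shared block

def get_row(i):
    # map a global row index to (chunk, row-within-chunk)
    return chunk_tensors[i // rows_per_chunk][i % rows_per_chunk]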

As further proof that no copy is made, you can check that the NumPy array and the tensor report the same base pointer:

>>> print(np_array.ctypes.data)
133277195173888
>>> print(tensor.data_ptr())
133277195173888

and they should match up.
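
Since the goal is to reach this memory from DataLoader workers running in spawned processes, the usual pattern is to pass the block's name (plus shape and dtype) to the worker and re-attach there, rather than sending the tensor itself. A sketch, assuming shm_name and tensor_shape are handed to the Dataset (e.g. via its __init__ arguments):

# parent process: remember the name of the shared block
shm_name = sh_mem.name

# inside a spawned worker (e.g. in the Dataset, on first use):
from multiprocessing import shared_memory
import numpy as np
import torch

existing = shared_memory.SharedMemory(name=shm_name)   # attach to the same block, no copy
worker_tensor = torch.from_numpy(
    np.ndarray(tensor_shape, dtype=np.float32, buffer=existing.buf)
)
# keep `existing` referenced for as long as worker_tensor is in use

Attaching by name this way can avoid the implicit copy that sending the big tensor to each worker would otherwise cause.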
