How to create a PyTorch tensor immediately in shared memory?

I need to create a PyTorch tensor (CPU) and allocate space for it. The tensor is multi-gigabyte; only one copy of it fits in RAM.

I need it to be shared, because it is later used by data-loading workers that run in additional spawned processes (a DataLoader with num_workers > 0).

I tried several approaches:

  1. Just plain creation and then using DataLoader:

    v = torch.empty(25 * 2**30)
    loader = DataLoader(dataset, num_workers=2, persistent_workers=True)  
    # dataset uses v (the previous line is actually inside its __init__())
    
  2. Using explicit share_memory_():

    v = torch.empty(25 * 2**30)
    v.share_memory_()

Both of these options lead to "Couldn't open shared file mapping: ...", because the tensor is created first and then copied to shared memory, and two copies do not fit in RAM.

  3. There is another method: create a file on disk and map it with torch.from_file. This probably works, but it requires backing this huge tensor with a file on disk, which is slow and not desirable (a rough sketch of this route is included after the error output below).

  4. I have found mentions of a TorchStore module which could help, but it does not seem to be part of PyTorch yet.

  5. (Based on gfdb's answer) Using multiprocessing.shared_memory.SharedMemory():

    import torch
    import numpy as np
    from multiprocessing import shared_memory
    import psutil
    
    print(f"Available memory: {psutil.virtual_memory().available / (1024 ** 3):.2f} GB")
    print(f"Total memory: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")
    
    tensor_shape = (2**30, 25)
    dtype = np.float32
    num_elements = np.prod(tensor_shape)
    size = int(num_elements * np.dtype(dtype).itemsize)
    print(f"Requesting allocation of {size / 2 ** 30:.2f} GB")
    
    sh_mem = shared_memory.SharedMemory(create=True, size=size)
    np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)
    tensor = torch.from_numpy(np_array)
    tensor.fill_(0)
    print(f"Allocated {size / 2**30:.2f} GB")

With this SharedMemory approach I still get an error if I request more than half of the RAM (less than half works fine):

Available memory: 176.63 GB
Total memory: 191.87 GB
Requesting allocation of 100.00 GB
Traceback (most recent call last):
  File "D:\Sci\NetOnset\CheckShared.py", line 22, in 
    sh_mem = shared_memory.SharedMemory(create=True, size=size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Miniconda3\envs\cuda124\Lib\multiprocessing\shared_memory.py", line 151, in __init__
    self._mmap = mmap.mmap(-1, size, tagname=temp_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete
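
For reference, a rough, untested sketch of the from_file route from option 3 (the file name is just a placeholder):

    import torch

    num_elements = 25 * 2**30            # 25 Gi float32 elements = 100 GB
    path = "big_tensor.bin"              # placeholder; ideally on a fast drive

    # pre-size the backing file without writing the data up front
    with open(path, "wb") as f:
        f.truncate(num_elements * 4)     # 4 bytes per float32 element

    # memory-map the file as a shared tensor; pages are paged in/out lazily by the OS
    v = torch.from_file(path, shared=True, size=num_elements, dtype=torch.float32)

Workers mapping the same file would see the same data, but the tensor ends up file-backed, which is exactly what I want to avoid.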

Is there still a way to create a big PyTorch Tensor in shared memory without copying?

Everything behaves the same way on both Windows 10 and Ubuntu 24.04.

  • What error do you get on Ubuntu? You are trying to allocate a huge contiguous block of memory, which is likely why this is failing. I suspect that even though you have the memory available, fragmentation is why it fails. Try chunks of memory instead. I'll edit my answer to show what I mean. – gfdb, Mar 14 at 22:12
  • It was something like "Bus error"; I will check again. The pattern is: allocating more than half of available RAM in "normal" memory works; allocating less than half of available RAM in "shared" memory works; allocating more than half of available RAM in "shared" memory fails. Yes, I switched to several chunks instead of a single tensor, but it still seems strange. – Stepan Andreenko, Mar 15 at 16:04

1 Answer


Your question is missing some details, but assuming I have understood it correctly:

I get an error "Couldn't open shared file mapping: ..." when running this code, most likely because the tensor is implicitly being copied to shared memory and the second copy does not fit. There is exactly the same error if I call share_memory_() on this tensor explicitly, for the same reason.

This is correct. You will end up with two tensors:

  1. Original CPU tensor (private memory)
  2. Shared-memory tensor (a copy)

And as you say, it won't fit.

One approach, besides the file-backed torch.from_file route, could be to use multiprocessing's shared_memory, e.g.:

import torch
import numpy as np
from multiprocessing import shared_memory

tensor_shape = (1024, 1024, 512)
dtype = np.float32
num_elements = int(np.prod(tensor_shape))
nbytes = num_elements * np.dtype(dtype).itemsize

# allocate the block directly in shared memory (no staging copy in private memory)
sh_mem = shared_memory.SharedMemory(create=True, size=nbytes)

# view the shared buffer as a NumPy array (no copy)
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)

# create the tensor without actually copying data
tensor = torch.from_numpy(np_array)
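
One caveat: the shared block should be released explicitly once every process is done with it, otherwise it can outlive the program (on Linux the underlying /dev/shm segment persists until it is unlinked):

# when finished, and after all workers have detached:
sh_mem.close()    # close this process's handle to the block
sh_mem.unlink()   # free the block itself (call once, from the creating process)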

You can also do this in chunks, which is probably better, e.g.:

total_bytes = 100 * 2**30                                    # total size to allocate (100 GB here)
chunk_size = 10 * 2**30                                      # 10 GB per chunk
num_chunks = (total_bytes + chunk_size - 1) // chunk_size    # round up to cover the remainder

shm_list = [shared_memory.SharedMemory(create=True, size=chunk_size) for _ in range(num_chunks)]
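
To actually use the chunks, each block can be wrapped as a tensor in the same zero-copy way; it is easier if each block is sized in whole rows. A rough sketch, assuming the logical tensor is (2**30, 25) float32 split along dim 0 (the row math and the helper are my own, not a fixed recipe):

import torch
import numpy as np
from multiprocessing import shared_memory

rows_total, cols = 2**30, 25
row_bytes = cols * np.dtype(np.float32).itemsize                  # 100 bytes per row
rows_per_chunk = (10 * 2**30) // row_bytes                        # roughly 10 GB worth of rows
num_chunks = (rows_total + rows_per_chunk - 1) // rows_per_chunk  # ceiling division

shm_list, chunk_tensors = [], []
for i in range(num_chunks):
    rows = min(rows_per_chunk, rows_total - i * rows_per_chunk)   # last chunk may be smaller
    shm = shared_memory.SharedMemory(create=True, size=rows * row_bytes)
    arr = np.ndarray((rows, cols), dtype=np.float32, buffer=shm.buf)
    shm_list.append(shm)                           # keep a reference so the block stays mapped
    chunk_tensors.append(torch.from_numpy(arr))    # zero-copy view of the shared block

def get_row(i):
    # map a global row index to (chunk, row-within-chunk)
    return chunk_tensors[i // rows_per_chunk][i % rows_per_chunk]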

As further proof that no copy is made, you can check that the NumPy array and the tensor report the same base pointer:

>>> print(np_array.ctypes.data)
133277195173888
>>> print(tensor.data_ptr())
133277195173888

and they should match up.
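
Since the goal is to reach this memory from DataLoader workers running in spawned processes, the usual pattern is to pass the block's name (plus shape and dtype) to the worker and re-attach there, rather than sending the tensor itself. A sketch, assuming shm_name and tensor_shape are handed to the Dataset (e.g. via its __init__ arguments):

# parent process: remember the name of the shared block
shm_name = sh_mem.name

# inside a spawned worker (e.g. in the Dataset, on first use):
from multiprocessing import shared_memory
import numpy as np
import torch

existing = shared_memory.SharedMemory(name=shm_name)   # attach to the same block, no copy
worker_tensor = torch.from_numpy(
    np.ndarray(tensor_shape, dtype=np.float32, buffer=existing.buf)
)
# keep `existing` referenced for as long as worker_tensor is in use

Attaching by name this way can avoid the implicit copy that sending the big tensor to each worker would otherwise cause.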
