I need to create a PyTorch tensor (CPU) and allocate space for it. The tensor is multi-gigabyte and fits in RAM only once.
I need it shared, because it is later used by data-retrieval workers that run in additional spawned processes (DataLoader with num_workers > 0).
I tried several approaches:
Just plain creation and then using DataLoader:

v = torch.empty(25 * 2**30)
loader = DataLoader(dataset, num_workers=2, persistent_workers=True)
# dataset uses v (the line above is actually in its __init__())
Using explicit share_memory_():

v = torch.empty(25 * 2**30)
v.share_memory_()
Both these options lead to Couldn't open shared file mapping:..
because the tensor is created first and then copied to shared memory,
and two copies do not fit in RAM.
There is another method: creating a file and then using torch.from_file, which probably works, but it requires writing this huge tensor to disk, which is slow and not desirable.
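For reference, a minimal sketch of that from_file route (the file name is illustrative; with shared=True PyTorch memory-maps the file, creating it if it does not exist, and changes are visible to other processes):

import torch

n = 25 * 2**30  # number of float32 elements (~100 GB)
# Memory-map a file-backed shared tensor; pages are backed by this file
# on disk rather than by anonymous (paging-file) memory.
v = torch.from_file("big_tensor.bin", shared=True, size=n, dtype=torch.float32)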
I have found mentions of a TorchStore module that could help, but it does not seem to be part of PyTorch yet.

(Based on gfdb's answer) Using multiprocessing.shared_memory.SharedMemory():
import torch
import numpy as np
from multiprocessing import shared_memory
import psutil

print(f"Available memory: {psutil.virtual_memory().available / (1024 ** 3):.2f} GB")
print(f"Total memory: {psutil.virtual_memory().total / (1024 ** 3):.2f} GB")

tensor_shape = (2**30, 25)
dtype = np.float32
num_elements = np.prod(tensor_shape)
size = int(num_elements * np.dtype(dtype).itemsize)
print(f"Requesting allocation of {size / 2**30:.2f} GB")

sh_mem = shared_memory.SharedMemory(create=True, size=size)
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)
tensor = torch.from_numpy(np_array)
tensor.fill_(0)
print(f"Allocated {size / 2**30:.2f} GB")
In the latter case, I still get an error when requesting more than half of RAM (less than half works fine):
Available memory: 176.63 GB
Total memory: 191.87 GB
Requesting allocation of 100.00 GB
Traceback (most recent call last):
  File "D:\Sci\NetOnset\CheckShared.py", line 22, in <module>
    sh_mem = shared_memory.SharedMemory(create=True, size=size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Miniconda3\envs\cuda124\Lib\multiprocessing\shared_memory.py", line 151, in __init__
    self._mmap = mmap.mmap(-1, size, tagname=temp_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 1455] The paging file is too small for this operation to complete
Is there still a way to create a big PyTorch Tensor in shared memory without copying?
This all behaves the same way on both Windows 10 and Ubuntu 24.04.
Comments:

- What is the error you get in Ubuntu? You are trying to allocate a huge contiguous block of memory, which is likely why this is failing. I suspect that even though you have the memory available, fragmentation is why it fails. Try chunks of memory instead. I'll edit my answer to show what I mean. – gfdb, Mar 14 at 22:12
- It was something like "Bus error"; I will check again. It works like this: allocating more than half of available RAM in "normal" memory is OK; allocating less than half of available RAM in "shared" memory is OK; allocating more than half of available RAM in "shared" memory fails. Yes, I switched to several chunks instead of a single tensor, but it still seems strange. – Stepan Andreenko, Mar 15 at 16:04
1 Answer
Your question is missing some details, but assuming my assumptions are correct:
I get an error Couldn't open shared file mapping:.. when running this code, most likely because the tensor is implicitly being copied to shared memory and the second copy does not fit. There is exactly the same error if I call share_memory_() on this tensor explicitly, for the same reason.
This is correct. You will end up with two tensors:
- Original CPU tensor (private memory)
- Shared-memory tensor (a copy)
And as you say, it won't fit.
One approach, besides the file-based one, could be to use multiprocessing's shared_memory, e.g.:
import torch
import numpy as np
from multiprocessing import shared_memory

tensor_shape = (1024, 1024, 512)
dtype = np.float32
num_elements = np.prod(tensor_shape)
# Allocate the block directly in shared memory; no private copy is made.
sh_mem = shared_memory.SharedMemory(create=True, size=int(num_elements * np.dtype(dtype).itemsize))
np_array = np.ndarray(tensor_shape, dtype=dtype, buffer=sh_mem.buf)
# Create a tensor view over the same buffer without actually copying data
tensor = torch.from_numpy(np_array)
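A DataLoader worker can then attach to the same block by its name without any copy. A minimal sketch, assuming the segment name, shape, and dtype are passed to the worker (e.g. through the Dataset); attach_tensor is an illustrative helper, not a PyTorch API:

def attach_tensor(name, shape, dtype=np.float32):
    # Attach to an existing shared-memory block by name -- no data is copied.
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # Return shm as well: it must stay referenced while the tensor is in use,
    # or the mapping can be closed underneath the tensor.
    return torch.from_numpy(arr), shm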
You can do this in chunks, which is probably better, e.g.:

total_size = 100 * 2**30  # total bytes needed
chunk_size = 10 * 2**30   # 10GB chunk size
num_chunks = -(-total_size // chunk_size)  # ceil division so a remainder still gets a chunk
shm_list = [
    shared_memory.SharedMemory(create=True, size=min(chunk_size, total_size - i * chunk_size))
    for i in range(num_chunks)
]
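Each chunk can then be wrapped as its own tensor, again without copying; a sketch using flat float32 views (the shapes are illustrative):

chunk_tensors = [
    torch.from_numpy(np.ndarray((shm.size // 4,), dtype=np.float32, buffer=shm.buf))
    for shm in shm_list
]

Note that shm.size may be rounded up to a multiple of the page size, so slice to the exact element count if that matters.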
As further proof of no copying, you can check the base pointer of each:
>>> print(np_array.ctypes.data)
133277195173888
>>> print(tensor.data_ptr())
133277195173888
and they should match up.
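One more practical detail: the shared-memory segment is not freed automatically when the tensor is garbage-collected. A minimal cleanup sketch, to run once all processes are done with it:

# Detach this process from the block, then destroy it (unlink is required
# on POSIX; on Windows the block disappears when the last handle closes).
sh_mem.close()
sh_mem.unlink()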