python - Is there a general 'rule of thumb' for the maximum percentage of memory a data processing task should use?
I'm currently assisting on a project dealing with large climate datasets. One of our processing steps involves reading in netCDF files in chunks to prevent using too much RAM. The larger the chunk size, the faster the processing step goes.
The idea is to create a script that can be run on modest resources (16 GB RAM, 4-core CPU, personal computer/laptop) in a reasonable time frame.
My lead believes that we shouldn't use more than 30% of available memory, and wants to limit chunk size to accommodate this. I've been doing some googling, but other than Google's AI response (whose source links don't seem to have this information) I can't find anything about the maximum amount of memory, or this 30% rule. Does anyone have any insight on what a reasonable limit would be, or resources for learning more about this?
asked Jan 23 at 18:28 by deathcon501
- It depends on what else the machine is being used for. If this is the only task, you can use as much memory as you need. – Barmar, Jan 23 at 18:33
- 1. Always let the user control it. 2. There is no rule; just don't let the system run out of memory. 3. Remember the user can open new applications while your application is running, so just querying the available memory ahead of time is pointless; give the user control over it. – Ahmed AEK, Jan 23 at 19:04
- Benchmark the performance of each size. There is a point where there won't be any performance gain from increasing the size anymore (which varies with disk speed, RAM speed, and CPU speed); use a good default and let the user override it. – Ahmed AEK, Jan 23 at 19:06
- Three things to consider. 1. "Unused RAM" is used by the OS for its page cache, so depending on your use case you may want to keep enough in reserve to facilitate IO. 2. Leave enough reserve for spikes, which usually occur when dynamic data structures relocate to grow, or during IO. 3. Try testing much smaller chunk sizes that fit into the CPU cache and thereby profit from much higher memory throughput and lower latency. – Homer512, Jan 24 at 13:41
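A minimal sketch of the benchmarking approach the comments suggest. This version times plain byte-level reads of a throwaway file rather than a real netCDF dataset, so the file handling is illustrative only; a real run would read variable slices from your netCDF files instead:

```python
import os
import tempfile
import time

def benchmark_chunk_sizes(path, chunk_sizes):
    """Time one full sequential read of `path` at each chunk size (in bytes)."""
    results = {}
    for size in chunk_sizes:
        start = time.perf_counter()
        with open(path, "rb") as f:
            # read() returns b"" at EOF, which ends the loop
            while f.read(size):
                pass
        results[size] = time.perf_counter() - start
    return results

# Demo on a throwaway 16 MiB file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))

timings = benchmark_chunk_sizes(tmp.name, [4 * 1024, 256 * 1024, 4 * 1024 * 1024])
os.unlink(tmp.name)

for size, secs in sorted(timings.items()):
    print(f"{size:>10} bytes: {secs:.4f} s")
```

Run this with candidate chunk sizes on the target hardware; past some size the timings flatten out, and that knee is a better default than any fixed percentage.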
1 Answer
On the same thread as commenter @Barmar: unused RAM is wasted RAM.
There is no significant performance difference between a machine with most of its RAM unused and one with only some of it unused. What matters is that you do not approach or exceed the total RAM available; as long as the system stays below that point, it is fine.
This does not necessarily give you an answer, though. You need to understand the systems the script will be deployed on, and the processes you expect them to be running, to inform your decision.
So, if the systems running the memory-hungry script are used solely for that purpose, there is no reason to keep excess memory free. While the script runs, no other processes will be competing for memory, so you can determine empirically how much RAM you can get away with.
However, if the systems are not dedicated to data processing, you need to factor in the potential for processes outside your own and subtract that from your allowance. For example, if you expect to run this on someone's personal laptop, you have to account for the possibility that the person opens Chrome, plays a game, has an application auto-update, or all of those at the same time. Ignoring this will not only make the device unusable for a period, but will likely cause out-of-memory errors that may halt your processing or even affect the validity of its results. In this case, it may pay to be conservative: measure how much RAM the processes you expect to run concurrently actually use, and subtract that from your allowance.
TLDR:
Unused RAM is wasted RAM. If your script is the only thing being run on the device, use as much RAM as you have. If not, reserve RAM for the other tasks you expect, based on their actual measured usage.
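To make the "reserve a fraction of memory" idea concrete, here is a minimal sketch (the helper name and the assumption of two in-flight copies per chunk are mine; the 30% figure is the lead's policy, not a standard) that turns an available-memory figure into a per-chunk element count. In practice you could query the live figure with `psutil.virtual_memory().available` instead of hard-coding it:

```python
def chunk_elems(available_bytes, fraction, dtype_size, copies=2):
    """Largest element count per chunk such that `copies` simultaneous
    in-memory copies of the chunk (e.g. raw + transformed) stay within
    `fraction` of `available_bytes`."""
    budget = int(available_bytes * fraction)
    return budget // (dtype_size * copies)

# Example: a 16 GB machine, a 30% cap, float64 data (8 bytes/element),
# assuming the pipeline holds ~2 copies of each chunk at once.
n = chunk_elems(16 * 1024**3, 0.30, 8)
print(f"{n:,} float64 values per chunk (~{n * 8 / 1024**3:.1f} GiB)")
```

The `copies` parameter matters: many array operations allocate a result the same size as the input, so budgeting for a single copy of the chunk silently doubles your real footprint.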