I am trying to understand how Google Cloud Dataflow is charged when reading a file with beam.io.ReadFromText. From my understanding, every time something is read from a Google Cloud Storage bucket, it incurs charges per 1,000 operations. What I am trying to understand, however, is this: if I had hypothetically 10 billion rows in a file (or any sort of small rows, but a lot of them), would a simple filtering job incur charges in the millions of dollars, or does Dataflow only charge a single request to stage the target file into a "free" environment (or does it batch the reads in some way)?
Edit: I am referring to the class B operations performed on the Cloud Storage bucket. Talking to a cloud representative, it sounds like the file is read in chunks, but even they could not say how large those chunks might be.
1 Answer
The cost of a Dataflow job using beam.io.ReadFromText to process a 10-billion-row file isn't simply a matter of "charges per 1,000 operations." The pricing depends on several interacting factors.
Dataflow doesn't stage the entire file into a "free" environment. The read happens in parallel across multiple workers, and the data is pulled from Cloud Storage and processed in chunks rather than as one request per row or a single monolithic operation. Processing 10 billion rows will still consume significant compute resources, though, and Dataflow charges for the time your worker nodes run and the resources (vCPU, memory, and persistent disk) they consume.
It's unlikely to be in the millions of dollars for a simple filter if the job is appropriately configured for parallel processing and uses a cost-effective worker type.
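For concreteness, here is a minimal sketch of the kind of pipeline in question, assuming a hypothetical bucket gs://my-bucket, project my-project, and a trivial "keep lines containing ERROR" filter (all of these names are placeholders, not values from your job):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    # All names below (project, bucket, paths) are placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            # ReadFromText splits the file into ranges; workers stream their
            # assigned ranges from GCS rather than issuing one request per line.
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/big-file.txt")
            | "Filter" >> beam.Filter(lambda line: "ERROR" in line)
            | "Write" >> beam.io.WriteToText("gs://my-bucket/filtered/part")
        )

if __name__ == "__main__":
    run()

The key point is that the Cloud Storage read side is not billed per row: the source is split into byte ranges and each worker streams its range with a comparatively small number of requests. Even under a deliberately pessimistic assumption that, say, every 64 MiB of data required its own GET, a file of several hundred gigabytes would come to on the order of thousands of class B operations, not billions, which is negligible next to the worker compute charges for the filtering itself.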