Using Python streamz and dask, I want to distribute the data from text files, as they are generated, across worker threads, which will then process every new line appended to those files.
from streamz import Stream

# Emit the name of each new .txt file that appears under this directory
source = Stream.filenames(r'C:\Documents\DataFiles\*.txt')
# recv = source.scatter().map(print)
# Map each emitted filename to a new per-file text stream
recv = source.map(Stream.from_textfile)
recv.start()
Given the code above, how do I get at the data coming from the streamz text-file streams? For a single file I can access the stream and build dataframes like this, but I cannot join this stream of per-file streams, nor use a dask distributed client to scatter the work.
import pandas as pd
from streamz import Stream

# Tail a single text file, emitting one element per new line
recv = Stream.from_textfile(r'C:\Users\DataFiles\stream_output.txt')
# Batch every 10 lines and build a DataFrame from each batch
dataframe = recv.partition(10).map(pd.DataFrame)
Tags: dask, python, streamz (Stack Overflow: "Use Python streamz to parallel process many realtime updating files?")