Our team is looking to use the Pants build system, which conveniently packages Python code into a PEX with only the required dependencies. However, I couldn't find any documentation about how a PEX would work in a Beam + Dataflow job.
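For context, a PEX is essentially an executable zip of Python code plus its dependencies that you hand to an interpreter — the same mechanism as the stdlib `zipapp` module. A minimal sketch of that mechanism (the file names here are made up for illustration, and a real PEX adds dependency resolution on top):

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A directory with a __main__.py plays the role of the PEX's entry point.
    src = Path(tmp) / "app"
    src.mkdir()
    (src / "__main__.py").write_text("print('hello from a zipapp')\n")

    # Bundle it into a single executable archive (hypothetical name app.pyz).
    target = Path(tmp) / "app.pyz"
    zipapp.create_archive(src, target)

    # The archive runs by being passed to a Python interpreter -- which is
    # exactly the step a Beam SDK container would need to perform for a PEX.
    out = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())  # hello from a zipapp
```

The point is that nothing about the archive runs on its own: some process must invoke `python archive`, which is the crux of the question below.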
We currently `pip install` all of our dependencies and `pip install -e` our source code into the SDK Docker image like so:
FROM python:3.11-slim
COPY --from=apache/beam_python3.11_sdk:2.61.0 /opt/apache/beam /opt/apache/beam
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:20241127-rc00 /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher
COPY repo repo
RUN pip install (with lots of flags) -r repo/requirements.txt
RUN pip install -e repo
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/repo/pipeline.py"
ENTRYPOINT ["/opt/apache/beam/boot"]
and then run that all as a "Flex template" on Dataflow.
I'm having a hard time figuring out how I would adapt this to a PEX, since we never invoke python itself in our Dockerfile.
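One possible direction (a sketch under assumptions, not a verified recipe): the `pex` tool can unpack a PEX into an ordinary virtualenv, after which the Beam boot binary finds a normal interpreter with all dependencies installed, so python is still never invoked by hand at build time beyond that one unpack step. The artifact name `dist/pipeline.pex`, the `/opt/venv` path, and the final launcher path are all hypothetical, and the `PEX_TOOLS=1 ... venv` step requires the PEX to be built with pex tools included (in Pants, the `include_tools` field on `pex_binary`):

```dockerfile
FROM python:3.11-slim
COPY --from=apache/beam_python3.11_sdk:2.61.0 /opt/apache/beam /opt/apache/beam
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:20241127-rc00 /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

# Hypothetical output of `pants package` for the pipeline target.
COPY dist/pipeline.pex /opt/pipeline.pex

# Unpack the PEX into a plain venv so that `python` on PATH resolves to an
# environment already containing the pipeline and its dependencies.
# Requires a PEX built with tools included.
RUN PEX_TOOLS=1 python3.11 /opt/pipeline.pex venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"

# The launcher entry file would then live wherever the venv placed it
# (path below is a guess, not verified).
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/opt/venv/lib/python3.11/site-packages/pipeline.py"

ENTRYPOINT ["/opt/apache/beam/boot"]
```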
Title: How to run a PEX on Apache Beam + GCP Dataflow? - Stack Overflow