Our team is looking to use the Pants build system, which conveniently packages Python code into a PEX containing only the dependencies it actually requires. However, I couldn't find any documentation on how a PEX would work in a Beam + Dataflow job.

We currently pip install all of our dependencies, and pip install -e our source code, into the SDK container image like so:

FROM python:3.11-slim
COPY --from=apache/beam_python3.11_sdk:2.61.0 /opt/apache/beam /opt/apache/beam
COPY --from=gcr.io/dataflow-templates-base/python311-template-launcher-base:20241127-rc00 /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

COPY repo repo

RUN pip install (with lots of flags) -r repo/requirements.txt
RUN pip install -e repo

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/repo/pipeline.py"
ENTRYPOINT ["/opt/apache/beam/boot"]

and then run it all as a "Flex Template" on Dataflow.

I am having a hard time figuring out how to adapt this to a PEX, since we never invoke Python ourselves in the Dockerfile: the entrypoint is Beam's boot binary.
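One direction that might be worth trying, sketched here rather than tested against Dataflow: a PEX built with its tools included (for example `include_tools=True` on the Pants `pex_binary` target) can unpack itself into an ordinary virtualenv when run with `PEX_TOOLS=1` and the `venv` subcommand. The image would then hold a plain Python environment again, much like the pip-based one above. The helper below only assembles that command. The function name `pex_venv_command` and the paths `/repo/pipeline.pex` and `/opt/venv` are hypothetical, and whether the Beam boot entrypoint picks up the resulting environment (for instance by putting the venv's `bin` directory first on `PATH`) is an assumption that would need checking.

```python
import os
import sys
from typing import Dict, List, Tuple


def pex_venv_command(
    pex_path: str, venv_dir: str, python: str = sys.executable
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for unpacking a PEX into a virtualenv at venv_dir.

    Assumption: the PEX was built with its tools included, otherwise the
    `venv` subcommand is not available.
    """
    env = dict(os.environ)
    env["PEX_TOOLS"] = "1"  # switches the PEX from "run the app" to "run a tool"
    argv = [python, pex_path, "venv", venv_dir]
    return argv, env


if __name__ == "__main__":
    # Hypothetical paths, chosen to mirror the Dockerfile above.
    argv, env = pex_venv_command("/repo/pipeline.pex", "/opt/venv")
    print("PEX_TOOLS=1 " + " ".join(argv))
    # To actually execute it instead of printing:
    #   import subprocess; subprocess.run(argv, env=env, check=True)
```

In a Dockerfile the printed command would be a single RUN step replacing the two pip install lines, so Python is still only invoked at build time and the boot entrypoint stays as it is.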

Tags: How to run a PEX on Apache Beam / GCP Dataflow, Stack Overflow