Create Dockerfile to use Airflow and Spark, pip backtracking runtime issue comes out
I'm trying to build a Dockerfile to use Airflow and Spark, as follows:
FROM apache/airflow:2.7.0-python3.9
ENV AIRFLOW_HOME=/opt/airflow
USER root
# Update the package list, install required packages, and clean up
RUN apt-get update && \
    apt-get install -y gcc python3-dev openjdk-11-jdk wget && \
    apt-get clean
# Set the JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
COPY requirements.txt .
USER airflow
RUN pip install -U pip
RUN pip install --no-cache-dir -r requirements.txt
My requirements.txt is:
apache-airflow
apache-airflow-providers-apache-spark
apache-airflow-providers-celery>=3.3.0
apache-airflow-providers-google
pandas
psycopg2-binary
pytest
pyspark
requests
sqlalchemy
It takes an extremely long time to build, and I keep getting messages like this:
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime.
=> => # Downloading google_cloud_workflows-1.16.0-py2.py3-none-any.whl.metadata (5.2 kB)
And if I remove the python3.9 tag from the first line of my Dockerfile, I'm unable to install openjdk-11-jdk.
Does anyone know how to solve this? Thank you.
asked Feb 24 at 12:56 by lili

1 Answer
Try using Airflow's official constraints file. The constraints file contains pre-computed, mutually compatible dependency versions for each Airflow release, which drastically reduces the work pip's resolver has to do on its own.
FROM apache/airflow:2.7.0-python3.9
ENV AIRFLOW_HOME=/opt/airflow
USER root
# Update the package list, install required packages, and clean up
RUN apt-get update && \
    apt-get install -y gcc python3-dev openjdk-11-jdk wget && \
    apt-get clean
# Set the JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
COPY requirements.txt .
USER airflow
RUN pip install --upgrade pip
# Use pip's constraint mode to avoid backtracking
RUN pip install --no-cache-dir --use-pep517 --constraint=https://raw.githubusercontent.com/apache/airflow/constraints-2.7.0/constraints-3.9.txt -r requirements.txt
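Note that the constraints URL follows the pattern constraints-AIRFLOW_VERSION/constraints-PYTHON_VERSION.txt, so it must match both the Airflow version and the Python version of your base image. As a quick sanity check before building (assuming you have curl available locally), you can inspect which provider versions the file pins:

# Fetch the constraints file for Airflow 2.7.0 on Python 3.9 and show
# the pins for the providers used in requirements.txt
curl -sSL https://raw.githubusercontent.com/apache/airflow/constraints-2.7.0/constraints-3.9.txt \
  | grep -E 'apache-airflow-providers-(google|apache-spark|celery)'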
The requirements.txt:
apache-airflow
apache-airflow-providers-apache-spark
apache-airflow-providers-celery>=3.3.0
apache-airflow-providers-google==10.1.0
pandas
psycopg2-binary
pytest
pyspark
requests
sqlalchemy
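A minimal sketch of how to build and sanity-check the image (the airflow-spark tag is just an example name, not anything required):

# Build the image from the Dockerfile above
docker build -t airflow-spark .

# Verify that pyspark imports and that Java is visible inside the container
docker run --rm airflow-spark python -c "import pyspark; print(pyspark.__version__)"
docker run --rm airflow-spark bash -c 'echo $JAVA_HOME && java -version'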